1.1 | 新闻关联标签表 | (news_tag_v1) | ||||||||
FULL_NAME_EN | News Related Tag | |||||||||
描述 : | 记录新闻关联的所有类型属性标签信息,根据新闻来源、新闻篇幅、新闻其他特点(是否是政策类、是否是数据类、是否包含图表)等对新闻自动打标签。标签包括是否是基本面新闻、是否是数据型新闻、是否是政策新闻等,也包含新闻的申万行业分类,新闻类别,是否财经类以及是否重复等信息。 | |||||||||
DESCRIPTION_EN : | Records all attribute tags associated with a news item. News is labeled automatically according to its source, its length, and other characteristics (whether it is policy-related, whether it is data-driven, whether it contains charts). The tags include whether the item is fundamentals news, data news, policy news, and so on, as well as its Shenwan industry classification, its news genre, whether it is financial news, and whether it is a duplicate. | |||||||||
唯一键 : | NEWS_ID | |||||||||
数据起始时间 : | 2016-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsTagInd--新闻关联标签行业表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻ID | News Id | bigint | 否 | |||||
3 | EFFECTIVE_TIME | 新闻有效发布时间 | Effective Time | datetime | 是 | 新闻有效发布时间,若新闻发布时间和当前时间不是同一天, 且发布时间早于当前时间超过12小时,采用NEWS_PUBLISH_TIME;其他情况下采用新闻首次爬取时间。 | Effective publish time of the news. If the publish time is not on the same day as the current time and is more than 12 hours earlier than the current time, NEWS_PUBLISH_TIME is used; in all other cases the time the news was first crawled is used. | |||
4 | MAIN_GROUP_ID | 新闻在新闻全集中是否重复 | Is It Duplicated | int | 是 | -1表示已重复新闻,非-1表示不重复 | -1 indicates a duplicate news item; any other value indicates a non-duplicate. | |||
5 | CLUSTER_ID | 关联类id | Cluster Id | bigint | 是 | 代表新闻的聚类id,重复的新闻有一样的cluster_id值 | Cluster ID of the news; duplicate news items share the same CLUSTER_ID value. | |||
6 | NEWS_GENRE | 新闻类别 | News Genre | varchar(50) | 是 | 普通新闻、价格动态、公告新闻三类 | One of three genres: general news (普通新闻), price updates (价格动态), and announcement news (公告新闻). General news is any news that is neither a price update nor announcement news. Price updates are real-time items on the transaction prices of raw materials and precious metals; they are usually very short and consist mainly of a price and a time. Announcement news takes its content from an announcement, either reproducing it exactly or excerpting some of its sections. | |||
7 | IS_PRO_SITE | 网站类别标签 | Is It Professional Website | tinyint | 是 | 0为非专业网站,1为专业网站,2为微信 | 0: non-professional website; 1: professional website; 2: WeChat. | |||
8 | IS_ECONOM | 是否包含基本面信息 | Is It Fundamental Information | tinyint | 是 | 1-是,0-否 | 1-yes, 0-no | |||
9 | IS_DATA | 是否包含数据信息 | Is It Data News | tinyint | 是 | |||||
10 | IS_LONG_NEWS | 是否是长新闻 | Is It Long News | tinyint | 是 | |||||
11 | IS_SHORT_NEWS | 是否是短新闻 | Is It Short News | tinyint | 是 | |||||
12 | IS_MONTH_DATA | 是否是月度数据 | Is It Monthly Data | tinyint | 是 | |||||
13 | IS_POLICY | 是否是国家发布的政策 | Is It Policy | tinyint | 是 | |||||
14 | IS_PICTURE | 是否是图片式新闻 | Is It A Picture | tinyint | 是 | |||||
15 | IS_PERIOD | 是否是定期报告 | Is It Period Report | tinyint | 是 | |||||
16 | IS_WECHARTSTOCK | 是否是公司微信公众号 | Is It Wechat Account | tinyint | 是 | |||||
17 | IS_RUMOUR | 是否是传闻 | Is It A Rumor | tinyint | 是 | |||||
18 | IS_RUMOUR_RESPONSE | 是否是传闻回应 | Is It A Rumor Response | tinyint | 是 | |||||
19 | MINISTRY | 部委新闻的发布来源 | Sources of News Release of Ministries | varchar(50) | 是 | 包含央行、发改委等,none代表非部委发布的新闻 | Values include the central bank (央行), the National Development and Reform Commission (发改委), etc.; none indicates news not released by a ministry or commission. | |||
20 | INDUSTRY_NAME_1ST | 一级分类 | First Level Industry | varchar(30) | 是 | 行业新闻(industry)、公司新闻(stock)、 宏观新闻(marco)、债券新闻(bond)、市场新闻(market)、其他新闻(other)、非财经(none) | Industry news (industry), company news (stock), macro news (marco), bond news (bond), market news (market), other news (other), non-financial (none). | |||
21 | INDUSTRY_NAME_2ND | 二级分类 | Second Level Industry | varchar(30) | 是 | 如果一级类别是“行业新闻”,则细分到具体行业, 包括申万一级27个行业(不包含“综合”) | If the first-level category is industry news, the news is further classified into a specific industry: one of the 27 Shenwan Level 1 industries (excluding "综合", Comprehensive). | |||
22 | INDUSTRY_NAME_3RD | 三级分类 | Third Level Industry | varchar(30) | 是 | 如果一级类别是“行业新闻”,则细分到具体申万二级行业类别,该字段供参考不建议直接使用 | If the first-level category is industry news, the news is further classified into a specific Shenwan Level 2 industry. This field is for reference only and is not recommended for direct use. | |||
23 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 | |||||
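The EFFECTIVE_TIME rule described for field 3 above can be read as a small decision function. The Python sketch below is one interpretation of that rule, not the vendor's implementation: the function name, the parameter names, and the reading of "current time" as the moment the record is processed are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def effective_time(publish_time: datetime,
                   first_crawl_time: datetime,
                   now: datetime) -> datetime:
    """Pick EFFECTIVE_TIME following the rule described for field 3.

    Hypothetical helper: `now` stands in for the "current time" in the
    rule, assumed here to be the time at which the record is processed.
    """
    different_day = publish_time.date() != now.date()
    older_than_12h = (now - publish_time) > timedelta(hours=12)
    if different_day and older_than_12h:
        # Old news picked up late: use the publish time (NEWS_PUBLISH_TIME).
        return publish_time
    # In all other cases use the time the news was first crawled.
    return first_crawl_time
```

Under this reading, news published at 08:00 and processed at 09:00 two days later keeps its publish time, while news published at 23:00 and processed at 01:00 the next day takes its first crawl time, because the gap is under 12 hours.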
1.2 | 新闻热度信息表 | (news_popularity) | ||||||||
FULL_NAME_EN | News Popularity | |||||||||
描述 : | 记录所有类型新闻的热度信息,包括新闻ID、新闻标题、新闻热度、全局新闻热度等。 | |||||||||
DESCRIPTION_EN : | Records popularity information for all types of news, including news ID, news title, news popularity, global news popularity, etc. | |||||||||
唯一键 : | NEWS_ID | |||||||||
数据起始时间 : | 2016-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsPopularity--新闻热度信息表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻ID | News ID | bigint | 是 | 新闻ID(NEWS_ID)可与普通新闻主表(vnews_content_v1)的新闻ID(NEWS_ID)字段关联。 | NEWS_ID can be joined to the NEWS_ID field of the general news master table (vnews_content_v1). | |||
3 | NEWS_URL | 新闻网址 | News Source Url | nvarchar(2000) | 否 | |||||
4 | NEWS_TITLE | 新闻标题 | News Title | nvarchar(1200) | 是 | |||||
5 | POPULARITY | 新闻热度 | News Popularity | decimal(24,20) | 是 | 根据去重后的新闻阅读数、参与数、回复数和转载数计算得出,每2分钟刷新一次。 | Calculated from the deduplicated counts of news reads, engagements, replies, and reposts; refreshed every 2 minutes. | |||
6 | GLOBAL_POPULARITY | 全局新闻热度 | News Global Popularity | decimal(24,20) | 是 | 在新闻热度的基础上根据发布时间做时间衰减后的结果 | News popularity after time decay is applied according to the publish time. | |||
7 | EFFECTIVE_TIME | 新闻有效时间 | News Effective Time | datetime | 是 | | If the publish time is not on the same day as the current time and is more than 12 hours earlier than the current time, NEWS_PUBLISH_TIME is used; in all other cases the time the news was first crawled is used. | |||
8 | UPDATE_TIME | 更新时间 | Update Time | datetime | 否 | |||||
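GLOBAL_POPULARITY is described above only as POPULARITY with a time decay based on the publish time; the actual decay function is not given in this document. The Python sketch below uses an exponential half-life decay purely as a stand-in to show the idea. The function name and the `half_life_hours` parameter (with its default of 24) are hypothetical.

```python
from datetime import datetime


def decayed_popularity(popularity: float,
                       publish_time: datetime,
                       now: datetime,
                       half_life_hours: float = 24.0) -> float:
    """Apply an illustrative exponential time decay to a popularity value.

    Assumption: the score halves every `half_life_hours`. The real decay
    used for GLOBAL_POPULARITY is not documented here.
    """
    # Age of the news in hours; clamp at zero so that a publish time in
    # the future does not inflate the score.
    age_hours = max((now - publish_time).total_seconds() / 3600.0, 0.0)
    return popularity * 0.5 ** (age_hours / half_life_hours)
```

With the default half-life, a score is unchanged at publish time, halved after one day, and quartered after two days.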
1.3 | 政策分类表 | (news_policy_classification_v1) | ||||||||
FULL_NAME_EN | News Policy Classification | |||||||||
描述 : | 记录政策类新闻分类,包含申万一级行业、宏观、非政策类共33个类别。 | |||||||||
DESCRIPTION_EN : | Records the classification of policy news into 33 categories in total, covering the Shenwan Level 1 industries, macro, and non-policy. | |||||||||
唯一键 : | NEWS_ID | |||||||||
数据起始时间 : | 2016-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsPlcyClassifV1--政策分类表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | 信息编码 | information encoding | |||
2 | NEWS_ID | 新闻ID | News ID | bigint | 否 | 新闻信息编码 | News information code | |||
3 | NEWS_INDUSTRY_NAME | 行业分类名 | News Industry Name | varchar(30) | 否 | 申万一级行业、宏观、非政策共计33个类别。 | 33 categories in total: the Shenwan Level 1 industries, macro, and non-policy. | |||
4 | NEWS_SITE_NAME | 抓取网站名 | News Site Name | varchar(50) | 是 | 新闻发布来源,即新闻的实际爬取来源 ,主要包含政府官网以及部委等网站。 | News release source, i.e. the site the news was actually crawled from; mainly official government websites and the websites of ministries and commissions. | |||
5 | SOURCE_NAME | 新闻原始出处 | Source Name | varchar(50) | 是 | 新闻初始来源,即新闻原始出处 | The original source of the news. | |||
6 | NEWS_EFFECTIVE_TIME | 新闻发布时间 | News Effective Time | datetime | 是 | 抓取网站公布时间,取自news_metadata 的news_publish_time | Publish time on the crawled website, taken from news_publish_time in news_metadata. | |||
7 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 | 数据更新时间 | Update time of data | |||
1.4 | 新闻关键词表 | (news_keywords) | ||||||||
FULL_NAME_EN | News Keywords | |||||||||
描述 : | 记录所有类型新闻关键词标签信息,包含去重后的NER标签和关键词两类别。NER标签包括人名实体标签、产品名实体标签、公司名实体标签等。 | |||||||||
DESCRIPTION_EN : | Records keyword tag information for all types of news, in two categories: deduplicated NER tags and keywords. NER tags include person-name entity tags, product-name entity tags, company-name entity tags, etc. | |||||||||
唯一键 : | NEWS_ID | |||||||||
数据起始时间 : | 2016-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsKeywords--获取新闻关键词 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻ID | News ID | bigint | 否 | 新闻ID(NEWS_ID)可与普通新闻主表(vnews_content_v1)的新闻ID(NEWS_ID)字段关联。 | NEWS_ID can be joined to the NEWS_ID field of the general news master table (vnews_content_v1). | |||
3 | NEWS_KEYWORDS | 新闻关键词 | News Keywords | varchar(500) | 是 | |||||
4 | NEWS_KEYPHRASE | 新闻关键词组 | News Keyphrase | varchar(500) | 是 | |||||
5 | NEWS_PERSON_ENTITY | 新闻人名实体标签 | News Person Entity | varchar(500) | 是 | |||||
6 | NEWS_PRODUCT_ENTITY | 新闻产品名实体标签 | News Product Entity | varchar(500) | 是 | |||||
7 | NEWS_COMPANY_ENTITY | 新闻公司名实体标签 | News Company Entity | varchar(500) | 是 | |||||
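8 | NEWS_FUND_ENTITY | 新闻基金名实体标签 | News Fund Entity Label | text | | JSONArray格式。fundAbbrName-基金简称,fundID-基金ID,fundName-基金名称,parentAbbrName-主基金简称,parentName-主基金全称,secID-通联编制的证券唯一编码,tickerSymbol-交易代码。 | JSONArray format. fundAbbrName: fund short name; fundID: fund ID; fundName: fund name; parentAbbrName: parent fund short name; parentName: parent fund full name; secID: unique security code assigned by Datayes; tickerSymbol: ticker symbol. | |||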
9 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 | |||||
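Because NEWS_FUND_ENTITY is stored as a JSONArray string, it has to be decoded before the fund attributes can be used. The Python sketch below shows one way to do that with the standard `json` module. The key names come from the field description above; the function name and the sample payload are invented for illustration and are not real data.

```python
import json
from typing import Dict, List, Optional

# Keys listed in the NEWS_FUND_ENTITY field description.
FUND_KEYS = (
    "fundAbbrName", "fundID", "fundName",
    "parentAbbrName", "parentName", "secID", "tickerSymbol",
)


def parse_fund_entities(raw: Optional[str]) -> List[Dict[str, object]]:
    """Decode a NEWS_FUND_ENTITY value into a list of dicts.

    Hypothetical helper: NULL or empty values yield an empty list, and
    keys missing from an element are filled with None.
    """
    if not raw:
        return []
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("NEWS_FUND_ENTITY is expected to be a JSON array")
    return [{key: item.get(key) for key in FUND_KEYS} for item in items]


# Invented sample payload, for illustration only.
SAMPLE = (
    '[{"fundAbbrName": "Example Fund A", "fundID": 1, '
    '"fundName": "Example Fund Class A", "parentAbbrName": "Example Fund", '
    '"parentName": "Example Fund Full Name", "secID": "EXAMPLE.SEC", '
    '"tickerSymbol": "000000"}]'
)
```

Calling `parse_fund_entities(SAMPLE)` returns a one-element list whose dict carries the seven documented keys, which can then be matched against other tables by `secID` or `tickerSymbol`.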
1.5 | 新闻证券关联表 | (news_security_score) | ||||||||
FULL_NAME_EN | News product association table | |||||||||
描述 : | 记录新闻和期货、基金、债券、股票产品的关联关系,包含关联等级、关联等分、股票交易代码和产品内部代码等(该表供展示类的场景使用,股票量化类客户请使用news_company_score表)。 | |||||||||
DESCRIPTION_EN : | Records the associations between news and futures, fund, bond, and stock products, including association level, association score, stock ticker symbol, and internal product code. This table is intended for display scenarios; quantitative equity clients should use the news_company_score table. | |||||||||
唯一键 : | NEWS_ID,SECURITY_INT_ID | |||||||||
数据起始时间 : | 2020-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsSecurityScore--新闻证券关联表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻id | News ID | bigint | 否 | |||||
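3 | SECURITY_INT_ID | 证券内部id | Security ID | bigint | 否 | 与证券主表SECURITY_ID关联 | SECURITY_INT_ID can be joined to the SECURITY_ID field of the security master table (md_security). | 证券内部id(SECURITY_INT_ID)可与证券主表(md_security)的证券内部ID(SECURITY_ID)字段关联。 | ||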
4 | SEC_SHORT_NAME | 证券简称 | Securities Short Name | varchar(50) | 是 | |||||
5 | ASSET_CLASS | 证券类型 | Asset Class | varchar(10) | 是 | E:股票,B:债券,F:基金,FU:期货 | E: Stocks, B: Bonds, F: Funds, FU: Futures | |||
6 | TICKER_SYMBOL | 交易代码 | Ticker Symbol | varchar(20) | 是 | |||||
7 | PARTY_ID | 机构内部id | Company ID | bigint | 是 | 证券对应的机构id | ID of the institution corresponding to the security; PARTY_ID can be joined to the PARTY_ID field of the institution master table (md_institution). | 机构内部id(PARTY_ID)可与机构主表(md_institution)的机构内部ID(PARTY_ID)字段关联。 | ||
8 | RELATED_DEGREE | 关联等级 | Related Degree | tinyint | 是 | 关联等级,0不关联,1弱关联,2强关联 | Association level: 0 = not associated, 1 = weakly associated, 2 = strongly associated. | |||
9 | RELATED_SCORE | 关联得分 | Related Score | double | 是 | 关联程度得分,取值为0-1的连续值。得分越高关联等级越高。利用强关联和弱关联的概率做加权求和得出的。 | Relevance score, a continuous value from 0 to 1. The higher the score, the higher the association level. It is obtained by weighted summation of the probabilities of strong and weak associations. | |||
10 | DEGREE_PROP_1ST | 关联等级为1的置信度 | Confidence of Related Degree 1 | double | 是 | 分类为关联等级1的概率。 | The probability of being classified as association level 1. | |||
11 | DEGREE_PROP_2ED | 关联等级为2的置信度 | Confidence of Related Degree 2 | double | 是 | 分类为关联等级2的概率。 | The probability of being classified as association level 2. | |||
12 | EFFECTIVE_TIME | 新闻有效发布时间 | Effective Time | datetime | 是 | |||||
13 | INSERT_TIME | 入库时间 | Insert Time | datetime | 是 | |||||
14 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 | |||||
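RELATED_SCORE is described above as a weighted sum of the weak-association and strong-association probabilities (DEGREE_PROP_1ST and DEGREE_PROP_2ED), but the weights are not given in this document. The Python sketch below only illustrates the form of such a calculation. The function name and the weights 0.5 and 1.0 are hypothetical, chosen so that the result stays in the 0 to 1 range when the two probabilities sum to at most 1.

```python
def related_score(p_weak: float,
                  p_strong: float,
                  w_weak: float = 0.5,
                  w_strong: float = 1.0) -> float:
    """Illustrative weighted sum of association probabilities.

    p_weak   : probability of association level 1 (DEGREE_PROP_1ST)
    p_strong : probability of association level 2 (DEGREE_PROP_2ED)
    The weights are assumptions; the real weights are not documented here.
    """
    for p in (p_weak, p_strong):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_weak + p_strong > 1.0 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")
    return w_weak * p_weak + w_strong * p_strong
```

With these illustrative weights, a news item that is certainly strongly associated scores 1, one that is certainly weakly associated scores 0.5, and one that is certainly unrelated scores 0, which matches the statement that a higher score goes with a higher association level.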
1.6 | 新闻证券关联历史表 | (news_sec_score_his) | ||||||||
FULL_NAME_EN | Supplementary Table for News Product Association History | |||||||||
描述 : | 记录新闻和期货、基金、债券、股票产品的关联关系,包含关联等级、关联等分、股票交易代码和产品内部代码等(该表数据区间:20190101-20200331)。 | |||||||||
DESCRIPTION_EN : | Record the association relationship between news and futures, funds, bonds, and stock products, including association level, association score, stock trading code, and product internal code (data range: 20190101-20200331). | |||||||||
唯一键 : | NEWS_ID,SECURITY_INT_ID | |||||||||
数据起始时间 : | 2019-01-01 | |||||||||
更新频率 : | 不定期 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | API,HERMES | |||||||||
API文档 : | getNewsSecScoreHis--新闻证券关联历史表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 自增ID | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻id | News ID | bigint | 否 | |||||
3 | SECURITY_INT_ID | 证券内部id | Security ID | bigint | 否 | 与证券主表SECURITY_ID关联 | SECURITY_INT_ID can be joined to the SECURITY_ID field of the security master table (md_security). | 证券内部id(SECURITY_INT_ID)可与证券主表(md_security)的证券内部ID(SECURITY_ID)字段关联。 | ||
4 | SEC_SHORT_NAME | 证券简称 | Security Short Name | varchar(50) | 是 | |||||
5 | ASSET_CLASS | 证券类型 | Asset Class | varchar(10) | 是 | E:股票,B:债券,F:基金,FU:期货 | E: Stocks, B: Bonds, F: Funds, FU: Futures | |||
6 | TICKER_SYMBOL | 交易代码 | Ticker Symbol | varchar(20) | 是 | |||||
7 | PARTY_ID | 机构内部id | Company ID | bigint | 是 | 证券对应的机构id | ID of the institution corresponding to the security; PARTY_ID can be joined to the PARTY_ID field of the institution master table (md_institution). | 机构内部id(PARTY_ID)可与机构主表(md_institution)的机构内部ID(PARTY_ID)字段关联。 | ||
8 | RELATED_DEGREE | 关联等级 | Related Degree | tinyint | 是 | 关联等级,0不关联,1弱关联,2强关联 | Association level: 0 = not associated, 1 = weakly associated, 2 = strongly associated. | |||
9 | RELATED_SCORE | 关联得分 | Related Score | double | 是 | 关联程度得分,取值为0-1的连续值。得分越高关联等级越高。利用强关联和弱关联的概率做加权求和得出的。 | Relevance score, a continuous value from 0 to 1. The higher the score, the higher the association level. It is obtained by weighted summation of the probabilities of strong and weak associations. | |||
10 | DEGREE_PROP_1ST | 关联等级为1的置信度 | Confidence of Related Degree 1 | double | 是 | 分类为关联等级1的概率。 | The probability of being classified as association level 1. | |||
11 | DEGREE_PROP_2ED | 关联等级为2的置信度 | Confidence of Related Degree 2 | double | 是 | 分类为关联等级2的概率。 | The probability of being classified as association level 2. | |||
12 | EFFECTIVE_TIME | 新闻有效发布时间 | News Effective Time | datetime | 是 | |||||
13 | INSERT_TIME | 入库时间 | Insert Time | datetime | 是 | |||||
14 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 | |||||
1.7 | 新闻行业分类表 | (news_industry_v2) | ||||||||
FULL_NAME_EN | Classification Table of News Industry | |||||||||
描述 : | 记录所有类型新闻与2014版申万行业关联信息,包括申万一级、申万二级、申万三级行业。 | |||||||||
DESCRIPTION_EN : | Records the associations between all types of news and the 2014 edition of the Shenwan industry classification, including Shenwan Level 1, Level 2, and Level 3 industries. | |||||||||
唯一键 : | NEWS_ID,INDUSTRY_NAME_3RD | |||||||||
数据起始时间 : | 2016-01-01 | |||||||||
更新频率 : | 实时 | |||||||||
数据更新时间 : | 不定时 | |||||||||
数据调用方式 : | Hermes,API | |||||||||
API文档 : | getNewsIndustryV2--新闻行业分类表 | |||||||||
数据来源 : | 通联数据 | |||||||||
DATA_SOURCE_EN : | Datayes | |||||||||
字段信息 : | ||||||||||
序号 | 字段名 | 中文名称 | FULL_NAME_EN | 数据类型 | 可空 | 字段描述 | DESCRIPTION_EN | 参数值 | 备注 | |
1 | ID | 信息编码 | ID | bigint | 否 | |||||
2 | NEWS_ID | 新闻ID | News ID | bigint | 否 | |||||
3 | EFFECTIVE_TIME | 新闻有效发布时间 | Effective Time | datetime | 是 | 新闻有效发布时间,若新闻发布时间和当前时间不是同一天, 且发布时间早于当前时间超过12小时,采用NEWS_PUBLISH_TIME;其他情况下采用新闻首次爬取时间。 | Effective publish time of the news. If the publish time is not on the same day as the current time and is more than 12 hours earlier than the current time, NEWS_PUBLISH_TIME is used; in all other cases the time the news was first crawled is used. | |||
4 | INDUSTRY_NAME_1ST | 行业一级分类 | First Level Industry | varchar(30) | 是 | 申万一级行业 | Shenwan Level 1 industry | |||
5 | INDUSTRY_NAME_2ND | 行业二级分类 | Second Level Industry | varchar(30) | 是 | 申万二级行业 | Shenwan Level 2 industry | |||
6 | INDUSTRY_NAME_3RD | 行业三级分类 | Third Level Industry | varchar(30) | 是 | 申万三级行业 | Shenwan Level 3 industry | |||
7 | UPDATE_TIME | 更新时间 | Update Time | datetime | 是 |