1.1 News Related Tag Table (news_tag_v1)
FULL_NAME_EN News Related Tag
Description : Records all types of attribute tags associated with news. News is tagged automatically according to its source, its length, and other characteristics (whether it is policy-related, whether it is data-related, whether it contains charts). The tags include whether the item is fundamental news, data news, policy news, etc., as well as the Shenwan industry classification of the news, the news genre, whether it is financial news, and whether it is a duplicate.
Unique key : NEWS_ID
Data start date : 2016-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsTagInd (News Related Tag Industry table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  |  |
| 3 | EFFECTIVE_TIME | Effective news release time | datetime |  | If the news release time is not on the same day as the current time and is more than 12 hours earlier than the current time, NEWS_PUBLISH_TIME is used; in all other cases the time the news was first crawled is used. |
| 4 | MAIN_GROUP_ID | Whether the news is duplicated within the full news set | int |  | -1 means the news is a duplicate; any value other than -1 means it is not a duplicate. |
| 5 | CLUSTER_ID | Cluster ID | bigint |  | Clustering ID of the news; duplicated news items share the same CLUSTER_ID value. |
| 6 | NEWS_GENRE | News genre | varchar(50) |  | Three genres: general news (普通新闻), price updates (价格动态) and announcement news (公告新闻). General news: news that is neither a price update nor announcement news. Price updates: real-time news on the transaction prices of raw materials and precious metals; these items are usually very short and consist mainly of a price and a time. Announcement news: the content comes from an announcement and is either identical to it or an excerpt of some of its sections. |
| 7 | IS_PRO_SITE | Website category tag | tinyint |  | 0: non-professional website; 1: professional website; 2: WeChat. |
| 8 | IS_ECONOM | Whether it contains fundamental information | tinyint |  | 1: yes; 0: no. |
| 9 | IS_DATA | Whether it contains data information | tinyint |  |  |
| 10 | IS_LONG_NEWS | Whether it is long news | tinyint |  |  |
| 11 | IS_SHORT_NEWS | Whether it is short news | tinyint |  |  |
| 12 | IS_MONTH_DATA | Whether it is monthly data | tinyint |  |  |
| 13 | IS_POLICY | Whether it is a policy issued by the state | tinyint |  |  |
| 14 | IS_PICTURE | Whether it is picture-based news | tinyint |  |  |
| 15 | IS_PERIOD | Whether it is a periodic report | tinyint |  |  |
| 16 | IS_WECHARTSTOCK | Whether it comes from a company's official WeChat account | tinyint |  |  |
| 17 | IS_RUMOUR | Whether it is a rumor | tinyint |  |  |
| 18 | IS_RUMOUR_RESPONSE | Whether it is a response to a rumor | tinyint |  |  |
| 19 | MINISTRY | Issuing source of ministry news | varchar(50) |  | Includes the central bank (央行), the National Development and Reform Commission (发改委), etc.; none means the news was not issued by a ministry or commission. |
| 20 | INDUSTRY_NAME_1ST | First-level category | varchar(30) |  | Industry news (industry), company news (stock), macro news (marco), bond news (bond), market news (market), other news (other), non-financial news (none). |
| 21 | INDUSTRY_NAME_2ND | Second-level category | varchar(30) |  | If the first-level category is industry news, the news is further classified into a specific industry, one of the 27 Shenwan level-1 industries (excluding "综合", Comprehensive). |
| 22 | INDUSTRY_NAME_3RD | Third-level category | varchar(30) |  | If the first-level category is industry news, the news is further classified into a specific Shenwan level-2 industry. This field is for reference only; using it directly is not recommended. |
| 23 | UPDATE_TIME | Update time | datetime |  |  |
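
The EFFECTIVE_TIME rule described for this table can be sketched as a small function. This is only an illustration of the rule as worded in the field description, not the vendor's implementation: the function name, the parameter names, and the reading of "current time" as the moment the record is processed are all assumptions.

```python
from datetime import datetime, timedelta


def effective_time(publish_time: datetime, first_crawl_time: datetime, now: datetime) -> datetime:
    """Choose EFFECTIVE_TIME following the rule in the field description.

    `now` is assumed to be the processing time (the "current time" in the rule).
    """
    different_day = publish_time.date() != now.date()
    older_than_12h = (now - publish_time) > timedelta(hours=12)
    if different_day and older_than_12h:
        return publish_time       # corresponds to NEWS_PUBLISH_TIME
    return first_crawl_time       # every other case: first crawl time


# Hypothetical sample values, for illustration only.
now = datetime(2021, 3, 2, 9, 0)
crawl = datetime(2021, 3, 2, 8, 55)
old = effective_time(datetime(2021, 3, 1, 10, 0), crawl, now)      # previous day, 23 h earlier
recent = effective_time(datetime(2021, 3, 1, 23, 30), crawl, now)  # previous day, 9.5 h earlier
```

Under these assumptions `old` is the publish time and `recent` is the crawl time, because only the first item satisfies both conditions of the rule.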
1.2 News Popularity Table (news_popularity)
FULL_NAME_EN News Popularity
Description : Records popularity information for all types of news, including news ID, news title, news popularity, global news popularity, etc.
Unique key : NEWS_ID
Data start date : 2016-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsPopularity (News Popularity table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  | Can be joined to the NEWS_ID field of the general news master table (vnews_content_v1). |
| 3 | NEWS_URL | News URL | nvarchar(2000) |  |  |
| 4 | NEWS_TITLE | News title | nvarchar(1200) |  |  |
| 5 | POPULARITY | News popularity | decimal(24,20) |  | Calculated from the deduplicated numbers of reads, participations, replies and reposts of the news; refreshed every 2 minutes. |
| 6 | GLOBAL_POPULARITY | Global news popularity | decimal(24,20) |  | The news popularity after a time decay based on the release time has been applied. |
| 7 | EFFECTIVE_TIME | News effective time | datetime |  | If the news release time is not on the same day as the current time and is more than 12 hours earlier than the current time, the news publish time is used; in all other cases the time the news was first crawled is used. |
| 8 | UPDATE_TIME | Update time | datetime |  |  |
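
The table states that GLOBAL_POPULARITY is POPULARITY after a time decay based on the release time, but it does not give the decay function. The sketch below uses exponential decay with a 24-hour half-life purely as a stand-in to show the idea; the formula, the half-life and the function name are assumptions and should not be read as how the stored values are computed.

```python
from datetime import datetime, timedelta


def decayed_popularity(popularity: float, publish_time: datetime, now: datetime,
                       half_life_hours: float = 24.0) -> float:
    """Apply an assumed exponential time decay to a popularity value."""
    # Age of the news in hours; future-dated items are treated as age 0.
    age_hours = max((now - publish_time).total_seconds() / 3600.0, 0.0)
    # The value halves every `half_life_hours` (an assumed parameter).
    return popularity * 0.5 ** (age_hours / half_life_hours)


# Hypothetical sample values, for illustration only.
t0 = datetime(2021, 3, 1, 9, 0)
fresh = decayed_popularity(10.0, t0, t0)
day_old = decayed_popularity(10.0, t0, t0 + timedelta(hours=24))
```

With this stand-in, a just-published item keeps its full popularity and a one-day-old item keeps half of it; any monotonically decreasing function of age would illustrate the same relationship between the two fields.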
1.3 Policy Classification Table (news_policy_classification_v1)
FULL_NAME_EN News Policy Classification
Description : Records the classification of policy news into 33 categories in total, covering the Shenwan level-1 industries, macro, and non-policy.
Unique key : NEWS_ID
Data start date : 2016-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsPlcyClassifV1 (Policy Classification table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  | Record code. |
| 2 | NEWS_ID | News ID | bigint |  | News record code. |
| 3 | NEWS_INDUSTRY_NAME | Industry category name | varchar(30) |  | 33 categories in total: the Shenwan level-1 industries, macro, and non-policy. |
| 4 | NEWS_SITE_NAME | Crawled website name | varchar(50) |  | Publishing source of the news, i.e. the site from which the news was actually crawled; mainly official government websites and the websites of ministries and commissions. |
| 5 | SOURCE_NAME | Original source of the news | varchar(50) |  | Initial source of the news, i.e. where it originally appeared. |
| 6 | NEWS_EFFECTIVE_TIME | News release time | datetime |  | Publication time on the crawled website, taken from news_publish_time in news_metadata. |
| 7 | UPDATE_TIME | Update time | datetime |  | Time the data was updated. |
1.4 News Keywords Table (news_keywords)
FULL_NAME_EN News Keywords
Description : Records keyword tag information for all types of news, in two categories: deduplicated NER tags and keywords. NER tags include person-name entity tags, product-name entity tags, company-name entity tags, etc.
Unique key : NEWS_ID
Data start date : 2016-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsKeywords (get news keywords)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  | Can be joined to the NEWS_ID field of the general news master table (vnews_content_v1). |
| 3 | NEWS_KEYWORDS | News keywords | varchar(500) |  |  |
| 4 | NEWS_KEYPHRASE | News key phrases | varchar(500) |  |  |
| 5 | NEWS_PERSON_ENTITY | Person-name entity tags | varchar(500) |  |  |
| 6 | NEWS_PRODUCT_ENTITY | Product-name entity tags | varchar(500) |  |  |
| 7 | NEWS_COMPANY_ENTITY | Company-name entity tags | varchar(500) |  |  |
| 8 | NEWS_FUND_ENTITY | Fund-name entity tags | text |  | JSONArray format. fundAbbrName: fund short name; fundID: fund ID; fundName: fund name; parentAbbrName: parent fund short name; parentName: parent fund full name; secID: unique security code compiled by Datayes; tickerSymbol: trading code. |
| 9 | UPDATE_TIME | Update time | datetime |  |  |
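
Since NEWS_FUND_ENTITY is stored as a JSON array in a text column, a reader will usually need to decode it before use. The following is a minimal sketch using only the key names listed in the field description; the sample payload, its values and the value types (for example whether fundID is numeric) are invented for illustration and may not match real records.

```python
import json

# Key names taken from the NEWS_FUND_ENTITY field description.
FUND_KEYS = ("fundAbbrName", "fundID", "fundName", "parentAbbrName",
             "parentName", "secID", "tickerSymbol")


def parse_fund_entities(raw):
    """Decode a NEWS_FUND_ENTITY value into a list of dicts with the documented keys."""
    if not raw:                      # NULL or empty text: no fund entities
        return []
    items = json.loads(raw)
    # Keep only the documented keys; missing keys become None.
    return [{key: item.get(key) for key in FUND_KEYS} for item in items]


# Made-up payload, for illustration only.
raw = ('[{"fundAbbrName": "Example Fund A", "fundID": 1, '
       '"fundName": "Example Fund A Full Name", "parentAbbrName": null, '
       '"parentName": null, "secID": "SEC-PLACEHOLDER", "tickerSymbol": "000000"}]')
funds = parse_fund_entities(raw)
```

Each element of `funds` then exposes the seven documented keys, so downstream code can rely on `fund["tickerSymbol"]` or `fund["secID"]` without checking for missing keys.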
1.5 News-Security Association Table (news_security_score)
FULL_NAME_EN News product association table
Description : Records the association between news and futures, funds, bonds and stocks, including association level, association score, stock ticker symbol, internal product code, etc. (This table is intended for display scenarios; quantitative equity clients should use the news_company_score table.)
Unique key : NEWS_ID, SECURITY_INT_ID
Data start date : 2020-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsSecurityScore (News-Security Association table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  |  |
| 3 | SECURITY_INT_ID | Internal security ID | bigint |  | Can be joined to the SECURITY_ID field of the security master table (md_security). |
| 4 | SEC_SHORT_NAME | Security short name | varchar(50) |  |  |
| 5 | ASSET_CLASS | Security type | varchar(10) |  | E: stocks; B: bonds; F: funds; FU: futures. |
| 6 | TICKER_SYMBOL | Ticker symbol | varchar(20) |  |  |
| 7 | PARTY_ID | Internal institution ID | bigint |  | ID of the institution corresponding to the security. Can be joined to the PARTY_ID field of the institution master table (md_institution). |
| 8 | RELATED_DEGREE | Association level | tinyint |  | 0: not associated; 1: weakly associated; 2: strongly associated. |
| 9 | RELATED_SCORE | Association score | double |  | Continuous value between 0 and 1; the higher the score, the higher the association level. Computed as a weighted sum of the strong-association and weak-association probabilities. |
| 10 | DEGREE_PROP_1ST | Confidence of association level 1 | double |  | Probability of being classified as association level 1. |
| 11 | DEGREE_PROP_2ED | Confidence of association level 2 | double |  | Probability of being classified as association level 2. |
| 12 | EFFECTIVE_TIME | Effective news release time | datetime |  |  |
| 13 | INSERT_TIME | Insert time | datetime |  |  |
| 14 | UPDATE_TIME | Update time | datetime |  |  |
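
RELATED_SCORE is described as a weighted sum of the weak-association and strong-association probabilities (DEGREE_PROP_1ST and DEGREE_PROP_2ED), but the weights are not documented. The sketch below shows the shape of such a calculation with placeholder weights of 0.5 and 1.0; these weights, the function name and the assumption that the two probabilities sum to at most 1 are illustrative choices, not values taken from the data provider.

```python
def related_score(p_weak: float, p_strong: float,
                  w_weak: float = 0.5, w_strong: float = 1.0) -> float:
    """Weighted sum of the weak and strong association probabilities.

    p_weak   corresponds to DEGREE_PROP_1ST (association level 1).
    p_strong corresponds to DEGREE_PROP_2ED (association level 2).
    The default weights are assumptions made for this example.
    """
    for p in (p_weak, p_strong):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_weak + p_strong > 1.0 + 1e-9:
        raise ValueError("the two probabilities are assumed to sum to at most 1")
    return w_weak * p_weak + w_strong * p_strong


# Hypothetical probabilities, for illustration only.
mostly_strong = related_score(p_weak=0.2, p_strong=0.7)
mostly_weak = related_score(p_weak=0.7, p_strong=0.2)
```

With weights no larger than 1 and probabilities that sum to at most 1, the result stays within the documented 0 to 1 range, and a record dominated by the strong-association probability scores higher than one dominated by the weak-association probability.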
1.6 News-Security Association History Table (news_sec_score_his)
FULL_NAME_EN Supplementary Table for News Product Association History
Description : Records the association between news and futures, funds, bonds and stocks, including association level, association score, stock ticker symbol, internal product code, etc. (data range of this table: 20190101-20200331).
Unique key : NEWS_ID, SECURITY_INT_ID
Data start date : 2019-01-01
Update frequency : irregular
Data update time : no fixed time
Access methods : API, HERMES
API document : getNewsSecScoreHis (News-Security Association History table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Auto-increment ID | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  |  |
| 3 | SECURITY_INT_ID | Internal security ID | bigint |  | Can be joined to the SECURITY_ID field of the security master table (md_security). |
| 4 | SEC_SHORT_NAME | Security short name | varchar(50) |  |  |
| 5 | ASSET_CLASS | Security type | varchar(10) |  | E: stocks; B: bonds; F: funds; FU: futures. |
| 6 | TICKER_SYMBOL | Ticker symbol | varchar(20) |  |  |
| 7 | PARTY_ID | Internal institution ID | bigint |  | ID of the institution corresponding to the security. Can be joined to the PARTY_ID field of the institution master table (md_institution). |
| 8 | RELATED_DEGREE | Association level | tinyint |  | 0: not associated; 1: weakly associated; 2: strongly associated. |
| 9 | RELATED_SCORE | Association score | double |  | Continuous value between 0 and 1; the higher the score, the higher the association level. Computed as a weighted sum of the strong-association and weak-association probabilities. |
| 10 | DEGREE_PROP_1ST | Confidence of association level 1 | double |  | Probability of being classified as association level 1. |
| 11 | DEGREE_PROP_2ED | Confidence of association level 2 | double |  | Probability of being classified as association level 2. |
| 12 | EFFECTIVE_TIME | Effective news release time | datetime |  |  |
| 13 | INSERT_TIME | Insert time | datetime |  |  |
| 14 | UPDATE_TIME | Update time | datetime |  |  |
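
The history table covers 20190101-20200331 while news_security_score starts on 2020-01-01, so the two ranges overlap for the first three months of 2020. A reader combining both tables may therefore want to deduplicate on the shared unique key (NEWS_ID, SECURITY_INT_ID). The sketch below assumes rows are already loaded as dictionaries and, as an arbitrary choice for this example, lets the current table win on conflicts; neither the loading step nor that precedence rule comes from the documentation.

```python
def merge_scores(history_rows, current_rows):
    """Combine rows from news_sec_score_his and news_security_score.

    Rows are keyed by (NEWS_ID, SECURITY_INT_ID); when both tables contain
    the same key, the row from the current table is kept (an assumption).
    """
    merged = {}
    for row in history_rows:
        merged[(row["NEWS_ID"], row["SECURITY_INT_ID"])] = row
    for row in current_rows:
        merged[(row["NEWS_ID"], row["SECURITY_INT_ID"])] = row   # overrides history
    return list(merged.values())


# Made-up rows, for illustration only.
history = [
    {"NEWS_ID": 1, "SECURITY_INT_ID": 10, "RELATED_SCORE": 0.40},
    {"NEWS_ID": 2, "SECURITY_INT_ID": 10, "RELATED_SCORE": 0.90},
]
current = [
    {"NEWS_ID": 2, "SECURITY_INT_ID": 10, "RELATED_SCORE": 0.95},
    {"NEWS_ID": 3, "SECURITY_INT_ID": 11, "RELATED_SCORE": 0.10},
]
combined = merge_scores(history, current)
```

Here `combined` holds three rows, and the pair (2, 10) carries the score from the current table. Reversing the two loops would give precedence to the history table instead.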
1.7 News Industry Classification Table (news_industry_v2)
FULL_NAME_EN Classification Table of News Industry
Description : Records the association between all types of news and the 2014 edition of the Shenwan industry classification, covering Shenwan level-1, level-2 and level-3 industries.
Unique key : NEWS_ID, INDUSTRY_NAME_3RD
Data start date : 2016-01-01
Update frequency : real time
Data update time : no fixed time
Access methods : Hermes, API
API document : getNewsIndustryV2 (News Industry Classification table)
Data source : Datayes (通联数据)
Fields :
| No. | Field | Name | Data type | Nullable | Description, values and remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | ID | Record code | bigint |  |  |
| 2 | NEWS_ID | News ID | bigint |  |  |
| 3 | EFFECTIVE_TIME | Effective news release time | datetime |  | If the news release time is not on the same day as the current time and is more than 12 hours earlier than the current time, NEWS_PUBLISH_TIME is used; in all other cases the time the news was first crawled is used. |
| 4 | INDUSTRY_NAME_1ST | First-level industry | varchar(30) |  | Shenwan level-1 industry. |
| 5 | INDUSTRY_NAME_2ND | Second-level industry | varchar(30) |  | Shenwan level-2 industry. |
| 6 | INDUSTRY_NAME_3RD | Third-level industry | varchar(30) |  | Shenwan level-3 industry. |
| 7 | UPDATE_TIME | Update time | datetime |  |  |
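
Because the unique key of this table is (NEWS_ID, INDUSTRY_NAME_3RD), a single news item can presumably appear on several rows, one per level-3 industry. The sketch below groups such rows by NEWS_ID; the row layout as dictionaries and the placeholder industry names are assumptions made for the example, and real values would be Shenwan industry names.

```python
from collections import defaultdict


def industries_by_news(rows):
    """Group (level-1, level-2, level-3) industry triples by NEWS_ID."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["NEWS_ID"]].append(
            (row["INDUSTRY_NAME_1ST"], row["INDUSTRY_NAME_2ND"], row["INDUSTRY_NAME_3RD"])
        )
    return dict(grouped)


# Made-up rows with placeholder industry names, for illustration only.
rows = [
    {"NEWS_ID": 1, "INDUSTRY_NAME_1ST": "L1-A", "INDUSTRY_NAME_2ND": "L2-A1", "INDUSTRY_NAME_3RD": "L3-A1a"},
    {"NEWS_ID": 1, "INDUSTRY_NAME_1ST": "L1-A", "INDUSTRY_NAME_2ND": "L2-A1", "INDUSTRY_NAME_3RD": "L3-A1b"},
    {"NEWS_ID": 2, "INDUSTRY_NAME_1ST": "L1-B", "INDUSTRY_NAME_2ND": "L2-B1", "INDUSTRY_NAME_3RD": "L3-B1a"},
]
grouped = industries_by_news(rows)
```

In this example news item 1 maps to two level-3 industries and news item 2 to one, which is the situation the composite key allows for.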