Abstract
[Purpose/Significance] Energy policy texts are rich in semantic information. This paper studies how well convolutional neural network (CNN) models classify and recognize energy policy text features under different settings and proposes an optimization method, with the aim of supporting the automatic classification of energy policy information resources and helping researchers interpret energy policy. [Method/Process] Character-level and word-level CNN models were tested on the automatic classification of energy policy texts under different settings, with a comparative analysis across titles, content, and core topic sentences. Doc2Vec was used to extract core topic sentences at different proportions, and this topic information was incorporated into the CNN model to optimize the experiments. [Result/Conclusion] As the extraction rate of core topic sentences increased, the mean F1 followed a normal-distribution-shaped curve and reached its balance point at an extraction rate of 70%, where the mean F1 of the neural network model was 83.45%, higher than that of the other methods in the experiment. Extracting topic information with Doc2Vec and incorporating it into the CNN effectively improved the model's automatic text classification performance.
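The extraction step described in the abstract (keeping a fixed proportion of core topic sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses Doc2Vec vectors, whereas the sketch below swaps in a plain bag-of-words vector so that it runs with the standard library alone. The function names `embed`, `cosine`, and `extract_core_sentences`, the whitespace tokenization, and the rule of ranking sentences by cosine similarity to the whole document are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(tokens):
    """Stand-in embedding: bag-of-words counts (an assumption, NOT Doc2Vec)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_core_sentences(sentences, rate):
    """Keep roughly `rate` of the sentences, chosen by similarity to the
    whole document, and return them in their original order."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    # Hypothetical tokenization: lowercase and split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    doc_vec = embed([t for toks in tokenized for t in toks])
    scores = [cosine(embed(toks), doc_vec) for toks in tokenized]
    # Number of sentences to keep, at least one.
    k = max(1, round(rate * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "energy policy supports solar energy",
        "the weather was nice",
        "solar energy policy subsidies expand",
    ]
    for sentence in extract_core_sentences(doc, 0.7):
        print(sentence)
```

With a real Doc2Vec model, `embed` would be replaced by the model's inferred vectors, and the retained sentences would then be passed to the CNN classifier as its input text.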
Authors
Yang Rui (杨锐)
Chen Wei (陈伟)
He Tao (何涛)
Zhang Min (张敏)
Li Ruiling (李蕊伶)
Yue Fang (岳芳)
Yang Rui; Chen Wei; He Tao; Zhang Min; Li Ruiling; Yue Fang (Wuhan Library, Chinese Academy of Sciences, Wuhan 430074, China; Key Laboratory of Science and Technology of Hubei Province, Wuhan 430074, China; School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Department of Information Security, Naval University of Engineering, Wuhan 430033, China)
Source
《现代情报》 (Journal of Modern Information)
CSSCI
2020, Issue 4, pp. 42-49 (8 pages)
Funding
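Chinese Academy of Sciences Special Project for Literature and Information Capacity Building, "Construction of a Literature and Information 'Data Lake' and an Open Big Data Framework" (Project No. 院1852)
Chinese Academy of Sciences Special Project for Strategic Research and Decision Support System Construction (Project No. GHJ-ZLZX-2019-35)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (Project No. 2017221)
Strategic research topic of the Chinese Academy of Sciences Strategic Priority Research Program on Transformational Clean Energy Key Technologies and Demonstration (Project No. XDA21010100)
Chinese Academy of Sciences Special Fund for Literature and Information Capacity Building (Project No. Y9290001)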
About the Authors
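Yang Rui (b. 1977), male, associate research librarian; research interests: data mining and semantic analysis technology. Chen Wei (b. 1981), male, research librarian; research interests: energy science and technology development strategy and policy, and the theory, methods, and technology of intelligence analysis. He Tao (b. 1981), male, lecturer; research interests: natural language processing and data mining technology. Zhang Min (b. 1985), female, assistant researcher; research interest: natural language processing. Li Ruiling (b. 1995), female, research intern; research interest: information technology processing. Yue Fang (b. 1981), female, assistant researcher; research interest: strategic intelligence on energy science and technology.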