Abstract
Building on an analysis of traditional subject-word extraction algorithms for short texts, and taking into account the non-mainstream text characteristics of personal microblogs, this paper proposes PWSWE (Personal Weibo Subject Word Extraction), an algorithm for extracting subject words from personal microblogs. The algorithm works incrementally. First, it introduces a popularity measure composed of a post's repost, comment and like counts. Second, it computes serial similarities over coupling, time sequence and popularity. Third, it improves the traditional TF-IDF function to address the dispersion of keyword feature values. Finally, it combines the results of these steps and post-processes them to obtain the final subject words. Experimental results show that the subject words extracted by the algorithm achieve relatively high accuracy and coverage.
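The abstract names the ingredients (a popularity measure built from repost, comment and like counts, and a TF-IDF variant) but gives no formulas. The following is only a minimal sketch of how a popularity signal could be combined with classic TF-IDF. It is not the PWSWE algorithm: the function names, the log-damped sum used for `popularity`, and the `1 + popularity` weighting are all assumptions made for illustration, and the coupling and time-sequence similarities are not modelled.

```python
import math
from collections import Counter


def popularity(reposts, comments, likes, weights=(1.0, 1.0, 1.0)):
    """Hypothetical popularity score: log-damped weighted sum of the counts.

    The paper's actual definition is not given in the abstract.
    """
    w_r, w_c, w_l = weights
    return math.log1p(w_r * reposts + w_c * comments + w_l * likes)


def tf_idf(posts):
    """Classic (unmodified) TF-IDF; `posts` is a list of token lists."""
    n = len(posts)
    # Document frequency: number of posts containing each term.
    df = Counter(term for tokens in posts for term in set(tokens))
    scores = []
    for tokens in posts:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


def top_subject_words(posts, stats, k=3):
    """Sum popularity-weighted TF-IDF over all posts and return the top-k terms.

    `stats` holds one (reposts, comments, likes) tuple per post.
    The weighting scheme is an assumption, not the paper's method.
    """
    totals = Counter()
    for post_scores, (r, c, l) in zip(tf_idf(posts), stats):
        weight = 1.0 + popularity(r, c, l)
        for term, s in post_scores.items():
            totals[term] += weight * s
    return [t for t, _ in totals.most_common(k)]
```

In this sketch a term that appears in every post gets a TF-IDF of zero, so only terms that distinguish posts can rank, and terms from widely reposted posts are boosted over terms from posts with no interaction.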
Source
《计算机应用与软件》
CSCD
2015, No. 7, pp. 86-89 (4 pages)
Computer Applications and Software
Keywords
Personal microblogging
Subject word
PWSWE
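About the authors
Shang Yongbing (商永兵), associate professor; main research area: Web technology.
Zhou Huanyu (周环宇), master's student.
Nie Zhimi (聂知秘), master's student.
Hu Wenjiang (胡文江), professor.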