Research on Chinese Micro-blog Bursty Topics Detection (中文微博突发事件检测研究) Cited by: 24
Abstract: Mining bursty topics accurately and efficiently from micro-blogs has been an active research topic in recent years. In this paper, a set of burst terms is extracted by counting term frequency, calculating the growth rate of terms, and weighting terms with the Term Frequency - Proportional Document Frequency (TF-PDF) algorithm. Micro-blog texts are then represented with the burst terms and filtered according to the characteristics with which bursty topics are described and propagated on the micro-blog platform, so that texts that do not contribute to detecting bursty topics are discarded. The paper proposes an "Absolute Clustering" algorithm to cluster the texts that describe bursty topics. A hotness score is computed for each text as a weighted combination of its reply count and retweet count; the hottest item in each cluster is taken as the bursty topic, and the top 5 texts are extracted as the detection result. The experiments show a precision of 92.60%, a recall of 85.51%, and an F-measure of 0.89. Compared with traditional bursty topic detection methods, the proposed method detects bursty topics in micro-blogs fairly accurately and has some practical value.
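The term growth rate, the reply/retweet-weighted hotness, and the reported F-measure can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas or weights, so everything here is an assumption made for illustration: the `Post` type, the add-one smoothing in `growth_rate`, the equal weights `w_reply` and `w_retweet`, and the cluster labels (taken as the output of an earlier clustering step). It is not the authors' implementation, and it does not reproduce TF-PDF or "Absolute Clustering".

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    cluster: int   # hypothetical cluster label from an earlier clustering step
    replies: int
    retweets: int


def growth_rate(prev_count: int, curr_count: int) -> float:
    """Relative growth of a term's frequency between two time windows.

    The add-one smoothing in the denominator is an assumption that avoids
    division by zero for terms unseen in the previous window.
    """
    return (curr_count - prev_count) / (prev_count + 1)


def hotness(post: Post, w_reply: float = 0.5, w_retweet: float = 0.5) -> float:
    """Weighted sum of reply and retweet counts (weights are assumed)."""
    return w_reply * post.replies + w_retweet * post.retweets


def hottest_per_cluster(posts: list[Post]) -> dict[int, Post]:
    """Keep the post with the highest hotness in each cluster."""
    best: dict[int, Post] = {}
    for post in posts:
        current = best.get(post.cluster)
        if current is None or hotness(post) > hotness(current):
            best[post.cluster] = post
    return best


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    posts = [
        Post("a", cluster=0, replies=10, retweets=20),
        Post("b", cluster=0, replies=5, retweets=100),
        Post("c", cluster=1, replies=1, retweets=1),
    ]
    for label, post in sorted(hottest_per_cluster(posts).items()):
        print(label, post.text, hotness(post))
    # Consistency check on the figures reported in the abstract.
    print(round(f_measure(0.9260, 0.8551), 2))  # 0.89
```

The last line confirms that the reported F-measure of 0.89 agrees with the reported precision and recall when F is taken as their harmonic mean.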
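Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2013, Issue 2, pp. 57-62 (6 pages)
Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project "Ontology-based Automatic Patent Indexing" (Grant No. 61271304); the National Natural Science Foundation of China project "Evaluation of Web Content Authenticity" (Grant No. 61171159); the Key Project of the Beijing Municipal Education Commission Science and Technology Development Plan and Class B Key Project of the Beijing Natural Science Foundation "Domain-oriented Precise Search Methods for Multimodal Internet Information" (Grant No. KZ201311232037); and the National Key Technology R&D Program project "Key Technologies and Demonstration of Enhanced Search Engines" (Grant No. 2011BAH11B03).
Keywords: Bursty topics; Burst terms; Filter; Absolute clustering
Author information: E-mail: wy514674793@126.com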
