
Topic Detection Research Based on Content Analysis (cited by: 20)
Abstract: Based on an analysis of a large number of English news stories, we propose a content-analysis-based topic detection algorithm. It targets a known weakness of current topic detection research: the difficulty of separating two distinct train accidents or explosions into different events. Built on the Single-Pass clustering strategy, the algorithm uses content analysis to represent each topic with two centroid vectors, an identifier centroid and a content centroid. Experiments show that the algorithm is simple to implement and effective at resolving this difficult-to-distinguish problem.
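Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2006, No. 10, pp. 1740-1743 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60302021); National 863 High-Tech Program of China (2004AA117010-08).
Keywords: topic detection; content analysis; detection error cost; identifier word; content word
About the authors: ZHAO Hua (b. 1980), female, PhD candidate; E-mail: huazhao@mtlab.hit.edu.cn. ZHAO Tiejun (b. 1962), male, PhD, professor, doctoral supervisor.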
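
The two-centroid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how identifier words and content words are separated, how the two similarities are combined, or what threshold is used. The capitalization heuristic in `split_words`, the weight `alpha`, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_words(text):
    """Assumed heuristic (not from the paper): capitalized or numeric
    tokens count as identifier words, all other tokens as content words."""
    ident, content = Counter(), Counter()
    for tok in text.split():
        tok = tok.strip(".,;:!?\"'()")
        if not tok:
            continue
        if tok[0].isupper() or tok[0].isdigit():
            ident[tok.lower()] += 1
        else:
            content[tok.lower()] += 1
    return ident, content


class Topic:
    """A topic held as two centroids: identifier and content.
    Term counts are summed rather than averaged; cosine similarity
    ignores vector length, so the direction matches the mean."""

    def __init__(self, ident, content):
        self.ident = Counter(ident)
        self.content = Counter(content)

    def similarity(self, ident, content, alpha):
        # Assumed combination: weighted sum of the two cosine scores.
        return alpha * cosine(self.ident, ident) + (1 - alpha) * cosine(
            self.content, content
        )

    def add(self, ident, content):
        self.ident.update(ident)
        self.content.update(content)


def single_pass(stories, threshold=0.5, alpha=0.6):
    """Single-Pass clustering: each story joins the most similar existing
    topic if the score reaches `threshold`, otherwise it starts a new one."""
    topics, labels = [], []
    for text in stories:
        ident, content = split_words(text)
        best, best_sim = None, 0.0
        for i, topic in enumerate(topics):
            sim = topic.similarity(ident, content, alpha)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            topics[best].add(ident, content)
            labels.append(best)
        else:
            topics.append(Topic(ident, content))
            labels.append(len(topics) - 1)
    return labels


# Made-up example stories: two reports of one accident, then a different one.
stories = [
    "a train derailed near Madrid on Monday killing passengers",
    "rescue teams in Madrid search the derailed train after Monday crash",
    "a train derailed near Osaka on Friday killing passengers",
]
print(single_pass(stories))  # [0, 0, 1]
```

With `alpha=0` the identifier centroid is ignored, and the third story is merged with the first because their content words are identical. That is the kind of confusion between two separate train accidents that the abstract describes.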
