
A New Event Detection Model Based on Term Reweighting (基于词元再评估的新事件检测模型)

Cited by: 17
Abstract: New event detection (NED) aims to identify, in one or more streams of news stories, the first story that reports a new event (i.e., one not reported previously). Preliminary experiments show that terms of different types (e.g., nouns and verbs) differ in how strongly they indicate whether two stories are on the same topic, and that this sensitivity varies across classes of stories. Conventional approaches usually treat all terms alike. This paper focuses on how term weights should be set in an NED model and proposes two term-reweighting approaches. The first uses statistics on training data to optimize, for each class of stories, the weight parameters for terms of each part of speech. The second dynamically updates term weights using information from existing story clusters while still computing similarity between individual stories (rather than between a story and a cluster), so as to combine the advantages of both forms of comparison. Experiments on two Linguistic Data Consortium (LDC) public data sets, TDT2 and TDT3, show that both approaches are effective and significantly improve NED performance over the baseline method and existing systems.
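Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 4, pp. 817-828 (12 pages)
Funding: National Natural Science Foundation of China, Grant No. 90604025
Keywords: new event detection; information retrieval; named entity; term reweighting
About the authors:
Zhang Kuo (张阔, b. 1981), male, from Beijing, Ph.D. candidate. Research interests: text mining, information extraction, information retrieval. Corresponding author: Phn: +86-10-62771736, E-mail: zkuo99@mails.tsinghua.edu.cn
Li Juanzi (李涓子, b. 1964), female, Ph.D., associate professor, CCF senior member. Research interests: semantic web, Chinese information processing, knowledge discovery and knowledge management in networked environments.
Wu Gang (吴刚, b. 1978), male, Ph.D. candidate. Research interests: data warehousing, integration of semi-structured and Web data, data mining.
Wang Kehong (王克宏, b. 1942), male, professor, doctoral supervisor, CCF senior member. Research interests: knowledge engineering, distributed knowledge processing.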
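
The idea summarized in the abstract, weighting terms by type and story class and then comparing each incoming story with earlier stories, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the `TYPE_WEIGHTS` table, the `log(1 + tf)` term weight, cosine similarity, and the `threshold` value are all hypothetical, and the cluster-based dynamic reweighting is not modeled.

```python
import math
from collections import Counter

# Hypothetical per-class weights for each term type (NOT the paper's values).
TYPE_WEIGHTS = {
    "sports": {"NE": 2.0, "NOUN": 1.0, "VERB": 0.5},
    "default": {"NE": 1.5, "NOUN": 1.0, "VERB": 1.0},
}

def weighted_vector(terms, story_class):
    """Build a sparse term vector from (term, term_type) pairs.

    Each term's weight is log(1 + tf) scaled by a type weight that
    depends on the story class (an assumed, simplified scheme).
    """
    weights = TYPE_WEIGHTS.get(story_class, TYPE_WEIGHTS["default"])
    vec = {}
    for (term, term_type), tf in Counter(terms).items():
        w = weights.get(term_type, 1.0) * math.log(1 + tf)
        vec[term] = vec.get(term, 0.0) + w
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_new_events(stories, threshold=0.3):
    """Single pass over the stream: a story is flagged as a new event
    when its best similarity to any earlier story is below threshold."""
    seen, flags = [], []
    for terms, story_class in stories:
        vec = weighted_vector(terms, story_class)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags

# Toy stream: two stories on one event, then an unrelated story.
stories = [
    ([("earthquake", "NOUN"), ("Turkey", "NE"), ("strike", "VERB")], "default"),
    ([("earthquake", "NOUN"), ("Turkey", "NE"), ("rescue", "VERB")], "default"),
    ([("election", "NOUN"), ("France", "NE"), ("vote", "VERB")], "default"),
]
print(detect_new_events(stories))  # [True, False, True]
```

Comparing story against story, rather than story against cluster centroid, follows the comparison form named in the abstract; a fuller version would also adjust the type weights using the clusters formed so far.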
