Abstract
To address the tracking of news events on the Internet, this paper proposes a dynamic model for event tracking that incorporates time information. The model adds a time factor to the traditional vector model, uses it to compute the time similarity between the feature words shared by a document and an event, and applies that time similarity when calculating the relevance of the document to the event. If the document is related to the event, the feature words that are new in the document are added to the event's feature-word set, and the weights and time information of the words in that set are readjusted. The experiments are evaluated with Detection Error Tradeoff (DET) curves. The results show that, compared with the traditional vector model, the dynamic model effectively improves system performance: its minimum normalized tracking cost is reduced by about 9%.
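The abstract describes the tracking loop only in words, so the following is a minimal Python sketch of one way it could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the exponential decay used as the time similarity, the parameters `tau`, `threshold` and `alpha`, the weight-blending update rule, and the names `Event`, `time_similarity`, `relevance` and `track`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Event:
    # Maps each feature word to (weight, timestamp in days).
    terms: dict = field(default_factory=dict)


def time_similarity(t_doc, t_event, tau=7.0):
    # Assumed form: 1.0 when the timestamps coincide, decaying toward 0
    # as they drift apart. The paper's actual formula is not given here.
    return math.exp(-abs(t_doc - t_event) / tau)


def relevance(doc_weights, doc_time, event, tau=7.0):
    # Cosine-style similarity in which the contribution of every shared
    # feature word is scaled by its time similarity.
    dot = 0.0
    for term, w_doc in doc_weights.items():
        if term in event.terms:
            w_evt, t_evt = event.terms[term]
            dot += w_doc * w_evt * time_similarity(doc_time, t_evt, tau)
    norm_doc = math.sqrt(sum(w * w for w in doc_weights.values()))
    norm_evt = math.sqrt(sum(w * w for w, _ in event.terms.values()))
    if norm_doc == 0.0 or norm_evt == 0.0:
        return 0.0
    return dot / (norm_doc * norm_evt)


def track(doc_weights, doc_time, event, threshold=0.2, alpha=0.5, tau=7.0):
    # Returns True and updates the event if the document is judged related.
    if relevance(doc_weights, doc_time, event, tau) < threshold:
        return False
    for term, w_doc in doc_weights.items():
        if term in event.terms:
            # Assumed update: blend old and new weight, refresh the timestamp.
            w_evt, _ = event.terms[term]
            event.terms[term] = ((1 - alpha) * w_evt + alpha * w_doc, doc_time)
        else:
            # New feature word: add it with a damped weight.
            event.terms[term] = (alpha * w_doc, doc_time)
    return True


event = Event({"earthquake": (1.0, 0.0), "rescue": (0.5, 0.0)})
doc = {"earthquake": 0.8, "aftershock": 0.6}
related = track(doc, doc_time=1.0, event=event)
```

In this sketch a document published one day after the event's words were last seen is accepted and contributes the new word "aftershock", while the same document dated a month later would score far lower because the shared word's contribution decays with the time gap.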
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2013, No. 10, pp. 2807-2810, 2821 (5 pages in total)
Journal of Computer Applications
Funding
China Postdoctoral Science Foundation (20070420700)
Natural Science Foundation of Hebei Province (F2011201146)
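About the authors
Xu Jianmin (徐建民, born 1966), male, native of Handan, Hebei; professor, Ph.D.; main research interests: information retrieval, uncertain information processing; corresponding author, e-mail: 897610964@qq.com
Sun Xiaolei (孙晓磊, born 1987), female, native of Shijiazhuang, Hebei; master's student; main research interests: information retrieval, topic tracking.
Wu Shufang (吴树芳, born 1980), female, native of Handan, Hebei; Ph.D. candidate; main research interests: information retrieval, topic tracking and detection.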