Application of the LDA Model in Topic Tracking (cited by: 27)

Use of LDA Model in Topic Tracking
Abstract: As research on the LDA (Latent Dirichlet Allocation) model deepens, its capacity for text representation and mining continues to improve. The "topic" is a central concept in the LDA model: it is a multinomial probability distribution over the feature set. Topic tracking follows a topic through a stream of unseen news stories, starting from a small number of stories known to be relevant, and aims to find all stories related to that topic. Applying the LDA model to topic tracking serves two purposes: (1) to test how well LDA topics represent the tracked topic; and (2) to examine the relationship between LDA topics and the tracked topic when the LDA model mines the tracked topics in the training data. Experiments show that tracking based on the LDA model outperforms the classic vector space model, the unigram language model, and the event model designed specifically for tracked topics. However, because the two kinds of topics differ in granularity, LDA topics and tracked topics do not stand in a direct one-to-one correspondence. An LDA model with customizable topics is the goal of future work.
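The abstract does not spell out the tracking procedure, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the LDA topics are already trained and given as word distributions (the toy `TOPICS` table, the words in it, the smoothing value `alpha`, and the cosine `threshold` are all hypothetical). Each story is mapped to a mixture over LDA topics, the few known on-topic samples are averaged into a profile, and a story from the stream is flagged when its mixture is close to that profile.

```python
import math

# Hypothetical toy LDA topics: each topic is a multinomial distribution over words.
TOPICS = [
    {"quake": 0.4, "rescue": 0.3, "aid": 0.2, "vote": 0.05, "poll": 0.05},
    {"quake": 0.05, "rescue": 0.05, "aid": 0.1, "vote": 0.4, "poll": 0.4},
]


def infer_mixture(words, topics, iters=50, alpha=0.1):
    """Estimate a story's topic proportions by EM, holding the topics fixed."""
    k = len(topics)
    theta = [1.0 / k] * k
    for _ in range(iters):
        counts = [alpha] * k  # smoothing pseudo-counts (assumed value)
        for w in words:
            # E-step: share of this word attributed to each topic.
            resp = [theta[t] * topics[t].get(w, 1e-9) for t in range(k)]
            total = sum(resp)
            for t in range(k):
                counts[t] += resp[t] / total
        # M-step: renormalize the expected counts into proportions.
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def track(samples, stream, topics, threshold=0.8):
    """Flag each stream story whose topic mixture is near the sample centroid."""
    mixtures = [infer_mixture(s, topics) for s in samples]
    k = len(topics)
    centroid = [sum(m[t] for m in mixtures) / len(mixtures) for t in range(k)]
    return [cosine(infer_mixture(s, topics), centroid) >= threshold
            for s in stream]


if __name__ == "__main__":
    samples = [["quake", "rescue", "aid"], ["quake", "aid", "rescue", "rescue"]]
    stream = [["rescue", "quake", "aid"], ["vote", "poll", "poll"]]
    print(track(samples, stream, TOPICS))
```

Note that the tracked topic here is a point in the space of LDA topic mixtures rather than a single LDA topic, which is consistent with the abstract's observation that the two kinds of topics are not in one-to-one correspondence.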
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2011, Issue B10, pp. 136-139, 152 (5 pages in total)
Funding: Supported by the National Natural Science Foundation of China (60873097, 60933005)
Keywords: LDA model; topic tracking; topic
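About the authors: Zhang Xiaoyan (born 1981), female, Ph.D., lecturer; main research interests: natural language processing, topic detection and tracking; E-mail: zhangxiaoyan@nudt.edu.cn. Wang Ting (born 1970), male, Ph.D., professor, doctoral supervisor; main research interests: natural language processing, computer software. Liang Xiaobo (born 1969), male, Ph.D., professor, master's supervisor; main research interests: corpus linguistics, cognitive linguistics.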