A Survey on Probabilistic Topic Model (概率主题模型综述)

Cited by: 66
Abstract: Topic models are among the most important techniques in text mining and are widely used in data mining, text classification, and community discovery. Because of their strong dimensionality-reduction ability and the ease with which they can be extended, they have become a popular research direction in natural language processing. Blei et al. proposed Latent Dirichlet Allocation (LDA), the representative probabilistic topic model, in which a topic is regarded as a probability distribution over words. Topic models use the document-level co-occurrence of terms to extract topics related to document semantics, mapping the high-dimensional word space to a low-dimensional topic space and thereby reducing the dimensionality of the target text data; this opened a new direction for text mining research. As a probabilistic generative model, LDA is easily extended into many other models. In view of the application value, theoretical significance, and future potential of probabilistic topic models, this paper first introduces the LDA model systematically, then categorizes the extensions of LDA in detail and describes typical representatives of each category, covering the motivation for each model, its specific form, its advantages and disadvantages, and the problems it is suited to solve. It then reviews typical recent application scenarios of topic models. In addition, the paper describes in detail several widely accepted datasets, evaluation methods, and typical experimental results for probabilistic topic models, and finally points out the problems that remain to be solved and possible directions for future research.
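The abstract's view of LDA as a generative model, where each topic is a probability distribution over words and each document mixes topics, can be illustrated with a minimal sketch. It follows the standard LDA generative story (draw document-topic proportions from a Dirichlet prior, then for each word position draw a topic and then a word from that topic). The function names, the toy sizes (2 topics, 4 vocabulary words), and the hyperparameter values are assumptions made for illustration and are not taken from the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document under the LDA generative story.

    topics: list of K word distributions, each a list of V probabilities
    alpha:  Dirichlet prior parameters over the K topics
    Returns (theta, words), where words are vocabulary indices.
    """
    theta = sample_dirichlet(alpha, rng)  # document-topic proportions
    topic_ids = range(len(topics))
    vocab_ids = range(len(topics[0]))
    words = []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=theta)[0]      # topic for this position
        w = rng.choices(vocab_ids, weights=topics[z])[0]  # word drawn from topic z
        words.append(w)
    return theta, words

rng = random.Random(0)
# Hypothetical toy setup: K = 2 topics over a V = 4 word vocabulary.
topics = [sample_dirichlet([0.5] * 4, rng) for _ in range(2)]
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=10, rng=rng)
print("topic proportions:", theta)
print("word indices:", words)
```

Here `theta` is the low-dimensional (K-dimensional) representation of the document, while `words` lives in the V-dimensional vocabulary space; inference in LDA runs this story in reverse, recovering `theta` and the topics from observed words.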
Authors: HAN Ya-Nan (韩亚楠); LIU Jian-Wei (刘建伟); LUO Xiong-Lin (罗雄麟) (Department of Automation, China University of Petroleum, Beijing 102249)
Source: Chinese Journal of Computers (《计算机学报》), 2021, No. 6, pp. 1095-1139 (45 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023).
Keywords: topic model; text mining; Latent Dirichlet Allocation (LDA); high-dimensional data; natural language processing
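About the authors: HAN Ya-Nan, Ph.D. candidate; main research interest: machine learning. E-mail: 857182813@qq.com. LIU Jian-Wei, Ph.D., associate professor; main research interests: machine learning, pattern recognition and intelligent systems, analysis, prediction and control of complex systems, and algorithm analysis and design. LUO Xiong-Lin, Ph.D., professor; main research areas: intelligent control and the analysis, prediction and control of complex systems.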

References: 8

Secondary references: 103

  • 1. Deerwester S C, Dumais S T, Landauer T K, et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
  • 2. Hofmann T. Probabilistic latent semantic indexing//Proceedings of the 22nd Annual International SIGIR Conference. New York: ACM Press, 1999: 50-57.
  • 3. Blei D, Ng A, Jordan M. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003, 3: 993-1022.
  • 4. Griffiths T L, Steyvers M. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004, 101: 5228-5235.
  • 5. Steyvers M, Griffiths T. Probabilistic topic models. Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2006.
  • 6. Teh Y W, Jordan M I, Beal M J, Blei D M. Hierarchical Dirichlet processes. Technical Report 653. UC Berkeley Statistics, 2004.
  • 7. Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977, B39(1): 1-38.
  • 8. Bishop C M. Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.
  • 9. Roweis S. EM algorithms for PCA and SPCA//Advances in Neural Information Processing Systems. Cambridge, MA, USA: The MIT Press, 1998, 10.
  • 10. Hofmann T. Probabilistic latent semantic analysis//Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Stockholm, Sweden, 1999: 289-296.

Co-citing literature: 348

Co-cited literature: 1077

Citing literature: 66

Secondary citing literature: 220
