
一种改进的LDA主题模型 (Cited by: 47)

An Improved LDA Topic Model
Abstract: Because the words in a document follow a power-law distribution, the topic distributions learned by LDA are skewed toward high-frequency words: the many words that could represent a topic are drowned out by a few high-frequency terms, which weakens the topics' expressiveness. This paper improves the topic distributions of the LDA model by weighting the feature words with a Gaussian function. Experiments show that the weighted LDA model lowers both the correlation among topics and the perplexity, indicating that the improved model performs better in both topic representation and prediction.
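
The record does not reproduce the paper's weighting formula, so the sketch below is only a rough illustration of the general idea, not the authors' method: compute a Gaussian weight for each term from its (log) corpus frequency, re-weight the document-term counts, and fit an ordinary LDA on the result. The log-frequency space, the parameters mu and sigma, and the use of scikit-learn's LatentDirichletAllocation are assumptions made purely for illustration.

# Minimal sketch (assumed formulation, not the paper's exact weighting):
# down-weight terms whose log frequency is far from the corpus mean with a
# Gaussian function, then fit standard LDA on the re-weighted counts.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models describe documents as mixtures of topics",
    "high frequency words dominate the topic distribution",
    "a gaussian weighting of feature words can rebalance topics",
]

# Document-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray().astype(float)

# Corpus-level term frequencies (power-law shaped in real corpora).
log_f = np.log1p(X.sum(axis=0))

# Hypothetical Gaussian weight in log-frequency space: extremely frequent
# (and extremely rare) terms fall in the tails and receive smaller weights.
mu, sigma = log_f.mean(), log_f.std() + 1e-12
weights = np.exp(-((log_f - mu) ** 2) / (2.0 * sigma ** 2))

# Re-weight the counts and fit plain LDA on the weighted matrix
# (scikit-learn's LDA accepts non-negative real-valued counts).
X_weighted = X * weights
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_weighted)

# Top words per topic under the weighted model.
vocab = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = vocab[np.argsort(comp)[::-1][:5]]
    print(f"topic {k}: {' '.join(top)}")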
Source: Journal of Beijing Jiaotong University (CAS, CSCD, Peking University Core Journal), 2010, No. 2: 111-114 (4 pages).
Funding: National 973 Program (2006CB504601); National Key Technology R&D Program (2007BA110B06-01); National Natural Science Foundation of China (90709006); Beijing Municipal Science and Technology Commission research project (D08050703020804); Beijing Jiaotong University Science Foundation (2007RC072).
Keywords: latent Dirichlet allocation (LDA); Dirichlet distribution; weighted topic model
About the author: ZHANG Xiaoping (b. 1969), female, from Pingyao, Shanxi; associate professor and Ph.D. candidate. Email: zh_xping@hotmail.com.

References (14)

1. Blei D, Ng A, Jordan M. Latent Dirichlet Allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
2. Griffiths T L, Steyvers M. A Probabilistic Approach to Semantic Representation [C]∥ Proceedings of the 24th Annual Conference of the Cognitive Science Society, 2002: 381-386.
3. Griffiths T L, Steyvers M. Prediction and Semantic Association [C]∥ Advances in Neural Information Processing Systems, 2003, 15: 11-18.
4. Griffiths T L, Steyvers M. Finding Scientific Topics [C]∥ Proceedings of the National Academy of Sciences, 2004: 5228-5235.
5. Hofmann T. Probabilistic Latent Semantic Analysis [C]∥ Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 1999: 289-296.
6. Deerwester S, Dumais S, Furnas G, et al. Indexing by Latent Semantic Analysis [J]. Journal of the American Society for Information Science, 1990, 41: 391-407.
7. Hofmann T. Unsupervised Learning by Probabilistic Latent Semantic Analysis [J]. Machine Learning, 2001, 42(1): 177-196.
8. Blei D, Lafferty J. Correlated Topic Models [C]∥ Advances in Neural Information Processing Systems, 2006, 18: 147-154.
9. Blei D, Griffiths T, Jordan M, et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process [C]∥ Advances in Neural Information Processing Systems, 2004, 16: 17-24.
10. Li W, McCallum A. Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations [C]∥ Proceedings of the 23rd International Conference on Machine Learning, 2006: 577-584.

Co-cited references: 431
Citing articles: 47
Secondary citing articles: 192
