Abstract
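Text topic detection can effectively uncover the key factors in massive amounts of information. This paper discovers current topics by clustering topic terms with a method based on co-word analysis. First, topic term strings are extracted using stop-word filtering and TF-IDF keyword extraction; next, a co-word matrix is constructed; finally, the topic term strings are clustered with the Bisecting K-means algorithm to discover topics. Experimental results show that the method is effective to some degree for extracting hot topics.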
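The three-step pipeline described above (stop-word filtering with TF-IDF keyword extraction, a co-word matrix, then Bisecting K-means) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already word-segmented, `STOP_WORDS` is a hypothetical toy list, TF-IDF uses the common tf × ln(N/df) weighting, the co-word matrix counts how many keyword strings contain each pair of terms, and each bisection splits the cluster with the largest sum of squared errors (SSE). All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy stop-word list; a real system would apply a full
# (e.g. Chinese) stop-word dictionary after word segmentation.
STOP_WORDS = {"the", "a", "of", "and", "in", "is"}

def extract_keywords(docs, top_k=3):
    """Drop stop words, then keep each document's top_k terms by TF-IDF."""
    filtered = [[w for w in doc if w not in STOP_WORDS] for doc in docs]
    n_docs = len(filtered)
    df = Counter(term for doc in filtered for term in set(doc))
    keyword_strings = []
    for doc in filtered:
        tf = Counter(doc)
        # TF = count / doc length, IDF = ln(N / document frequency).
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Descending score; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keyword_strings.append(ranked[:top_k])
    return keyword_strings

def co_word_matrix(keyword_strings):
    """Cell (i, j): keyword strings containing both terms; (i, i): strings containing term i."""
    vocab = sorted({t for ks in keyword_strings for t in ks})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for ks in keyword_strings:
        terms = sorted(set(ks))
        for t in terms:
            matrix[index[t]][index[t]] += 1
        for a, b in combinations(terms, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return vocab, matrix

def _dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def _sse(idx, rows):
    center = _mean([rows[i] for i in idx])
    return sum(_dist2(rows[i], center) for i in idx)

def _split(idx, rows, iters=20):
    """2-means on rows[idx], seeded with the first point and the point farthest from it."""
    seed0 = rows[idx[0]]
    seed1 = rows[max(idx, key=lambda i: _dist2(rows[i], seed0))]
    centers = [seed0, seed1]
    halves = [list(idx), []]
    for _ in range(iters):
        halves = [[], []]
        for i in idx:
            side = 0 if _dist2(rows[i], centers[0]) <= _dist2(rows[i], centers[1]) else 1
            halves[side].append(i)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [_mean([rows[i] for i in h]) for h in halves]
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return halves

def bisecting_kmeans(rows, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda c: _sse(c, rows))
        left, right = _split(target, rows)
        if not left or not right:
            break
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters

if __name__ == "__main__":
    docs = [
        ["the", "stock", "market", "rally", "stock"],
        ["stock", "market", "crash", "in", "the", "market"],
        ["football", "match", "goal", "the", "match"],
        ["football", "goal", "league", "of", "goal"],
    ]
    vocab, matrix = co_word_matrix(extract_keywords(docs, top_k=3))
    for cluster in bisecting_kmeans(matrix, k=2):
        print([vocab[i] for i in cluster])
    # ['crash', 'market', 'rally', 'stock']
    # ['football', 'goal', 'league', 'match']
```

In this sketch each row of the co-word matrix serves as the feature vector of its term, so terms that co-occur with the same other terms land in the same cluster, and each final cluster is read as one topic. Choosing the highest-SSE cluster to bisect is one common variant; splitting the largest cluster instead is another, and the abstract does not say which the paper uses.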
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journal (北大核心)
2011, No. 11, pp. 1621-1624 (4 pages)
Funding
Supported by the Natural Science Foundation of Zhejiang Province (Y1100176)
Keywords
co-word analysis
TF-IDF
co-word matrix
Bisecting K-means
topic
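About the Author
Wang Xiaohua (born 1961), male, from Hangzhou, professor; his main research interests are Chinese information processing, data mining, and artificial intelligence and its applications.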