
Research on text classification based on big data mining technology (cited by: 6)
Abstract: Text data are large in scale and high in feature dimension. Current text classification methods fail to capture how text characteristics vary, which leads to low accuracy, large error, and long classification time. To obtain better classification results, a text classification method based on big data mining technology is designed. First, the current research progress in text classification is analyzed to identify the causes of the poor performance of existing methods. Then the original text classification features are extracted, and the kernel principal component analysis (KPCA) algorithm is applied to them to reduce the feature dimension and simplify the structure of the text classifier. Finally, a text classifier is built with big data mining technology and compared against other text classification methods. The comparison tests show that the proposed method describes the variation in text characteristics better and accurately recognizes and classifies various types of text. Its classification accuracy exceeds 95%, clearly higher than that of other current text classification methods, and its classification time is significantly shorter.
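The pipeline outlined in the abstract (extract raw features, reduce them with KPCA, then classify) can be illustrated with a minimal sketch. The abstract does not say which raw features, kernel, or classifier the paper uses, so everything concrete here is an assumption made for illustration: term-frequency features, an RBF kernel with `gamma=5.0`, power iteration for the eigenvectors, a nearest-centroid classifier standing in for the unspecified "big data mining" classifier, and the toy documents themselves. It is not the author's implementation and says nothing about the reported accuracy.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Term-frequency vectors over a shared vocabulary (assumed raw features)."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    vectors = []
    for t in tokens:
        counts = Counter(t)
        vectors.append([counts[w] / len(t) for w in vocab])
    return vectors

def centred_rbf_kernel(X, gamma):
    """RBF kernel matrix, centred in feature space as KPCA requires."""
    n = len(X)
    K = [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
          for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in K]   # row means (equal to column means: K is symmetric)
    total = sum(row) / n            # grand mean
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def top_eigenpairs(M, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(M)
    M = [r[:] for r in M]
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next one.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def kpca_embed(docs, k=2, gamma=5.0):
    """Project every document onto the top-k kernel principal components."""
    Kc = centred_rbf_kernel(tf_vectors(docs), gamma)
    pairs = top_eigenpairs(Kc, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(docs))]

def nearest_centroid(train_pts, labels, query_pts):
    """Stand-in classifier: assign each query to the closest class centroid."""
    centroids = {}
    for lab in sorted(set(labels)):
        pts = [p for p, l in zip(train_pts, labels) if l == lab]
        centroids[lab] = [sum(c) / len(pts) for c in zip(*pts)]
    return [min(centroids, key=lambda lab: math.dist(q, centroids[lab]))
            for q in query_pts]

# Hypothetical toy corpus; the paper's data set is not described in the abstract.
train_docs = [
    "football match goal team win",
    "team score goal match player",
    "player football team coach match",
    "stock market bank profit price",
    "bank interest stock price market",
    "market profit investor bank stock",
]
train_labels = ["sports"] * 3 + ["finance"] * 3
test_docs = ["goal team player match", "stock bank price investor"]

# KPCA is unsupervised, so labelled and unlabelled documents are embedded together.
embedding = kpca_embed(train_docs + test_docs)
n_train = len(train_docs)
predictions = nearest_centroid(embedding[:n_train], train_labels, embedding[n_train:])
print(predictions)
```

Embedding the labelled and unlabelled documents together keeps the sketch short. A production pipeline would more likely fit KPCA on the training set alone and project new documents through the centred kernel against the training points.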
Author: MENG Xinmiao (孟鑫淼), H3C Research Institute of Big Data, Zhengzhou 450001, China
Source: Modern Electronics Technique (《现代电子技术》), Peking University Core Journal, 2020, No. 17, pp. 126-129 (4 pages)
Keywords: large-scale text data; high-dimensional features; big data mining technology; text classifier; classification accuracy; classification time
About the author: MENG Xinmiao (born 1989), male, from Zhengzhou, Henan; master's degree; lecturer; research focuses on big data technology.
