
The Design and Implementation of a Text Mining Algorithm Based on the MapReduce Framework

Times cited: 4
Abstract: As text mining is applied ever more widely in active information services, analyzing the intrinsic characteristics of data on the basis of text data has become a current research trend. This paper designs and implements a text mining algorithm on the Hadoop platform. The algorithm uses the MapReduce framework to count how often adjacent word pairs occur in a natural-language corpus and outputs the pairs in descending order of frequency, which helps users mine the associations between itemsets in large volumes of data. Because the algorithm exploits the distributed nature of the Hadoop platform, the experimental results show that it is effective and achieves good speedup.
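The abstract describes the algorithm only at a high level. Below is a minimal single-process Python sketch, not the authors' Hadoop implementation, that simulates the two steps it names: counting adjacent word pairs (map, shuffle, reduce) and then ordering the pairs by descending frequency. Assumptions: whitespace tokenization with lowercasing (a Chinese corpus would first need word segmentation), pairs never cross line boundaries, and ties are broken alphabetically. All function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_adjacent_pairs(line: str) -> Iterator[tuple[tuple[str, str], int]]:
    """Map step (hypothetical): emit ((w_i, w_{i+1}), 1) for each adjacent word pair.

    Assumes whitespace tokenization and lowercasing; pairs never span two lines.
    """
    words = line.lower().split()
    for left, right in zip(words, words[1:]):
        yield (left, right), 1


def shuffle(
    records: Iterable[tuple[tuple[str, str], int]],
) -> dict[tuple[str, str], list[int]]:
    """Group values by key, mimicking what the framework does between map and reduce."""
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def reduce_sum(key: tuple[str, str], values: list[int]) -> tuple[tuple[str, str], int]:
    """Reduce step: total the partial counts for one word pair."""
    return key, sum(values)


def sort_by_frequency_desc(
    counts: Iterable[tuple[tuple[str, str], int]],
) -> list[tuple[tuple[str, str], int]]:
    """Ordering step: highest count first; ties broken alphabetically (assumption)."""
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))


def adjacent_pair_frequencies(lines: Iterable[str]) -> list[tuple[tuple[str, str], int]]:
    """Run map -> shuffle -> reduce -> descending sort over an in-memory corpus."""
    mapped = (record for line in lines for record in map_adjacent_pairs(line))
    grouped = shuffle(mapped)
    counts = [reduce_sum(key, values) for key, values in grouped.items()]
    return sort_by_frequency_desc(counts)


if __name__ == "__main__":
    corpus = ["the quick brown fox", "the quick red fox", "a quick brown dog"]
    # The first two lines printed are "quick brown\t2" and "the quick\t2".
    for (left, right), count in adjacent_pair_frequencies(corpus):
        print(f"{left} {right}\t{count}")
```

On a real cluster, a per-line mapper like this matches Hadoop's default line-oriented input. The descending order is commonly produced by a second job that swaps key and value, so the count becomes the key, and registers a decreasing sort comparator (for example `LongWritable.DecreasingComparator`). The abstract does not say whether the paper takes this approach. The speedup it reports is conventionally defined as S_p = T_1 / T_p, the ratio of the running time on one node to the running time on p nodes.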
Source: Journal of Zhengzhou University (Engineering Science), 2012, No. 5, pp. 110-113 (4 pages). Indexed by CAS; listed in the Peking University Core Journals.
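Funding: National Natural Science Foundation of China (60970060); Tianjin Municipal Education Commission (20071328); Key Project of the Tianjin Science and Technology Support Program (09ZCKFGX00500); Tianjin Normal University Doctoral Fund (52LX17).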
Keywords: Hadoop; MapReduce; adjacent phrase; descending output
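About the author: Zhang Guiyun (1965-), female, native of Ji County, Tianjin; professor at Tianjin Normal University, postdoctoral researcher, and master's supervisor. Her research focuses on artificial intelligence and data mining. E-mail: dyxyl999@126.com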