Abstract
This paper analyzes the computation process of the traditional serial Apriori association rule algorithm and its shortcomings. To address the low execution efficiency and high time complexity of the serial algorithm, as well as the inability of traditional parallel computing models to handle node failure and their difficulty with load balancing, a design method for a parallel association rule algorithm on the Hadoop platform is proposed. The traditional Apriori algorithm is improved, and the execution flow of the improved algorithm under Hadoop's MapReduce programming model is given. The improved algorithm is tested on Hadoop in both single-machine and cluster settings; the experimental results show that it achieves higher execution efficiency, good speedup, and good portability.
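The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of how textbook Apriori support counting maps onto the MapReduce model, not the authors' method. It runs in-process in plain Python rather than on Hadoop; the function names (`map_phase`, `reduce_phase`, `next_candidates`, `apriori`) and the sample transactions are hypothetical and chosen for illustration. Each inner list of `splits` stands in for the input split handled by one mapper.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for each candidate contained in a transaction."""
    for transaction in split:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                yield candidate, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum counts per itemset and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: n for s, n in counts.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in frequent for i in s})
    return [
        c for c in combinations(items, k)
        if all(sub in frequent for sub in combinations(c, k - 1))
    ]


def apriori(splits, min_support):
    """Driver: one simulated map/reduce pass per itemset size k."""
    items = sorted({i for split in splits for t in split for i in t})
    candidates = [(i,) for i in items]
    k, result = 1, {}
    while candidates:
        pairs = [p for split in splits for p in map_phase(split, candidates)]
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result


# Hypothetical sample data: two splits of two transactions each.
splits = [
    [["bread", "milk"], ["bread", "beer", "eggs"]],
    [["milk", "beer", "bread"], ["bread", "milk", "beer"]],
]
result = apriori(splits, min_support=3)
```

In this sketch each pass over the data corresponds to one MapReduce job, which is the usual cost of a direct port of Apriori; the paper's improvement may well organize the passes differently.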
Source
Computer and Modernization (《计算机与现代化》)
2013, No. 3, pp. 1-4, 8 (5 pages)
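Funding
National Natural Science Foundation of China (61163025)
Natural Science Foundation of Inner Mongolia (2012MS0912)
Chunhui Program of the Ministry of Education (Z2009-1-01044)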
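About the authors
Hao Xiaofei (b. 1988), male, from Yuzhou, Henan; master's student at the School of Information Engineering, Inner Mongolia University of Science and Technology; research interest: cloud computing.
Tan Yuesheng (b. 1959), male, from Hengyang, Hunan; professor at the Network Center, Inner Mongolia University of Science and Technology; bachelor's degree; research interests: cloud computing and grid computing.
Wang Jingyu (b. 1976), male, from Kaifeng, Henan; doctoral candidate and associate professor; research interests: cloud computing and grid computing.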