Abstract
A MapReduce job runs in two stages, Map and Reduce. In the Reduce stage, each task copies the intermediate data produced by the Map stage to its local node and computes the final output. The Reduce stage consists of three sub-stages: Sort, Shuffle, and Reduce. The Shuffle sub-stage transfers data over network links and accounts for more than one third of the Reduce-stage time, which leaves considerable room for optimization. This paper proposes a node selection algorithm based on analysis of the execution links in the Reduce stage. By choosing suitable nodes and deploying the corresponding Reduce tasks on them, the algorithm lowers the cost of data transfer between nodes, reduces the use of network bandwidth, and speeds up Reduce task execution, thereby shortening the overall execution time of the MapReduce job.
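The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of transfer-cost-aware Reduce placement, not the authors' method. It assumes that the amount of intermediate data held on each Map node and a per-unit cost for each network link are known. The names `transfer_cost`, `select_reduce_node`, `map_output_mb`, and `link_cost`, as well as the example figures, are hypothetical.

```python
from typing import Dict, Iterable

def transfer_cost(candidate: str,
                  map_output_mb: Dict[str, float],
                  link_cost: Dict[str, Dict[str, float]]) -> float:
    """Cost of copying one Reduce task's input to `candidate`.

    map_output_mb[src] is the intermediate data (MB) held on node src.
    link_cost[src][dst] is an assumed cost per MB on the link src -> dst.
    Data already on the candidate node is read locally at no network cost.
    """
    return sum(
        size * link_cost[src][candidate]
        for src, size in map_output_mb.items()
        if src != candidate
    )

def select_reduce_node(candidates: Iterable[str],
                       map_output_mb: Dict[str, float],
                       link_cost: Dict[str, Dict[str, float]]) -> str:
    """Pick the candidate node with the lowest total transfer cost."""
    return min(candidates,
               key=lambda n: transfer_cost(n, map_output_mb, link_cost))

# Hypothetical cluster: A and B share a rack (cheap link), C is remote.
map_output_mb = {"A": 600.0, "B": 100.0, "C": 300.0}
link_cost = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 4.0},
    "C": {"A": 4.0, "B": 4.0},
}

best = select_reduce_node(["A", "B", "C"], map_output_mb, link_cost)
print(best, transfer_cost(best, map_output_mb, link_cost))
```

In this made-up example the Reduce task lands on the node that already holds the largest share of the intermediate data and sits closest to the rest, which is the kind of placement the abstract describes as reducing Shuffle traffic.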
Authors
Mao Jiaming
Wang Pengfei
Zhao Ran
Information and Communication Branch, State Grid Jiangsu Electric Power Company, Nanjing 210000, China
Source
Wireless Internet Technology (无线互联科技)
2018, No. 22, pp. 5-6 (2 pages)
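Author biography
Mao Jiaming (born 1991), male, from Taizhou, Jiangsu; engineer, master's degree. Research interests: information systems, network security, and automated operations and maintenance.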