Research on Hadoop Performance in Handling Small Files
Abstract: Hadoop is a top-level Apache Software Foundation project that supports distributed computing over massive data sets across thousands of nodes. It is a free, open-source software framework inspired by Google's MapReduce and the Google File System (GFS), implemented in Java and maintained by volunteer developers worldwide. Its subprojects include HDFS, MapReduce, HBase, and Hive. HDFS is a distributed file system whose high-throughput access to application data underpins Hadoop's performance. MapReduce is a software framework that runs the MapReduce algorithm for distributed computation over massive data on clusters. Although Hadoop is widely used, it has defects that affect its performance, and its handling of small files is one of them. Hadoop Archives and sequence files are two existing solutions to the small-files problem, but each has its own shortcomings. This paper proposes a solution that aims to retain their advantages so that Hadoop performs better when handling small files.
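The idea behind sequence files, as named in the abstract, is to store many small files as key-value records inside one large file, so the file system tracks one object instead of thousands. The sketch below is a toy stand-in for that idea using only the Python standard library. It is not Hadoop's SequenceFile format (which also has a header, sync markers, and optional compression) and it is not the solution the paper proposes. The function names `pack_small_files` and `read_record` and the record layout are assumptions made for illustration.

```python
import io
import struct


def pack_small_files(files, out):
    """Write each (name, data) pair to `out` as one length-prefixed record.

    Hypothetical record layout: 4-byte key length, 4-byte value length
    (both big-endian), then the UTF-8 file name, then the file contents.
    Returns a dict mapping each file name to the byte offset of its record.
    """
    index = {}
    for name, data in files.items():
        key = name.encode("utf-8")
        index[name] = out.tell()  # remember where this record starts
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)
    return index


def read_record(stream, offset):
    """Seek to `offset` and read back one (name, data) record."""
    stream.seek(offset)
    key_len, val_len = struct.unpack(">II", stream.read(8))
    key = stream.read(key_len).decode("utf-8")
    value = stream.read(val_len)
    return key, value


# Usage: three small "files" packed into a single in-memory container.
container = io.BytesIO()
small_files = {"a.txt": b"hello", "b.txt": b"", "c.log": b"\x00\x01"}
offsets = pack_small_files(small_files, container)
name, data = read_record(container, offsets["c.log"])
```

The offset index in this sketch plays a role loosely similar to the index that a Hadoop Archive keeps for its member files; a plain sequential key-value container has no such index and must be scanned to find a given key.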
Author: Ai Ming (艾明)
Source: Information Technology (《信息技术》), 2015, No. 10, pp. 142-144, 148 (4 pages)
Keywords: Hadoop; MapReduce; HDFS (Hadoop Distributed File System); Hadoop Archives; sequence files
About the author: Ai Ming (b. 1989), male, master's student; research interest: cloud computing.