Abstract
Hadoop is an Apache top-level project that supports distributed applications involving huge amounts of data and thousands of nodes. It is a free, open-source software framework inspired by Google's MapReduce and the Google File System (GFS), implemented in Java and improved by volunteer developers worldwide. Its subprojects include HDFS, MapReduce, HBase, Hive, and others. HDFS is a distributed file system that underpins Hadoop's performance by providing high-throughput access to application data. MapReduce is a software framework that performs distributed computation over huge amounts of data on clusters. Although Hadoop is widely used, it has some defects that affect its performance, one of which is its poor handling of small files. Hadoop Archives and sequence files are two existing solutions to the small files problem, but each has its own shortcomings. This paper proposes a solution that is expected to combine their merits and give Hadoop better performance in handling small files.
Source
Information Technology (《信息技术》), 2015, No. 10, pp. 142-144, 148 (4 pages)
About the Author
Ai Ming (1989-), male, master's degree candidate; research interest: cloud computing.