
Storage Optimization Method of Small Files Based on Hadoop (基于Hadoop的小文件存储优化方案)

Cited by: 12
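Abstract: Hadoop, a mature distributed cloud platform, provides reliable and efficient storage for large files, but its efficiency drops markedly when it handles massive numbers of small files. This paper proposes a Hadoop-based storage optimization scheme for massive small files of educational resources. The scheme exploits the associations among these small files to merge them into large files, which reduces the number of files, and it improves read efficiency through index-based access to small files, metadata caching, and prefetching of associated small files. Experimental results show that the method improves the efficiency of storing and accessing small files in the Hadoop file system.

The Hadoop Distributed File System (HDFS) can process large amounts of data effectively through large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on HDFS is proposed to improve the storage efficiency of small files. The main idea is to classify the massive small files, merge them by class, and index the merged files, aiming to reduce the number of index items in the NameNode and improve storage efficiency. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).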
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2016, No. 1, pp. 141-145 (5 pages).
Funding: Ministry of Education-China Mobile Research Fund (MCM20121041); National Natural Science Foundation of China (61133016, 61103206); National 863 Program (2011AA010706).
Keywords: Hadoop; index mechanism; association relationship; storage of small files
About the author: Li Meng (李孟, b. 1981), female, PhD candidate; her research focuses on computer networks and knowledge engineering.
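The merge, index, cache, and prefetch ideas named in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not touch HDFS: the class `MergedStore`, its fields, and the sample file names are all hypothetical, an `io.BytesIO` buffer stands in for one merged large file, and "related" simply means "merged in the same group".

```python
import io
from dataclasses import dataclass, field


@dataclass
class MergedStore:
    """Toy stand-in for one merged large file plus its small-file index (hypothetical)."""
    blob: io.BytesIO = field(default_factory=io.BytesIO)
    index: dict = field(default_factory=dict)    # name -> (offset, length)
    related: dict = field(default_factory=dict)  # name -> other names in the same group
    cache: dict = field(default_factory=dict)    # name -> bytes already fetched
    blob_reads: int = 0                          # how many times the blob was read

    def merge(self, group):
        """Append a group of related small files (name -> bytes) to the blob."""
        names = list(group)
        self.blob.seek(0, io.SEEK_END)
        for name, data in group.items():
            offset = self.blob.tell()
            self.blob.write(data)
            self.index[name] = (offset, len(data))
            self.related[name] = [n for n in names if n != name]

    def _read_from_blob(self, name):
        """Look up (offset, length) in the index and read that slice."""
        offset, length = self.index[name]
        self.blob.seek(offset)
        self.blob_reads += 1
        return self.blob.read(length)

    def read(self, name):
        """Return a small file; on a cache miss, also prefetch its related files."""
        if name in self.cache:
            return self.cache[name]
        data = self._read_from_blob(name)
        self.cache[name] = data
        for other in self.related[name]:
            if other not in self.cache:
                self.cache[other] = self._read_from_blob(other)
        return data


store = MergedStore()
store.merge({"lesson1.html": b"<html>1</html>",
             "lesson1.css": b"body{}",
             "lesson1.png": b"\x89PNG"})
store.merge({"lesson2.html": b"<html>2</html>"})

first = store.read("lesson1.css")    # cache miss: reads the blob and prefetches the group
second = store.read("lesson1.png")   # served from the cache, no further blob read
```

In this sketch the store keeps one index entry per small file but only one underlying object, which mirrors the abstract's goal of reducing the number of files the NameNode must track. A real deployment would also have to persist the index and bound the cache size, which the sketch leaves out.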
