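Abstract
As a mature distributed cloud platform, Hadoop provides reliable and efficient storage for large files, but its efficiency drops significantly when it handles massive numbers of small files. This paper proposes a Hadoop-based storage optimization scheme for massive small files of educational resources. Exploiting the correlations among these small files, the scheme merges them into large files to reduce the file count, and it improves read efficiency through index-based access to small files, metadata caching, and prefetching of correlated small files. Experimental results show that the method improves the access efficiency of small files stored in the Hadoop file system.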
The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on HDFS is proposed to improve the storage efficiency of small files. The main idea is to classify the massive small files, merge them by class, and index the merged files, aiming to reduce the number of index items in the NameNode and improve storage efficiency. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).
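The classify, merge, and index idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation and does not use any HDFS API; the function names `merge_by_class` and `read_small_file`, the `classify` callback, and the `(offset, length)` index layout are all assumptions made for illustration.

```python
import io
from collections import defaultdict


def merge_by_class(small_files, classify):
    """Merge small files into one blob per class (hypothetical helper).

    small_files: dict mapping file name -> bytes content.
    classify:    function mapping a file name -> class label (assumed).
    Returns a dict: label -> (merged_bytes, index), where index maps
    each file name to its (offset, length) inside merged_bytes.
    """
    buffers = defaultdict(io.BytesIO)
    indexes = defaultdict(dict)
    for name, data in small_files.items():
        label = classify(name)
        buf = buffers[label]
        # Record where this small file starts inside the merged blob.
        indexes[label][name] = (buf.tell(), len(data))
        buf.write(data)
    return {label: (buffers[label].getvalue(), indexes[label])
            for label in buffers}


def read_small_file(merged, index, name):
    """Look up one small file in a merged blob through its index entry."""
    offset, length = index[name]
    return merged[offset:offset + length]
```

With this layout the file system would track one entry per merged file instead of one per small file, which is the effect on NameNode index items that the abstract targets; the per-file index is kept alongside the merged data.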
Source
《电子科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, No. 1, pp. 141-145 (5 pages)
Journal of University of Electronic Science and Technology of China
Funding
Ministry of Education-China Mobile Research Fund (MCM20121041)
National Natural Science Foundation of China (61133016, 61103206)
National 863 Program (2011AA010706)
About the Author
Li Meng (born 1981), female, PhD candidate; her research focuses mainly on computer networks and knowledge engineering.