Abstract
With data volumes growing rapidly, storage space is increasingly scarce, while the high proportion of repetitive data in data sources leads to a high degree of redundancy. To solve this problem, a density region division algorithm based on the uneven density distribution of data within a data source was proposed. The algorithm filters out and extracts the high-density data regions of the data source, then erases the highly repetitive data in those regions to reduce redundancy, thereby compressing the data source by orders of magnitude. Experimental results show that, compared with the traditional LZW compression algorithm, the proposed compression strategy offers a better compression ratio and greater flexibility in data applicability.
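As a rough, non-authoritative illustration of the idea the abstract describes (the paper's actual density region division algorithm is not reproduced here; the function compress_by_density, the density_threshold parameter, and the record-level notion of "density" are all illustrative assumptions), the following Python sketch divides records into high- and low-density regions by relative frequency and erases repeated copies only inside the high-density region:

```python
from collections import Counter

# Hypothetical sketch of a density-region-based deduplication scheme:
# 1) estimate how often each record occurs ("density"),
# 2) treat records above a density threshold as the high-density region,
# 3) erase duplicates in that region, keeping one copy plus an erased count.

def compress_by_density(records, density_threshold=0.01):
    """Return (kept_records, erased_counts) under the assumed scheme."""
    total = len(records)
    counts = Counter(records)

    # High-density region: records whose relative frequency exceeds the threshold.
    high_density = {r for r, c in counts.items() if c / total > density_threshold}

    kept = []
    seen_high = set()
    for r in records:
        if r in high_density:
            # Erase repeats inside the high-density region; keep the first copy only.
            if r not in seen_high:
                seen_high.add(r)
                kept.append(r)
        else:
            # Low-density records pass through unchanged.
            kept.append(r)

    # Number of erased duplicates per high-density record.
    erased_counts = {r: counts[r] - 1 for r in high_density}
    return kept, erased_counts

if __name__ == "__main__":
    data = ["a"] * 50 + ["b"] * 30 + list("xyz")
    kept, erased = compress_by_density(data, density_threshold=0.1)
    print(len(data), "->", len(kept))  # 83 -> 5
    print(erased)                      # {'a': 49, 'b': 29}
```

Under these assumptions, keeping the erased counts alongside the surviving copies would allow the original data source to be restored, and the threshold controls how aggressively the high-density region is carved out.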
Authors
ZHAO Hui-qun; LI Chun-liang (School of Computer Science, North China University of Technology, Beijing 100144, China)
Source
Computer Engineering and Design (《计算机工程与设计》), a Peking University Core journal
2020, No. 9, pp. 2482-2487 (6 pages)
Funding
National Natural Science Foundation of China (Grant No. 61672041).
About the Authors
ZHAO Hui-qun (b. 1960), male, from Shenyang, Liaoning; Ph.D., professor; research interests: software architecture and the Internet of Things. Corresponding author: LI Chun-liang (b. 1993), male, from Huainan, Anhui; M.S.; research interest: big data compression and storage. E-mail: 1966788058@qq.com.