Abstract
Traditional data backup algorithms cannot effectively remove the redundant data produced during backup, which lowers storage space utilization. This paper therefore proposes a deduplication backup algorithm for CNC machine tool data based on a balanced binary tree. First, the edit distance algorithm is applied to the attribute sets of the CNC machine tool data to obtain the similarity between records. Next, the Canopy algorithm is used to obtain the key attributes of the data, and records identified as duplicates by this similarity are removed. Finally, the machine tool metadata to be backed up is organized by timestamp and backed up with a balanced binary tree algorithm, and a data redundancy mining model removes the redundant data generated during backup and recovery, completing the deduplicated backup. Experiments show that the proposed method effectively avoids the bandwidth bottleneck caused by the growing volume of duplicate data transmitted and improves storage space utilization.
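The abstract gives no formulas for the record-level deduplication step, so the following is only a minimal Python sketch under stated assumptions: attribute similarity is taken as normalized Levenshtein (edit) distance, record similarity as its mean over a chosen list of key attributes, and Canopy is used in its conventional role as cheap pre-clustering so that near-duplicate checks run only within a canopy. How the paper uses Canopy to select key attributes is not described in the abstract and is not reproduced here. The function names `edit_distance`, `canopies`, and `deduplicate`, the thresholds `loose`, `tight`, and `dup_threshold`, and the sample field names are all hypothetical.

```python
from typing import Dict, List, Mapping, Sequence

Record = Mapping[str, object]  # hypothetical: one CNC data record as field -> value


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def attr_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def record_similarity(r1: Record, r2: Record, keys: Sequence[str]) -> float:
    """Mean attribute similarity over the chosen key attributes (assumed weighting)."""
    return sum(attr_similarity(str(r1[k]), str(r2[k])) for k in keys) / len(keys)


def canopies(records: Sequence[Record], keys: Sequence[str],
             loose: float = 0.5, tight: float = 0.8) -> List[List[int]]:
    """Canopy pre-clustering on similarity: >= loose joins the canopy,
    >= tight also removes the record from later canopies."""
    if not 0.0 <= loose <= tight <= 1.0:
        raise ValueError("need 0 <= loose <= tight <= 1")
    remaining = list(range(len(records)))
    groups: List[List[int]] = []
    while remaining:
        center = remaining.pop(0)
        members, still_open = [center], []
        for idx in remaining:
            s = record_similarity(records[center], records[idx], keys)
            if s >= loose:
                members.append(idx)
            if s < tight:
                still_open.append(idx)
        remaining = still_open
        groups.append(members)
    return groups


def deduplicate(records: Sequence[Record], keys: Sequence[str],
                dup_threshold: float = 0.9, loose: float = 0.5,
                tight: float = 0.8) -> List[Record]:
    """Keep the earliest record of each near-duplicate group, comparing only
    against already-kept records that share a canopy."""
    groups = canopies(records, keys, loose, tight)
    member_of: Dict[int, List[int]] = {i: [] for i in range(len(records))}
    for cid, group in enumerate(groups):
        for i in group:
            member_of[i].append(cid)
    kept_in: Dict[int, List[int]] = {cid: [] for cid in range(len(groups))}
    result: List[Record] = []
    for i, rec in enumerate(records):
        if any(record_similarity(records[k], rec, keys) >= dup_threshold
               for cid in member_of[i] for k in kept_in[cid]):
            continue  # near-duplicate of a record already kept
        for cid in member_of[i]:
            kept_in[cid].append(i)
        result.append(rec)
    return result
```

Every record ends up in at least one canopy, so none is skipped; comparing only within canopies reduces the number of pairwise edit-distance computations when records fall into many small canopies, and each dropped record always has a retained record at or above `dup_threshold`.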
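For the backup step, the abstract states only that the metadata is organized by timestamp in a balanced binary tree and that a data redundancy mining model removes redundant data. The following is a minimal sketch under two assumptions: the balanced binary tree is an AVL tree, and a set of SHA-256 content fingerprints stands in for the redundancy mining model, whose details the abstract does not give. Payloads whose fingerprint is already stored are skipped, and the tree answers timestamp range queries of the kind a recovery would need. `BackupIndex`, `add`, and `between` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: Tuple[int, str]            # (timestamp, SHA-256 hex digest of payload)
    payload: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def _h(n: Optional[Node]) -> int:
    return n.height if n else 0


def _update(n: Node) -> None:
    n.height = 1 + max(_h(n.left), _h(n.right))


def _rotate_right(y: Node) -> Node:
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)
    _update(x)
    return x


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)
    _update(y)
    return y


def _rebalance(n: Node) -> Node:
    _update(n)
    balance = _h(n.left) - _h(n.right)
    if balance > 1:                                # left-heavy
        if _h(n.left.left) < _h(n.left.right):     # left-right case
            n.left = _rotate_left(n.left)
        return _rotate_right(n)
    if balance < -1:                               # right-heavy
        if _h(n.right.right) < _h(n.right.left):   # right-left case
            n.right = _rotate_right(n.right)
        return _rotate_left(n)
    return n


def _insert(root: Optional[Node], node: Node) -> Node:
    if root is None:
        return node
    if node.key < root.key:
        root.left = _insert(root.left, node)
    else:  # keys are unique because each digest is stored at most once
        root.right = _insert(root.right, node)
    return _rebalance(root)


class BackupIndex:
    """Timestamp-ordered AVL index that skips payloads already backed up."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self._seen: set = set()

    def add(self, ts: int, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return False                           # redundant: identical content stored
        self._seen.add(digest)
        self.root = _insert(self.root, Node((ts, digest), payload))
        return True

    def between(self, start: int, end: int) -> List[Tuple[int, bytes]]:
        """In-order walk returning (timestamp, payload) with start <= ts <= end."""
        out: List[Tuple[int, bytes]] = []

        def walk(n: Optional[Node]) -> None:
            if n is None:
                return
            ts = n.key[0]
            if ts >= start:
                walk(n.left)
            if start <= ts <= end:
                out.append((ts, n.payload))
            if ts <= end:
                walk(n.right)

        walk(self.root)
        return out
```

Keying on `(timestamp, digest)` keeps keys unique even when two distinct payloads share a timestamp, and AVL rebalancing keeps insertion at O(log n) and a range lookup at O(log n) plus the number of entries returned.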
Authors
QIN Jin-xiang (秦金祥)
YANG Meng (杨萌)
QIN Jin-xiang; YANG Meng (Engineering Training Center, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China; Information Center, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China)
Source
Computer Simulation (《计算机仿真》)
Peking University Chinese Core Journal
2023, Issue 1, pp. 548-552 (5 pages)
Keywords
Balanced binary tree
CNC machine tool
Data deduplication
Data backup
Data mining
About the authors
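QIN Jin-xiang (1972-), male, Han ethnicity, born in Jingjiang, Jiangsu; master's graduate, lecturer; his work focuses on teaching and research in computer numerical control (CNC) simulation. YANG Meng (1977-), female, Han ethnicity, born in Luoyang, Henan; master's degree, engineer; her research covers the development of computer network information systems and network information security.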