Abstract
Because the real world is complex, missing and uncertain information is ubiquitous. Databases, as a tool for representing the real world, use null values to express missing information. To address null values in relational databases, this paper proposes a null-value estimation method based on fuzzy clustering and multiple linear regression. The method first mines the data in the table to find the set of attributes associated with the attribute to be estimated. This step uses only the information contained in the data itself, which avoids the errors that subjectivity introduces when domain experts choose the condition attributes. Next, fuzzy clustering over the resulting attribute set yields a partition of the original data, and the null values in the relation are then estimated from the resulting clusters using linear regression. Finally, the mean absolute error rate is used to measure estimation accuracy. Experimental results show that the proposed method achieves higher accuracy than the other methods it is compared with.
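The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how associated attributes are mined, how many clusters are used, or how cluster membership enters the regression. The sketch therefore assumes an absolute Pearson correlation threshold for attribute selection, standard fuzzy c-means with fuzzifier m = 2, one least-squares model per cluster with each row assigned to its highest-membership cluster, and synthetic data. All function names and parameters are hypothetical.

```python
import numpy as np

def select_attributes(X, y, threshold=0.3):
    """Keep columns whose |Pearson correlation| with y reaches the threshold (assumed criterion)."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.where(corr >= threshold)[0]

def memberships(X, centers, m=2.0):
    """Fuzzy c-means membership of each row of X in each cluster center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c=2, m=2.0, iters=200, seed=0):
    """Standard fuzzy c-means from a random membership matrix; returns (centers, U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U = memberships(X, centers, m)
    return centers, U

def fit_cluster_models(X, y, U):
    """Fit one least-squares linear model per cluster (rows assigned by highest membership)."""
    labels = U.argmax(axis=1)
    A = np.column_stack([np.ones(len(X)), X])
    global_coef = np.linalg.lstsq(A, y, rcond=None)[0]
    models = {}
    for k in range(U.shape[1]):
        idx = labels == k
        # Fall back to the global model when a cluster has too few rows to fit.
        enough = idx.sum() > X.shape[1] + 1
        models[k] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0] if enough else global_coef
    return models

def estimate(X_new, centers, models, m=2.0):
    """Estimate the target for new rows with the model of their highest-membership cluster."""
    labels = memberships(X_new, centers, m).argmax(axis=1)
    A = np.column_stack([np.ones(len(X_new)), X_new])
    return np.array([A[i] @ models[labels[i]] for i in range(len(X_new))])

def mean_absolute_error_rate(est, actual):
    """Mean of |estimate - actual| / |actual| over the estimated rows."""
    return float(np.mean(np.abs(est - actual) / np.abs(actual)))

def demo():
    """Synthetic table: two groups with different linear relations and one unrelated column."""
    rng = np.random.default_rng(0)
    n = 100
    x1 = np.concatenate([rng.uniform(1, 3, n), rng.uniform(8, 10, n)])
    x2 = rng.uniform(1, 5, 2 * n)
    x3 = rng.uniform(0, 1, 2 * n)  # unrelated attribute
    y = np.where(x1 < 5, 5 + x1 + 4 * x2, 10 + x1 + 6 * x2) + rng.normal(0, 0.2, 2 * n)
    X = np.column_stack([x1, x2, x3])
    null_rows = rng.choice(2 * n, size=20, replace=False)  # rows whose y is treated as null
    known = np.setdiff1d(np.arange(2 * n), null_rows)
    cols = select_attributes(X[known], y[known])
    centers, U = fuzzy_c_means(X[known][:, cols], c=2)
    models = fit_cluster_models(X[known][:, cols], y[known], U)
    est = estimate(X[null_rows][:, cols], centers, models)
    return mean_absolute_error_rate(est, y[null_rows])

if __name__ == "__main__":
    print("mean absolute error rate:", demo())
```

Assigning each row to a single cluster keeps the sketch short. A membership-weighted combination of the per-cluster predictions would be an equally plausible reading of the abstract.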
Source
Computing Technology and Automation (《计算技术与自动化》)
2016, No. 3, pp. 110-114 (5 pages)
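Funding
Supported by the Open Fund of the Graduate Innovation Base (Laboratory) of Nanjing University of Aeronautics and Astronautics (kfjj201460)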
Keywords
relational database
null value
fuzzy clustering
multiple linear regression
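About the author
LIU Li (born 1992), male, from Chizhou, Anhui; master's degree; research interests: data management and knowledge engineering. Corresponding author, E-mail: 18856671287@163.com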