
Aggregate Query Processing Algorithm on Incomplete Data Based on Denotational Semantics (cited by: 8)
Abstract: This work studies aggregate query processing over incomplete data based on denotational semantics. Incomplete data, also known as missing data, contains missing values of two kinds: applicable nulls, which can be imputed, and inapplicable nulls, which cannot. Existing imputation algorithms cannot guarantee the accuracy of query results computed after imputation, so this work instead gives interval estimates of aggregate query results over incomplete data. It extends the traditional relational database model under denotational semantics and proposes a generic incomplete database model that handles both types of missing values. Under this model, a new semantics for aggregate query answers over incomplete data is defined: reliable answers. A reliable answer is an interval estimate of the ground-truth query result that covers the ground-truth result with high probability. For SUM, COUNT, and AVG queries, linear-time approximate evaluation algorithms are proposed to compute reliable answers. Extensive experiments on real and synthetic datasets verify the effectiveness of the proposed methods.
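To make the idea of an interval answer concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It computes deterministic worst-case bounds rather than the probabilistic reliable answers described in the abstract. The assumptions are ours: every applicable null is taken to lie in a known range `[lo, hi]`, inapplicable nulls are simply excluded from the aggregates, and the names `aggregate_bounds` and `INAPPLICABLE` are hypothetical.

```python
from typing import Iterable, Optional, Tuple, Union


class _Inapplicable:
    """Sentinel type for a null that has no underlying value (hypothetical)."""


INAPPLICABLE = _Inapplicable()

# A cell is a known number, None (applicable null: a value exists but is
# unknown), or INAPPLICABLE (no value exists for this tuple).
Cell = Union[float, None, _Inapplicable]
Interval = Tuple[float, float]


def aggregate_bounds(
    cells: Iterable[Cell], lo: float, hi: float
) -> Tuple[Interval, int, Optional[Interval]]:
    """Return (SUM interval, COUNT, AVG interval) in a single linear pass.

    Assumption: each applicable null hides a value somewhere in [lo, hi].
    Inapplicable nulls contribute to neither SUM nor COUNT.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")

    known_sum = 0.0
    known = 0
    applicable = 0
    for cell in cells:
        if cell is INAPPLICABLE:
            continue  # no value exists, so the aggregates ignore it
        if cell is None:
            applicable += 1  # value exists but is unknown
        else:
            known_sum += cell
            known += 1

    # COUNT is exact here because every null's type is assumed to be known.
    count = known + applicable
    sum_interval = (known_sum + applicable * lo, known_sum + applicable * hi)
    # With a fixed COUNT, the AVG bounds are the SUM bounds divided by COUNT.
    avg_interval = (
        (sum_interval[0] / count, sum_interval[1] / count) if count else None
    )
    return sum_interval, count, avg_interval


if __name__ == "__main__":
    column = [10.0, None, 20.0, INAPPLICABLE]
    print(aggregate_bounds(column, lo=0.0, hi=5.0))
```

Worst-case bounds of this kind always contain the true result but can be very wide. The reliable answers in the abstract instead aim for an interval that covers the true result with high probability.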
Authors: ZHANG An-Zhen; LI Jian-Zhong; GAO Hong (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, Issue 2, pp. 406-420 (15 pages)
Funding: National Natural Science Foundation of China (61702344).
Keywords: incomplete data; approximate query processing; data reparation; result estimation; data usability
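About the authors: Corresponding author: ZHANG An-Zhen (b. 1990), female, from Linyi, Shandong, Ph.D., lecturer, E-mail: azzhang@hit.edu.cn; research interests: data quality, computation on weakly usable data, and query processing. GAO Hong (b. 1966), female, Ph.D., professor, doctoral supervisor, CCF distinguished member; research interests: database management, wireless sensor networks, and graph computing. LI Jian-Zhong (b. 1950), male, Ph.D., professor, doctoral supervisor, CCF fellow; research interests: database technology, parallel computing, and sensor networks.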