
Fast Mining Outliers in Online Data Streams (Chinese title: 快速挖掘数据流中离群点)
Cited by: 5
Abstract: Outlier detection is an important branch of data mining, and outlier detection in data streams is receiving increasing attention. To detect outliers in data streams quickly and accurately, this paper proposes an online algorithm called ODDS (outlier detection in online data streams). ODDS measures the outlying degree of a data element by its dissimilarity from the frequent patterns in the stream. By building a tree structure, the ODDS-Tree, it dynamically updates the outlier information of candidate outliers as the stream evolves. Experimental results show that ODDS is more efficient than other algorithms of the same kind and scales well.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 1, pp. 9-16 (8 pages)
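Funding: National Natural Science Foundation of China (60873030); National High Technology Research and Development Program of China ("863" Program) (2007AA01Z309); National Defense Pre-Research Foundation of China (9140A04010209JW0504, 9140A15040208JW0501)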
Keywords: data streams; outlier detection; frequent pattern; outlier factor
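About the authors: Tang Xianghong (唐向红; e-mail: txhwuhan@163.com), male, born 1979, Ph.D. candidate; research interests: real-time database systems and data mining. Li Guohui (李国徽), male, born 1973, professor and doctoral supervisor; research interests: modern database engineering and real-time database systems. Yang Guanci (杨观赐), male, born 1983, Ph.D. candidate; main research interests: intelligent systems and data mining.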
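
The abstract says that ODDS scores a data element by how much it differs from the frequent patterns in the stream, but it gives no formula. The Python sketch below illustrates that general idea only. It is not the ODDS algorithm and does not build an ODDS-Tree. The function names `frequent_patterns` and `outlier_score`, the brute-force itemset enumeration, and the scoring rule (1 minus the average support of the frequent patterns a transaction contains) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Brute-force enumeration over a small static batch; only suitable for
    toy inputs (an assumption of this sketch, not the paper's method).
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def outlier_score(transaction, patterns):
    """Hypothetical score in [0, 1]: higher means less like the frequent patterns.

    Computed as 1 minus the average support of the frequent patterns
    that are subsets of the transaction.
    """
    if not patterns:
        return 0.0
    items = set(transaction)
    covered = sum(s for p, s in patterns.items() if p <= items)
    return 1.0 - covered / len(patterns)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}]
    pats = frequent_patterns(data, min_support=0.5)
    for t in data:
        print(sorted(t), outlier_score(t, pats))
```

In this toy batch the transaction `{"c"}` contains none of the frequent patterns and so receives the highest score. A streaming method would have to maintain pattern supports and candidate scores incrementally as elements arrive, which is the role the abstract assigns to the ODDS-Tree.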

