
Non-Redundant Multi-View Clustering Based on Information Bottleneck (cited by: 6)
Abstract: To address the problem of mining multi-view patterns in data, this paper proposes NrMIB, a non-redundant multi-view clustering algorithm based on the information bottleneck (IB) method. On the one hand, the algorithm follows the IB principle and maximizes the information preserved in the clustering result, which ensures high clustering quality; on the other hand, it minimizes the mutual information between the clustering result and the known data partitions, which ensures that the new clustering is non-redundant with respect to those partitions. NrMIB is suitable for both co-occurrence data and non-co-occurrence data in Euclidean space, can discover both linearly and non-linearly separable patterns, and needs no additional parameters to estimate the amount of information in Euclidean space. Experiments on synthetic pattern recognition, face recognition, and document clustering show that NrMIB effectively discovers the multiple reasonable partitions contained in the data and outperforms traditional single-view clustering algorithms and three existing non-redundant multi-view clustering algorithms.

Typical clustering algorithms output a single partition of the data. However, in real-world applications, data can often be interpreted in many different ways and admit different reasonable partitions from multiple views. Instead of committing to one clustering solution, we introduce a novel algorithm, NrMIB (non-redundant multi-view information bottleneck), which provides the user with several non-redundant clustering solutions from multiple views. Our approach employs the information bottleneck (IB) method, which aims to maximize the relevant information preserved by the clustering results, to ensure the quality of the clustering solutions, while the mutual information between the clustering labels and the known data partitions is minimized to ensure that the new clustering solutions are non-redundant. By adopting mutual information and the MeanNN differential entropy to estimate the preserved information, NrMIB can be used to analyze both co-occurrence data and Euclidean space data. In addition, the algorithm is suitable for high-dimensional data and can discover both linear and non-linear cluster shapes. We perform experiments on synthetic pattern recognition, face recognition, and document clustering to assess our method against a wide range of clustering algorithms from the literature. The experimental results show that the proposed NrMIB algorithm can discover the multiple reasonable partitions residing in the data, and that its performance is superior to the three non-redundant multi-view clustering algorithms examined here.
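The trade-off described in the abstract can be illustrated with a small sketch. It is not the NrMIB algorithm itself: the abstract does not give the exact objective, so the function `nrmib_style_score`, its `beta` weight, and the use of a single discrete feature as the "relevant" variable are all assumptions made for illustration. The sketch only shows how mutual information between label vectors can reward preserved information and penalize redundancy with a known partition.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two label sequences."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))   # counts of co-occurring label pairs
    count_a = Counter(a)         # marginal counts for a
    count_b = Counter(b)         # marginal counts for b
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nrmib_style_score(candidate, relevant, known, beta=1.0):
    """Hypothetical score: preserved information minus a redundancy penalty.

    candidate: labels of the new clustering
    relevant:  a discrete relevant variable observed per item (assumption)
    known:     labels of an already-known partition
    beta:      trade-off weight (assumption, not taken from the paper)
    """
    preserved = mutual_information(candidate, relevant)
    redundancy = mutual_information(candidate, known)
    return preserved - beta * redundancy


known = [0, 0, 1, 1]      # partition already known to the user
feature = [0, 1, 0, 1]    # relevant variable carrying a second view
alternative = [0, 1, 0, 1]

score_new_view = nrmib_style_score(alternative, feature, known)
score_repeat = nrmib_style_score(known, feature, known)
```

Under these assumptions, a candidate that merely repeats the known partition scores lower than one that captures the other view, which is the behaviour the abstract attributes to the redundancy term.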
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, PKU Core), 2013, No. 9, pp. 1865-1875 (11 pages)
Funding: National Natural Science Foundation of China (61170223, 61202207); Joint Funds of the National Natural Science Foundation of China (U1204610)
Keywords: clustering; non-redundant multi-view; information bottleneck (IB) method; mutual information; MeanNN differential entropy


