基于IB方法的无冗余多视角聚类 (cited by: 6)

Non-Redundant Multi-View Clustering Based on Information Bottleneck


Abstract (translated from the Chinese): To address the problem of mining multi-view patterns in data, this paper proposes NrMIB, a non-redundant multi-view clustering algorithm based on the IB method. On the one hand, the algorithm uses the IB principle to preserve as much information as possible in the clustering result, which ensures high clustering quality. On the other hand, it minimizes the mutual information between the clustering result and the known data partitions, which ensures that the new clustering is non-redundant with respect to those partitions. NrMIB is suitable for both co-occurrence data and non-co-occurrence data in Euclidean space, can discover both linearly and non-linearly separable patterns, and needs no additional parameters to estimate the amount of information in Euclidean space. Experiments on synthetic data pattern recognition, face recognition, and document clustering show that NrMIB effectively discovers the multiple reasonable partitions contained in the data, and that it outperforms traditional single-view clustering algorithms and three existing non-redundant multi-view clustering algorithms.

Abstract (English): Typical clustering algorithms output a single partition of the data. However, in real-world applications, data can often be interpreted in many different ways and has different reasonable partitions from multiple views. Instead of committing to one clustering solution, we introduce a novel algorithm, NrMIB (non-redundant multi-view information bottleneck), which provides the user with several non-redundant clustering solutions from multiple views. Our approach employs the information bottleneck (IB) method, which aims to maximize the relevant information preserved by the clustering results, to ensure the quality of the clustering solutions, while the mutual information between the clustering labels and the known data partitions is minimized to ensure that the new clustering solutions are non-redundant. By adopting mutual information and MeanNN differential entropy to estimate the preserved information, NrMIB can be used to analyze both co-occurrence data and Euclidean space data. In addition, the algorithm is suitable for high-dimensional data and can discover both linear and non-linear cluster shapes. We perform experiments on synthetic data pattern recognition, face recognition, and document clustering to assess our method against a wide range of clustering algorithms from the literature. The experimental results show that the proposed NrMIB algorithm can discover the multiple reasonable partitions residing in the data, and that its performance is superior to the three non-redundant multi-view clustering algorithms examined here.
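The non-redundancy criterion in the abstract rests on the mutual information between a new clustering and a known partition. As a rough illustration only, and not the authors' implementation, the sketch below computes that quantity from label co-occurrence counts. The function name `mutual_information` and the toy label lists are assumptions introduced for this example; they do not come from the paper.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information I(A; B) in nats between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts n_ab
    count_a = Counter(labels_a)               # marginal counts n_a
    count_b = Counter(labels_b)               # marginal counts n_b
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy labelings of four items (illustrative only).
known = [0, 0, 1, 1]        # a known partition
redundant = [1, 1, 0, 0]    # the same grouping with the labels swapped
alternative = [0, 1, 0, 1]  # a grouping that is independent of `known`

mi_redundant = mutual_information(known, redundant)      # equals log(2)
mi_alternative = mutual_information(known, alternative)  # equals 0
```

A clustering that merely relabels the known partition reaches the maximum value for two balanced clusters, log 2, while an independent grouping scores zero. This is the sense in which minimizing the mutual information pushes a new solution away from what is already known.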

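Authors: Lou Zhengzheng (娄铮铮), Ye Yangdong (叶阳东), Liu Ruina (刘瑞娜)

Affiliation: School of Information Engineering, Zhengzhou University (郑州大学信息工程学院)

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and Peking University Core Journals; 2013, No. 9, pp. 1865-1875 (11 pages)

Funding: National Natural Science Foundation of China (61170223, 61202207); National Natural Science Foundation of China Joint Fund (U1204610)

Keywords: clustering; non-redundant; multi-view; information bottleneck (IB) method; mutual information; MeanNN differential entropy

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]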
