Protein Function Prediction Based on Multiple Networks Collaborative Matrix Factorization (基于多网络数据协同矩阵分解预测蛋白质功能)

Cited by: 1


Abstract: Accurately and automatically predicting the biological functions of proteins is a fundamental task in bioinformatics and a key application of artificial intelligence in biological data analysis. The wide application of high-throughput technologies has produced a large number of functional association networks of biomolecules. Integrating these networks gives a more comprehensive view of the functional mechanisms of proteins and improves the accuracy of protein function prediction. However, existing network-integration methods are typically hard to apply to a large functional label space, and they do not use the correlations between labels or weight the networks differently when integrating them. This paper proposes a protein function prediction approach based on multiple-network collaborative matrix factorization (ProCMF). To explore the latent relationships between proteins and labels, ProCMF first applies nonnegative matrix factorization to factorize the protein-label association matrix into two low-rank matrices. Next, to exploit the correlations between labels and multiple types of protein feature data, it defines smoothness regularization terms on these two low-rank matrices, which constrain and guide their collaborative factorization. To integrate the networks differentially, ProCMF assigns a different weight to each network. Finally, ProCMF combines these goals into a unified objective function and uses an alternating iterative method to optimize the low-rank matrices and the network weights in turn. Experimental results on multi-network datasets for three model species (yeast, human, and mouse) show that ProCMF outperforms other related methods, and that it can effectively handle a large number of functional labels and differentially integrate multiple networks.
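The steps named in the abstract (factorize the protein-label matrix, smooth the two factors over the protein networks and the label correlations, weight the networks, and alternate between the updates) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the record does not give the paper's objective function. The sketch assumes a generic dual-graph regularized NMF objective, ||Y - UV^T||_F^2 + alpha * tr(U^T L U) + beta * tr(V^T L_label V), optimized with the standard multiplicative updates. The softmax-style weight rule, the function name `procmf_sketch`, and all parameters are hypothetical choices made for illustration.

```python
import numpy as np


def procmf_sketch(Y, networks, label_sim, k=5, alpha=0.1, beta=0.1,
                  gamma=1.0, n_iter=200, seed=0, eps=1e-12):
    """Graph-regularized NMF with per-network weights (illustrative only).

    Y         : (n, c) nonnegative protein-label association matrix
    networks  : list of (n, n) symmetric nonnegative protein networks
    label_sim : (c, c) symmetric nonnegative label correlation matrix
    Returns U (n, k), V (c, k) and the network weights w.
    """
    rng = np.random.default_rng(seed)
    n, c = Y.shape
    U = rng.random((n, k))          # low-rank protein factor
    V = rng.random((c, k))          # low-rank label factor
    w = np.full(len(networks), 1.0 / len(networks))
    d_label = label_sim.sum(axis=1)  # degrees of the label graph

    for _ in range(n_iter):
        # Combine the protein networks using the current weights.
        W = sum(wm * Wm for wm, Wm in zip(w, networks))
        d = W.sum(axis=1)

        # Multiplicative updates; they keep U and V nonnegative.
        U *= (Y @ V + alpha * (W @ U)) / (
            U @ (V.T @ V) + alpha * d[:, None] * U + eps)
        V *= (Y.T @ U + beta * (label_sim @ V)) / (
            V @ (U.T @ U) + beta * d_label[:, None] * V + eps)

        # Assumed weight rule (not from the paper): a network on which U
        # is smoother, i.e. has a smaller tr(U^T L_m U), gets more weight.
        cost = np.array([
            (Wm.sum(axis=1) * (U ** 2).sum(axis=1)).sum() - (U * (Wm @ U)).sum()
            for Wm in networks
        ])
        w = np.exp(-(cost - cost.min()) / gamma)
        w /= w.sum()

    return U, V, w
```

Under these assumptions, `U @ V.T` gives a score for every protein-label pair, and the returned `w` shows how strongly each network influenced the factorization. A smaller `gamma` concentrates the weight on fewer networks.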

Authors: Yu Guoxian (余国先); Wang Keyao (王可尧); Fu Guangyuan (傅广垣); Wang Jun (王峻); Zeng An (曾安)

Affiliations: College of Computer and Information Science, Southwest University, Chongqing 400715; School of Computers, Guangdong University of Technology, Guangzhou 510006

Source: Journal of Computer Research and Development (计算机研究与发展), 2017, No. 12, pp. 2931-2944 (14 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (61402378, 61772143); Natural Science Foundation of Chongqing (cstc2016jcyjA0351)

Keywords: protein function prediction; functional association network; network integration; nonnegative matrix factorization; collaborative factorization

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

About the authors:
Yu Guoxian, born in 1985. Associate professor. Member of CCF. His main research interests include machine learning, data mining, and bioinformatics (gxyu@swu.edu.cn).
Wang Keyao, born in 1994. Master candidate. Student member of CCF. His main research interests include machine learning and bioinformatics (keyaowang@email.swu.edu.cn).
Fu Guangyuan, born in 1993. Master. Student member of CCF. His main research interests include machine learning and bioinformatics (fugy@email.swu.edu.cn).
Wang Jun (corresponding author), born in 1983. Associate professor. Member of CCF. Her main research interests include data mining and bioinformatics (kingjun@swu.edu.cn).
Zeng An, born in 1978. Professor. Member of CCF. Her main research interests include artificial intelligence, machine learning, and big data (zengan2010@126.com).

