Abstract
Pattern recognition of thermophilic and mesophilic proteins was performed with three methods: principal component analysis (PCA), partial least squares (PLS) regression, and a back-propagation (BP) neural network. The average fitting accuracy on the training set was 92%, 95%, and 98% for the three methods, respectively, and the average prediction accuracy on the test set was 60%, 72.5%, and 72.5%, respectively. The highest prediction accuracy was 75% for thermophilic proteins and 85% for mesophilic proteins. A mathematical model was constructed and its biological meaning interpreted, establishing a new sequence-based method for discriminating thermophilic from mesophilic proteins.
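The abstract does not state which sequence features or PLS settings were used. As a minimal sketch only, assuming amino-acid composition as the feature vector and labels of +1 (thermophilic) and -1 (mesophilic), the code below fits a two-class PLS regression (PLS1 via the NIPALS algorithm) and classifies a sequence by the sign of the predicted response. The names `composition` and `PLS1Classifier` are hypothetical, and this is not the model reported in the paper.

```python
from typing import List, Sequence

# ASSUMPTION: features are the 20 standard amino-acid frequencies (not stated in the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in `seq`; non-standard letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PLS1Classifier:
    """Hypothetical two-class PLS discriminant (labels +1 / -1) fitted with NIPALS."""

    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    def fit(self, X: List[List[float]], y: List[float]) -> "PLS1Classifier":
        n, p = len(X), len(X[0])
        # Center X column-wise and y; keep the means for prediction.
        self.x_mean = [sum(row[j] for row in X) / n for j in range(p)]
        self.y_mean = sum(y) / n
        E = [[row[j] - self.x_mean[j] for j in range(p)] for row in X]
        f = [v - self.y_mean for v in y]
        self.W, self.P, self.q = [], [], []
        for _ in range(self.n_components):
            # Weight vector w is proportional to E^T f, scaled to unit length.
            w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
            norm = _dot(w, w) ** 0.5
            if norm < 1e-12:
                break  # the response is already explained; stop adding components
            w = [v / norm for v in w]
            t = [_dot(row, w) for row in E]  # component scores
            tt = _dot(t, t)
            p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            q = _dot(f, t) / tt
            # Deflate X and y by the part explained by this component.
            E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
            f = [f[i] - q * t[i] for i in range(n)]
            self.W.append(w)
            self.P.append(p_load)
            self.q.append(q)
        return self

    def decision(self, x: List[float]) -> float:
        """Predicted response: y_mean plus the sum of q_a * t_a over components."""
        e = [x[j] - self.x_mean[j] for j in range(len(x))]
        yhat = self.y_mean
        for w, p_load, q in zip(self.W, self.P, self.q):
            t = _dot(e, w)
            yhat += q * t
            e = [e[j] - t * p_load[j] for j in range(len(e))]
        return yhat

    def predict(self, x: List[float]) -> int:
        """+1 (thermophilic under the assumed labelling) if the response is >= 0, else -1."""
        return 1 if self.decision(x) >= 0 else -1
```

Usage: fit on `composition(...)` vectors of labelled sequences, then call `predict(composition(new_seq))`. Thresholding at 0 places the boundary halfway between the two label values; a PCA- or BP-network-based classifier would replace only the model class, not the feature step.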
Source
Chinese Journal of Biotechnology (生物工程学报)
CAS
CSCD
Peking University Core Journal
2005, No. 6, pp. 960-964 (5 pages)
Funding
National Natural Science Foundation of China (No. 20276026)
Research Fund of the Overseas Chinese Affairs Office of the State Council (No. 05QZR06)
Keywords
pattern recognition, principal component analysis, partial least squares regression, BP neural network, thermostability
Author information
Corresponding author. Tel: 86-595-22691560; E-mail: fangbs@hqu.edu.cn