Abstract
[Objective] To construct a discriminant classification model for DNA sequences by combining biological knowledge with mathematical methods. [Method] Based on the polarity of amino acid side-chain groups, and on the observation that amino acid content differs between sequences, amino acid class information was extracted that characterizes a sequence in terms of both its base content and its base arrangement. This information was represented as a four-dimensional vector, and the Mahalanobis distance method and the Fisher discriminant method were used to classify the given sequences. [Result] With both classification methods, the back-substitution (resubstitution) rate on the samples reached 100%, and the classification consistency rate was 90%. [Conclusion] The model's algorithm is simple and its classification accuracy is relatively high; it outperforms discriminant classification models based on base content alone.
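The Method step can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the details, so the following are assumptions: codons are read from position 0 with the standard genetic code, amino acids are grouped into four polarity classes by a common textbook scheme (nonpolar, polar uncharged, acidic, basic), and the vector holds the class fractions. The names `polarity_vector`, `mahalanobis_sq`, and `classify` are hypothetical. Because fractions sum to 1, the covariance matrix can be singular, so the sketch uses a pseudo-inverse. Only the Mahalanobis rule is shown; the Fisher discriminant is omitted.

```python
import numpy as np

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# ASSUMPTION: four side-chain polarity classes (textbook grouping, not
# confirmed by the abstract): nonpolar, polar uncharged, acidic, basic.
POLARITY_CLASSES = ["AVLIPFWM", "GSTCYNQ", "DE", "KRH"]


def polarity_vector(dna):
    """Return the fraction of codons falling in each polarity class."""
    counts = np.zeros(4)
    dna = dna.upper()
    for pos in range(0, len(dna) - 2, 3):  # ASSUMPTION: frame starts at 0
        aa = CODON_TABLE.get(dna[pos:pos + 3])  # None for unknown codons
        for idx, group in enumerate(POLARITY_CLASSES):
            if aa is not None and aa in group:  # stop codons match no class
                counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def mahalanobis_sq(x, samples):
    """Squared Mahalanobis distance from x to the mean of `samples`."""
    samples = np.asarray(samples, dtype=float)
    diff = np.asarray(x, dtype=float) - samples.mean(axis=0)
    # Pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.cov(samples, rowvar=False))
    return float(diff @ cov_inv @ diff)


def classify(x, class_a, class_b):
    """Assign x to the class whose training samples are nearer."""
    d_a = mahalanobis_sq(x, class_a)
    d_b = mahalanobis_sq(x, class_b)
    return "A" if d_a <= d_b else "B"
```

In use, each training sequence of a known class would be mapped through `polarity_vector`, the resulting rows stacked into `class_a` and `class_b`, and an unlabeled sequence classified by passing its vector to `classify`.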
Funding
Supported by the 2011 Science Research Project of Ningbo Dahongying University (CF102601).
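About the author
Wang Xianjin (王显金, b. 1978), male, from Xingguo, Jiangxi; lecturer, working in teaching and education and in applied mathematics research. Corresponding author.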