Abstract
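Objective The proteome of biological evidence carries rich genetic information in the form of single amino acid polymorphisms (SAPs) in protein sequences. However, the lack of SAP analysis tools has severely limited the use of SAPs in practical police work. This study aims to meet the need for SAP analysis of proteome data from forensic samples. Methods An automated SAP typing software was designed in three modules. Module B has built-in information on common non-synonymous single nucleotide polymorphisms (nsSNPs) in East Asian populations and combines it with the additional nsSNPs from the input exome to construct the SAP protein sequence database. Module A uses the SAP protein sequence database constructed by module B and calls the pre-installed pFind or MaxQuant search engine to analyze the mass spectrometry data. Module C outputs the reference-type and variant-type SAP typing results, infers the corresponding nsSNP genotypes in reverse (termed imputed nsSNPs), compares them with the input exome nsSNP genotypes, and generates a comparison report. The software was tested with exome nsSNP data from 2 Chinese individuals and proteome data from 2 hair shafts per individual, using the pFind and MaxQuant search engines separately. It was also tested with published proteome data from 3 hair shafts each of 1 European and 1 African individual, with the pFind search engine selected, and the results were compared with those of the published algorithm. Results The software takes proteome mass spectrometry data and exome-sequencing nsSNP results as input files and outputs an SAP result report. Both search engines yielded SAP results, with MaxQuant yielding slightly fewer SAPs than pFind. In the test on published data, among the SAP sites that were fully matched by the published method (that is, the imputed nsSNP genotype was identical to the exome nsSNP genotype), SAPTyper recovered a subset of the SAPs, and the genotypes were consistent. Conclusion An automated SAP analysis algorithm was developed for East Asian populations and implemented as the software SAPTyper. The software offers a convenient and efficient analysis tool for research on and application of forensic proteomic SAPs in individual identification and phenotype inference, and has good application prospects.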
Objective The proteome of biological evidence contains rich genetic information, namely single amino acid polymorphisms (SAPs) in protein sequences. However, because efficient and convenient analysis tools are lacking, the application of SAPs in public security work still faces many challenges. This paper aims to meet the need for SAP analysis of proteome data from forensic biological evidence. Methods The software is divided into three modules. First, starting from a built-in database of common non-synonymous single nucleotide polymorphisms (nsSNPs) and SAPs in East Asian populations, the software integrates and annotates newly identified exonic nsSNPs as SAPs, thereby constructing a customized SAP protein sequence database. It then uses a pre-installed search engine (either pFind or MaxQuant) to analyze the data and output SAP typing results, identifying both reference and variant types along with their corresponding imputed nsSNPs. Finally, SAPTyper compares the proteome-based typing results with the individual's exome-derived nsSNP profile and outputs a comparison report. Results SAPTyper accepts raw proteomic mass spectrometry data acquired in data-dependent acquisition (DDA) mode and exome-sequencing nsSNP results as input, and outputs an SAP result report. The pFind and MaxQuant search engines were each used to test proteome data from 2 hair shafts from each of 2 individuals, and both yielded SAP results. MaxQuant yielded slightly fewer SAPs than pFind. These results show that SAPTyper can perform SAP finding. In addition, the pFind search engine was used to test published proteome data from 3 hair shafts each of 1 European and 1 African individual. Among the sites fully matched by the published method, SAPTyper also detected a subset; for semi-matching sites, that is, sites where the nsSNP is heterozygous, both the published method and SAPTyper risked missing one of the two alleles. When the SAPTyper results were compared with the SAP results reported in the literature, some imputed nsSNP sites identified by the published method but not detected by SAPTyper were found to have a minor allele frequency (MAF) below 0.1% in East Asian populations, and they were therefore absent from the common East Asian nsSNP database built into the software. Because the database is built from genetic variation data for East Asian populations, the software currently cannot effectively identify common variant sites that are specific to European or African populations, but it can still identify SAP sites shared between these populations and East Asian populations. Conclusion An automated SAP analysis algorithm was developed for East Asian populations and implemented as the software SAPTyper. The software provides a convenient and efficient analysis tool for the research and application of forensic proteomic SAPs and has important application prospects in SAP-based individual identification and phenotypic inference.
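The comparison step described in the abstract (imputed nsSNP genotypes checked against exome nsSNP genotypes) can be illustrated with a minimal Python sketch. This is not SAPTyper's actual code: the function names, the site identifiers, and the category labels `not_detected` and `mismatch` are assumptions made for illustration. Only the notions of a full match (imputed and exome genotypes identical) and a semi-match (a heterozygous site with one allele missed) come from the abstract.

```python
from typing import AbstractSet, Dict


def compare_site(exome: AbstractSet[str], imputed: AbstractSet[str]) -> str:
    """Classify one nsSNP site by comparing two allele sets.

    exome   -- alleles called from exome sequencing (1 if homozygous, 2 if heterozygous)
    imputed -- alleles inferred back from the SAP peptides detected in the proteome
    """
    if not imputed:
        return "not_detected"   # no peptide evidence at this site (hypothetical label)
    if imputed == exome:
        return "full_match"     # imputed genotype identical to exome genotype
    if imputed < exome:
        return "semi_match"     # heterozygous site, one allele missed
    return "mismatch"           # an allele was imputed that the exome does not carry


def compare_profiles(
    exome_calls: Dict[str, AbstractSet[str]],
    imputed_calls: Dict[str, AbstractSet[str]],
) -> Dict[str, str]:
    """Build a per-site comparison report for every site in the exome profile."""
    return {
        site: compare_site(alleles, imputed_calls.get(site, frozenset()))
        for site, alleles in exome_calls.items()
    }


# Hypothetical toy data; the site names and alleles are made up for illustration.
exome_calls = {
    "site_1": {"A"},
    "site_2": {"C", "T"},
    "site_3": {"G"},
    "site_4": {"A", "G"},
}
imputed_calls = {
    "site_1": {"A"},
    "site_2": {"C"},
    "site_3": {"T"},
}
report = compare_profiles(exome_calls, imputed_calls)
for site, status in sorted(report.items()):
    print(site, status)
```

Representing each genotype as a set of alleles keeps the semi-match test to a single proper-subset comparison, which mirrors the abstract's point that a heterozygous site can lose one allele at the proteome level.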
Authors
胡峰
王梦娇
吴佳蕾
丁冬升
杨志远
季安全
丰蕾
叶健
HU Feng; WANG Meng-Jiao; WU Jia-Lei; DING Dong-Sheng; YANG Zhi-Yuan; JI An-Quan; FENG Lei; YE Jian (Graduate School, People’s Public Security University of China, Beijing 100038, China; National Engineering Laboratory for Forensic Science, Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China; School of Basic Medical Sciences, Henan University, Kaifeng 475000, China; Public Security Bureau of Jiangyin, Jiangyin 214431, China)
Source
《生物化学与生物物理进展》
2025, Issue 9, pp. 2406-2416 (11 pages)
Progress in Biochemistry and Biophysics
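Funding
Supported by a project of the National Key Research and Development Program of China (2022YFC3341003)
and an open project of the National Engineering Laboratory for Forensic Science (2021NELKFKT04).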
Author information
Co-first authors: HU Feng and WANG Meng-Jiao. Corresponding author: YE Jian. Tel: 010-83752707, E-mail: yejian77@126.com.