Abstract
Efficient implementations of Bayesian inference and the Metropolis-Hastings algorithm have made MrBayes a widely used tool for phylogenetic analysis of molecular sequences. However, as the number of sequences and evolutionary parameters grows, the sample space of candidate trees expands rapidly, making phylogenetic tree reconstruction computationally demanding. To reduce the time spent computing the conditional likelihoods of trees in MrBayes and to improve analysis efficiency, a number of parallel acceleration methods based on graphics processing units (GPUs) have emerged in recent years. To improve the scalability of these parallel methods, this paper proposes an optimized multithreaded parallel method for likelihood computation. Because computing the likelihoods of molecular states under the among-site rate variation model requires different transition probability matrices, the method further decomposes the earlier multithreaded computation, which parallelized likelihoods across sites, into computations of conditional likelihoods under the different transition probability matrices across multiple sites. Without changing the compute-to-transfer ratio of a single thread, this strategy increases the number of threads, which improves the parallel overlap between thread warps and raises parallel efficiency. In addition, because each thread warp computes likelihoods under only one transition probability matrix, the synchronization overhead between different warps when using shared memory is avoided, which further improves kernel efficiency. Results on 4 real data sets and 30 simulated data sets show that, for the core likelihood function, the proposed method outperforms tgMC3 (version 2.0) and nMC3 (version 2.1.1), with speedups of up to 1.78x and 2.04x, respectively.
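The decomposition described in the abstract can be illustrated with a small CPU-side sketch in plain Python. This is not the authors' CUDA kernel: the function names, the nested-list data layout, and the mapping from a simulated thread index to a (rate category, site) pair are all assumptions made for illustration. Each simulated thread handles one site under one transition probability matrix, and consecutive thread indices share the same rate category, which mirrors the idea that a warp works with a single matrix.

```python
# Hypothetical sketch: names, data layout, and index mapping are assumptions,
# not taken from MrBayes, tgMC3, or nMC3.

def node_item(tid, n_sites, p_left, p_right, cl_left, cl_right):
    """Work of one simulated thread: one (rate category, site) pair."""
    # Consecutive tids share the same rate category r, hence the same matrix.
    r, s = divmod(tid, n_sites)
    n_states = len(p_left[r])
    values = []
    for i in range(n_states):
        # Sum over child states j of P[r][i][j] * child likelihood of state j.
        left = sum(p_left[r][i][j] * cl_left[s][r][j] for j in range(n_states))
        right = sum(p_right[r][i][j] * cl_right[s][r][j] for j in range(n_states))
        values.append(left * right)
    return r, s, values


def parent_conditional_likelihoods(p_left, p_right, cl_left, cl_right):
    """Conditional likelihoods at a parent node, indexed [site][rate][state]."""
    n_sites = len(cl_left)
    n_rates = len(p_left)
    parent = [[None] * n_rates for _ in range(n_sites)]
    # A GPU would run these work items concurrently; here they run in a loop.
    for tid in range(n_sites * n_rates):
        r, s, values = node_item(tid, n_sites, p_left, p_right, cl_left, cl_right)
        parent[s][r] = values
    return parent
```

Compared with assigning one thread per site and looping over rate categories inside it, this layout multiplies the number of work items by the number of rate categories while leaving the arithmetic per item unchanged. On real hardware the number of sites would need to be padded to a multiple of the warp size for each warp to stay within one rate category.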
Authors
黄佳为
李晓鹏
凌诚
HUANG Jia-wei; LI Xiao-peng; LING Cheng (School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100000, China)
Source
《计算机科学》
CSCD
Peking University Core Journals
2022, Issue S02, pp. 919-925 (7 pages)
Computer Science
Funding
National Natural Science Foundation of China (61602026)
About the authors
HUANG Jia-wei (867454915@qq.com), born in 1996, postgraduate. His main research interests include high-performance GPU computing in bioinformatics. Corresponding author: LING Cheng (lingcheng@mail.buct.edu.cn), born in 1987, Ph.D., associate professor. His main research interests include high-performance GPU computing in computational biology and bioinformatics.