Abstract
This paper first analyzes how rounding errors accumulate during LU decomposition and builds a model relating the loss of accuracy to the matrix size, which is used to predict the accuracy of large-scale LU decompositions. Then, exploiting the fact that fixed-point addition is simple, fast, and error-free, we design a high-precision multiply-accumulate (HPMAcc) unit and build a fine-grained parallel LU decomposition accelerator on top of it. Experimental results show that, compared with a high-precision software library such as QD or MPFR, the accelerator in a 4-PE configuration achieves speedups of up to more than 100x while delivering more than 90 bits of accuracy.
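The idea behind an error-free fixed-point accumulator can be illustrated in software. The following is a minimal sketch and not the HPMAcc hardware design from the paper: it assumes double-precision inputs, uses Python's arbitrary-width integers as a stand-in for a wide fixed-point register, and the names `FRAC_BITS`, `to_int_exp`, `naive_dot`, and `fixed_dot` are hypothetical. Each product is converted exactly to fixed point, all additions are exact integer additions, and rounding happens only once at the end.

```python
import math
from fractions import Fraction

# Assumed number of fractional bits in the accumulator. The smallest
# product of two nonzero doubles is 2**-2148, so 2300 bits is sufficient.
FRAC_BITS = 2300

def to_int_exp(x):
    """Split a finite double into integers (m, e) with x == m * 2**e exactly."""
    f, e = math.frexp(x)              # x == f * 2**e, 0.5 <= |f| < 1 (or f == 0)
    return int(f * (1 << 53)), e - 53  # f * 2**53 is an exact integer

def naive_dot(xs, ys):
    """Ordinary floating-point dot product: rounds after every operation."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def fixed_dot(xs, ys):
    """Dot product with an exact fixed-point accumulator, rounded once."""
    acc = 0  # Python int: integer addition never rounds
    for x, y in zip(xs, ys):
        mx, ex = to_int_exp(x)
        my, ey = to_int_exp(y)
        # Exact product mx*my * 2**(ex+ey), aligned to the fixed-point grid.
        acc += (mx * my) << (ex + ey + FRAC_BITS)
    # Single rounding step back to double precision.
    return float(Fraction(acc, 1 << FRAC_BITS))

if __name__ == "__main__":
    xs = [1e20, 1.0, -1e20]
    ys = [1.0, 1.0, 1.0]
    print(naive_dot(xs, ys))  # the 1.0 term is lost to cancellation
    print(fixed_dot(xs, ys))  # the 1.0 term survives
```

In this sketch the floating-point loop loses the small term entirely, while the fixed-point accumulator keeps it, because alignment and addition never discard bits. A hardware unit would use a fixed register width rather than unbounded integers; the width chosen here is only an illustrative assumption.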
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core Journal
2009, No. 11, pp. 33-36 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60633050)
Keywords
rounding error
LU decomposition
high-precision multiply-accumulate
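About the authors
Lei Yuanwu (born 1982), male, from Guiyang, Hunan; PhD candidate; research interest: high-performance architecture. Address: Doctoral Student Brigade, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073; Tel: 13607315362; E-mail: yuanwulei@nudt.edu.cn
Dou Yong, professor and doctoral supervisor; research interests: high-performance architecture and reconfigurable architecture.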