Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t...Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.展开更多
可重构密码流体系结构是一种面向密码运算的新型体系结构,但存在着超长指令字(VLIW)代码稀疏和Kernel体积过大的问题。该文以可重构密码流处理架构S-RCCPA为研究平台,通过大量密码算法在S-RCCPA架构上的适配分析,提出了VLIW可重构技术,...可重构密码流体系结构是一种面向密码运算的新型体系结构,但存在着超长指令字(VLIW)代码稀疏和Kernel体积过大的问题。该文以可重构密码流处理架构S-RCCPA为研究平台,通过大量密码算法在S-RCCPA架构上的适配分析,提出了VLIW可重构技术,并设计了Kernel级指令集、VLIW可重构算法及指令可重构单元。实验证明,该技术能够有效提高VLIW的指令密度,同时降低了VLIW的指令宽度,使得整个Kernel体积减小了约33.3%,并将微码存储器的容量由96 k B降为64 k B,有效降低芯片整体面积和系统功耗。展开更多
基金Project(2008AA01A201) supported the National High-tech Research and Development Program of ChinaProjects(60833004, 60633050) supported by the National Natural Science Foundation of China
文摘Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.
文摘可重构密码流体系结构是一种面向密码运算的新型体系结构,但存在着超长指令字(VLIW)代码稀疏和Kernel体积过大的问题。该文以可重构密码流处理架构S-RCCPA为研究平台,通过大量密码算法在S-RCCPA架构上的适配分析,提出了VLIW可重构技术,并设计了Kernel级指令集、VLIW可重构算法及指令可重构单元。实验证明,该技术能够有效提高VLIW的指令密度,同时降低了VLIW的指令宽度,使得整个Kernel体积减小了约33.3%,并将微码存储器的容量由96 k B降为64 k B,有效降低芯片整体面积和系统功耗。