Data Distribution Optimization on Block Memory (分块内存的数据分布优化)
Cited by: 1
Abstract: To improve memory-access efficiency and to provide multiple independent memory-access pipelines that run in parallel with the compute pipeline, the on-chip memory of the BWDSP (魂芯 DSP) uses a block memory structure, and each core provides multiple independent address generation units for memory operations. Exploiting this structure, the compiler builds a conflict graph over the variables accessed by the program, allocates memory blocks to those variables, and thereby optimizes how data is distributed across the block memory. The resulting distribution then guides the assignment of the program's memory operations to the address generation units, so that the generated code exploits as much of the program's data-access parallelism as possible. Experiments show that this data distribution optimization provides a sound basis for other classic optimizations such as address-register clustering, memory-access vectorization, and software pipelining, ensuring that compiler-generated code can make full use of the instruction-level parallelism offered by the BWDSP.
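Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core Journal, 2015, No. 4, pp. 815-819 (5 pages)
Funding: Supported by the National Major Science and Technology Project "Core Electronic Devices, High-end Generic Chips and Basic Software Products" (2012ZX01034001-001)
Keywords: block memory; address generation unit; conflict graph; data distribution
About the authors: Wang Xiangqian (王向前), male, born 1985, PhD candidate; research interests: compiler design and optimization. E-mail: forward@mail.ustc.edu.cn. Hong Yi (洪一), male, born 1963, professor and doctoral supervisor; main research interest: signal processor architecture design. Zheng Qilong (郑启龙), male, born 1969, associate professor; research interests: parallel computing and parallel compilation.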
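
The abstract's central idea, building a conflict graph over variables and then allocating memory blocks, can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not describe in detail. It assumes a simple greedy weighted colouring, in which two variables conflict whenever they are accessed in the same group of parallel memory operations. The names `build_conflict_graph` and `assign_banks`, the input format, and the bank counts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_conflict_graph(access_groups):
    """Build a weighted conflict graph (illustrative, not the paper's method).

    Each group lists variables that would ideally be accessed in parallel.
    The edge weight counts how many groups contain both variables.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for group in access_groups:
        members = sorted(set(group))
        for v in members:
            graph[v]  # make sure isolated variables are present too
        for a, b in combinations(members, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def assign_banks(graph, num_banks):
    """Greedy assignment: place heavily conflicting variables first, each in
    the bank where it collides least with variables already placed."""
    order = sorted(graph, key=lambda v: (-sum(graph[v].values()), v))
    bank_of = {}
    for v in order:
        cost = [0] * num_banks
        for u, weight in graph[v].items():
            if u in bank_of:
                cost[bank_of[u]] += weight
        bank_of[v] = cost.index(min(cost))  # lowest-cost bank, ties -> lowest index
    return bank_of


if __name__ == "__main__":
    # Hypothetical access pattern: a and b are read together twice.
    groups = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "c")]
    print(assign_banks(build_conflict_graph(groups), num_banks=2))
```

With fewer banks than mutually conflicting variables, the greedy order keeps the most frequent pair (here `a` and `b`) in separate banks and accepts a collision on a lighter edge. A production compiler would likely also weight edges by loop depth or execution frequency.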
