Three Algorithms for Improving the Effectiveness of Software Pipelining: Comparison and Combination

Cited by: 2

Abstract: Software pipelining is a loop scheduling technique that exploits instruction-level parallelism by overlapping the execution of several consecutive iterations. Overlapping iterations raises register requirements, and when the target processor cannot supply enough registers, software pipelining may fail. This paper evaluates the register requirements of software-pipelined loops in the NAS and SPEC2000 benchmarks on the Itanium processor and finds that a shortage of static general registers is the main cause of software pipelining failure. It proposes three algorithms that increase the number of software-pipelined loops and so improve the effectiveness of software pipelining: register-sensitive unrolling (RSU), which limits the loop unrolling factor; stacked register allocation (SRA); and variable type conversion (VTC). RSU chooses a reasonable unrolling factor according to the static register requirement, which raises the success rate of software pipelining. SRA and VTC use idle stacked registers and idle rotating registers, respectively, to serve as static registers, which improves register utilization. Compared with existing techniques, these methods further improve the gain from software pipelining by increasing the number of software-pipelined loops. The three algorithms are implemented in the Open Research Compiler (ORC), an open-source compiler targeting the Itanium processor. Their effectiveness is compared on programs from the NAS benchmarks, and their combined use is also studied experimentally. For some benchmarks, performance improves by more than 21%.
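The abstract describes RSU only at a high level: pick an unrolling factor that the static register supply can support. The sketch below illustrates that idea and is not the paper's algorithm. The `LoopProfile` fields, the linear cost model in `static_regs_needed`, and the figure of 24 usable static registers are all assumptions made for illustration. The `available` parameter can be raised to model SRA or VTC lending idle stacked or rotating registers to act as static ones.

```python
from dataclasses import dataclass

# Itanium has 32 static general registers (r0-r31). How many of them a
# compiler can actually hand to a loop is an ASSUMED figure here.
ASSUMED_USABLE_STATIC_REGS = 24


@dataclass
class LoopProfile:
    """Hypothetical per-loop estimates gathered before unrolling."""
    invariant_regs: int  # static registers needed regardless of unrolling
    per_copy_regs: int   # extra static registers per unrolled copy of the body


def static_regs_needed(loop: LoopProfile, factor: int) -> int:
    """Assumed linear model: demand grows with each unrolled copy."""
    return loop.invariant_regs + loop.per_copy_regs * factor


def choose_unroll_factor(loop: LoopProfile, max_factor: int,
                         available: int = ASSUMED_USABLE_STATIC_REGS) -> int:
    """Return the largest factor in [1, max_factor] whose estimated demand
    fits in `available`; fall back to 1 (no unrolling) if none fits."""
    for factor in range(max_factor, 0, -1):
        if static_regs_needed(loop, factor) <= available:
            return factor
    return 1


loop = LoopProfile(invariant_regs=10, per_copy_regs=4)
limited = choose_unroll_factor(loop, max_factor=8)
# Modelling 8 borrowed registers (in the spirit of SRA/VTC) relaxes the limit.
relaxed = choose_unroll_factor(loop, max_factor=8,
                               available=ASSUMED_USABLE_STATIC_REGS + 8)
```

Under these assumed numbers the register limit caps the factor below `max_factor`, and widening the pool of registers treated as static allows a larger factor, which is one way to picture how the three algorithms could be combined.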

Authors: Li Wenlong (李文龙), Chen Yu (陈彧), Lin Haibo (林海波), Tang Zhizhong (汤志忠)

Affiliations: Department of Computer Science and Technology, Tsinghua University; Compiler Group, Intel China Research Center

Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 10, pp. 1822-1832 (11 pages)

Funding: National Natural Science Foundation of China

Keywords: software pipelining; static variable; static register; Itanium; loop unrolling; register allocation

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

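About the authors: Li Wenlong (born 1977), male, from Anshan, Liaoning; Ph.D., researcher; research interests: computer architecture and instruction-level parallelism algorithms. Corresponding author: Phone: +86-10-62773730, E-mail: liwenlong00@mails.tsinghua.edu.cn, http://www.tsinghua.edu.cn. Chen Yu (born 1981), male, Ph.D. candidate; research interests: instruction-level parallelism algorithms. Lin Haibo (born 1978), male, Ph.D.; research interests: computer architecture, instruction-level parallelism algorithms, and multithreading. Tang Zhizhong (born 1946), male, professor and doctoral supervisor; research interests: computer architecture, instruction-level parallelism algorithms, and parallel compilation techniques.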
