Profiling-Based Global Performance Optimization Method for CPU-GPU Systems (cited by: 10)

Profiling Based Optimization Method for CPU-GPU Heterogeneous Parallel Processing System
Abstract: When applications are ported to CPU-GPU heterogeneous parallel systems, optimization strategies tend to be applied piecemeal, with no global guiding principle. To address this problem, a profiling-based global performance optimization method is proposed. The method consists of an optimization strategy library, a profiling tool library, and a strategy configuration module. The optimization strategy library divides the performance optimization of a ported application into three levels: memory access, kernel acceleration, and data partitioning. The profiling tool library provides a matching three-level profiling mechanism that collects profiling information at run time. The strategy configuration module uses this information to guide the user in choosing a suitable optimization strategy at each level. Experimental results show that the method gives clear guidance for the global optimization process when applications are ported to CPU-GPU heterogeneous parallel systems. With matrix multiplication and the fast Fourier transform as example applications, performance improves markedly, and the final performance is up to about 30% higher than with memory-access-level optimization alone.
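The three-level flow described in the abstract can be sketched as a small decision routine. This is only an illustration under stated assumptions: the abstract does not give the profiling metrics, thresholds, or strategy names, so `ProfileInfo`, its fields, the cut-off values, and every strategy label below are hypothetical. The sketch also uses a single profiling snapshot, whereas the method described profiles at each level.

```python
from dataclasses import dataclass

# NOTE: every metric, threshold, and strategy name here is an illustrative
# assumption; none of them is taken from the paper.


@dataclass
class ProfileInfo:
    uncoalesced_ratio: float  # fraction of global memory accesses not coalesced
    occupancy: float          # achieved kernel occupancy in the range 0..1
    transfer_fraction: float  # share of run time spent in host-device transfers


# The three optimization levels named in the abstract, in order.
LEVELS = ("memory_access", "kernel_acceleration", "data_partition")


def choose_strategy(level: str, info: ProfileInfo) -> str:
    """Pick a (hypothetical) strategy for one level from profiling data."""
    if level == "memory_access":
        if info.uncoalesced_ratio > 0.2:
            return "coalesce_global_accesses"
        return "keep_layout"
    if level == "kernel_acceleration":
        if info.occupancy < 0.5:
            return "retune_block_size"
        return "keep_kernel_config"
    if level == "data_partition":
        if info.transfer_fraction > 0.3:
            return "overlap_transfer_and_compute"
        return "single_transfer"
    raise ValueError(f"unknown optimization level: {level}")


def plan(info: ProfileInfo) -> dict:
    """Return one suggested strategy per level, in level order."""
    return {level: choose_strategy(level, info) for level in LEVELS}


if __name__ == "__main__":
    snapshot = ProfileInfo(uncoalesced_ratio=0.4, occupancy=0.8,
                           transfer_fraction=0.1)
    for level, strategy in plan(snapshot).items():
        print(f"{level}: {strategy}")
```

Keeping the per-level rule in one function mirrors the split between a strategy library (the labels) and a configuration module (the selection logic); a real tool would re-profile after each level rather than reuse one snapshot.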
Source: Journal of Xi'an Jiaotong University (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2012, No. 2, pp. 17-23 (7 pages)
Funding: National High Technology Research and Development Program of China (2009AA01A135, 2009AA01Z108); Fundamental Research Funds for the Central Universities (08142007)
Keywords: CPU-GPU heterogeneous parallel system; global optimization; three-level optimization; three-level profiling
About the authors: ZHANG Bao (b. 1987), male, master's student; DONG Xiaoshe (corresponding author), male, professor and doctoral supervisor.
