
Hierarchical Runtime Support Technology Based on the Cell Multi-core Processor (cited by: 2)

Research on Multilayer Runtime Library Technology for Cell/B.E. Processor
Abstract: The heterogeneous multi-core architecture of the Cell processor and its software-managed, multi-level memory hierarchy make the processor hard to program and its performance hard to exploit. Most existing programming models for Cell/B.E. focus on stream-like "bulk data transfer" applications, so traditional applications with irregular or unpredictable memory access patterns perform poorly. This paper extends the Cell/B.E. memory access library to give the co-processors more autonomy, and builds a co-processor-centric, layered parallel programming runtime that supports both MPI and a release-consistency-based (weakly consistent) Pthread model. The layered runtime structure and the extended memory access library make the model more efficient and scalable and improve the performance of irregular applications. Within the model, the MPI interface eases porting the large body of existing parallel applications to the new architecture and developing new ones, while the Pthread interface provides efficient task runtime management for MPI and gives system-level users a programming interface with full control over the architecture. Experimental results show that the proposed runtime support suits the requirements of different applications and, through the profile-based optimization mechanism in the memory access library, effectively exploits the performance of the Cell/B.E. architecture.
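The contrast the abstract draws between bulk data transfer and irregular memory access can be illustrated with a minimal sketch in plain C. This is not the authors' memory access library and does not use the Cell SDK: `dma_get`, `sw_cache`, `cache_read`, `LINE_WORDS`, and `NUM_LINES` are hypothetical names, `memcpy` stands in for a DMA transfer into a co-processor's local store, and the direct-mapped software cache is only one assumed way a library might serve irregular accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_WORDS 4   /* words per cache line (hypothetical size) */
#define NUM_LINES  8   /* lines in the simulated local-store cache */

/* Counts simulated transfers from "main memory" to "local store". */
static unsigned long transfer_count = 0;

/* Stand-in for a DMA get: copy n words into a local buffer. */
static void dma_get(int *local, const int *main_mem, size_t n) {
    assert(local != NULL && main_mem != NULL);
    memcpy(local, main_mem, n * sizeof(int));
    transfer_count++;
}

/* Bulk pattern: one large transfer, then compute on the local copy. */
static long sum_bulk(const int *main_mem, size_t n, int *local_buf) {
    long s = 0;
    dma_get(local_buf, main_mem, n);
    for (size_t i = 0; i < n; i++) s += local_buf[i];
    return s;
}

/* Direct-mapped software cache held in the simulated local store. */
typedef struct {
    long tag[NUM_LINES];              /* cached line number, -1 if empty */
    int  data[NUM_LINES][LINE_WORDS]; /* cached line contents */
} sw_cache;

static void cache_init(sw_cache *c) {
    for (int i = 0; i < NUM_LINES; i++) c->tag[i] = -1;
}

/* Read main_mem[idx] through the cache. Assumes the array length is a
 * multiple of LINE_WORDS so that whole lines can be fetched. */
static int cache_read(sw_cache *c, const int *main_mem, size_t idx) {
    long line = (long)(idx / LINE_WORDS);
    int  slot = (int)(line % NUM_LINES);
    if (c->tag[slot] != line) {       /* miss: fetch the whole line */
        dma_get(c->data[slot], main_mem + (size_t)line * LINE_WORDS, LINE_WORDS);
        c->tag[slot] = line;
    }
    return c->data[slot][idx % LINE_WORDS];
}

/* Irregular pattern: gather through an index array, one access at a time. */
static long sum_gather(const int *main_mem, const size_t *index, size_t m,
                       sw_cache *c) {
    long s = 0;
    for (size_t i = 0; i < m; i++) s += cache_read(c, main_mem, index[i]);
    return s;
}
```

In this sketch the bulk path pays for one transfer regardless of array size, while the gather path pays one transfer per cache miss, so its cost depends on the locality of the index stream. That dependence is the kind of behavior a profile-based optimization could, in principle, tune.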
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, No. 4, pp. 561-570 (10 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2009AA01Z108, 2006AA01A109)
Keywords: co-processor centric; runtime library; heterogeneous multi-core system; Cell/B.E.; hierarchical architecture
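About the authors: (jt_f@163.com) Dong Xiaoshe (董小社), born 1963, Ph.D., professor and doctoral supervisor, member of the China Computer Federation; research interests: high-performance computer architecture and grid computing. Feng Guofu (冯国富), born 1971, systems analyst; research interests: parallel computer architecture, parallel programming environments, and operating systems. Wang Xuhao (王旭昊), born 1984, master's student; research interests: parallel computer architecture and operating systems. Feng Jinghua (冯景华), born 1984, master's student; research interests: parallel programming and parallel computer architecture. Hu Leijun (胡雷钧), born 1971, senior engineer; research interests: high-performance computer architecture and grid computing.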

