Funding: Project (61170049) supported by the National Natural Science Foundation of China; Project (2012AA010903) supported by the National High Technology Research and Development Program of China
Abstract: Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Funding: ESPRIT Basic Research ProCoS project 3104 and 7071
Abstract: This paper deals with the specification and design of parallel communicating processes. A trace-state based model is introduced to describe the behaviour of concurrent programs. A formal system based on that model is presented to achieve hierarchical and modular development and verification methods. A number of refinement rules are used to decompose the specification into smaller ones and calculate program from the
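Abstract: To investigate the potential of heterogeneous multi-processor system on chip (MPSoC) designs for intensive parallel computing tasks, this paper designs and implements a dynamic scheduling coprocessor for heterogeneous multi-core systems that suits coarse-grained data and targets task-level parallel applications. It adopts mechanisms including on-chip caching, multi-level write-back management of task outputs, automatic task mapping, and out-of-order execution of communication tasks. Experimental results show that the coprocessor not only meets basic design goals such as task-level out-of-order execution but also has extremely low scheduling overhead: compared with a scheduler based on the dynamic scoreboard algorithm, it reduces the running time of multiple sub-aperture range compression algorithms by up to 17.13%. The results demonstrate that the proposed dynamic scheduling coprocessor effectively improves task scheduling in the target scenario.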
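As a rough illustration of what task-level out-of-order execution means, the following Python sketch issues any task whose dependencies have completed, even if an earlier task in program order is still waiting. It is a software toy under stated assumptions, not the coprocessor described in the abstract: the function name `schedule`, the task names, the durations, and the core count are all hypothetical, and caching, write-back management, and task mapping are not modelled.

```python
import heapq

def schedule(tasks, deps, durations, n_cores):
    """Simulate out-of-order task issue on n_cores identical cores.

    tasks: task names in program order (hypothetical example input).
    deps: maps a task to the tasks it must wait for.
    Returns {task: (start_time, finish_time)}.
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    pending = list(tasks)
    running = []  # min-heap of (finish_time, task)
    times = {}
    now = 0
    while pending or running:
        # Issue every ready task, scanning in program order, while cores are free.
        for t in list(pending):
            if len(running) >= n_cores:
                break
            if not remaining[t]:
                finish = now + durations[t]
                heapq.heappush(running, (finish, t))
                times[t] = (now, finish)
                pending.remove(t)
        if not running:
            raise ValueError("dependency cycle or unknown dependency")
        # Advance time to the next completion and release its dependants.
        now, done = heapq.heappop(running)
        for t in pending:
            remaining[t].discard(done)
    return times

# B waits for the long task A, so C and D overtake it.
times = schedule(
    tasks=["A", "B", "C", "D"],
    deps={"B": ["A"], "D": ["C"]},
    durations={"A": 4, "B": 1, "C": 1, "D": 1},
    n_cores=2,
)
```

In this example, C and D finish on the second core while A is still running, and B starts only once A completes.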
Abstract: Gamma is a kernel programming language with an elegant chemical reaction metaphor in which programs are described in terms of multiset rewriting. The Gamma formalism allows one to describe an algorithm without introducing artificial sequentiality and leads naturally to the derivation of a parallel solution to a given problem. However, the difficulty of incorporating control strategies makes it hard to define any sophisticated approach in Gamma and impossible to reach a decent level of efficiency in any direct implementation. Recently, a higher-order multiset programming paradigm, named higher-order Gamma, was introduced by Metayer to alleviate these problems. In this paper, we investigate the possibility of implementing higher-order Gamma on MasPar, a massively data parallel computer. The results show that a program written in higher-order Gamma can be transformed naturally toward an efficient implementation on a real parallel machine.
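The multiset-rewriting idea can be sketched in a few lines of Python. This is only a first-order, sequential simulation written for illustration: the helper names `gamma`, `gamma_max`, and `gamma_sum` are hypothetical, and nothing here reflects higher-order Gamma or the MasPar implementation the paper investigates. A program is a reaction condition plus an action; any pair of elements satisfying the condition is replaced by the action's products until no pair can react.

```python
from itertools import permutations

def gamma(multiset, condition, action):
    """Apply a binary reaction until the multiset is stable.

    Any ordered pair (x, y) of distinct elements with condition(x, y)
    true is removed and replaced by the elements of action(x, y).
    """
    pool = list(multiset)
    while True:
        for i, j in permutations(range(len(pool)), 2):
            if condition(pool[i], pool[j]):
                products = list(action(pool[i], pool[j]))
                pool = [v for k, v in enumerate(pool) if k not in (i, j)]
                pool += products
                break
        else:
            # No pair reacted: the multiset is stable.
            return pool

def gamma_max(values):
    # Reaction: x, y -> x whenever x >= y.
    return gamma(values, lambda x, y: x >= y, lambda x, y: [x])

def gamma_sum(values):
    # Reaction: x, y -> x + y, always enabled.
    return gamma(values, lambda x, y: True, lambda x, y: [x + y])

largest = gamma_max([3, 1, 4, 1, 5])  # [5]
total = gamma_sum([3, 1, 4, 1, 5])    # [14]
```

The loop picks reacting pairs one at a time, but the program text imposes no order on them. Reactions on disjoint pairs are independent, which is what makes a parallel execution possible in principle.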
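Abstract: In parity-time (PT) symmetric multi-coil parallel wireless power transfer (WPT) systems, the parameters are interdependent and difficult to configure. To address this, a parameter optimization design method for PT-symmetric multi-coil parallel WPT systems is proposed. A mathematical model of the system is established, and general expressions are given for how the parameters influence output power, transfer efficiency, and effective transfer distance. An optimization method combining nonlinear programming with a genetic algorithm is adopted, taking output power as the objective function and transfer efficiency and effective transfer distance as constraints, to optimize the number of transmitting and receiving coils and the resonant parameters. An experimental prototype is built according to the optimization results. The experiments show that the system's output power meets the design requirement while satisfying the required transfer efficiency and effective transfer distance, which verifies the theoretical analysis and the optimization method.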