Funding: Supported by the Basic Science Research Program through the National Research Foundation (2015R1D1A3A01019869), Korea.
Abstract: In modern high-performance computing, GPUs are regarded as excellent accelerators for general-purpose, data-intensive parallel applications. Many performance-oriented optimization techniques have been proposed to speed up applications on GPUs. However, given the growing importance of power and energy consumption, power- and energy-aware optimization of GPUs needs to be analyzed in detail alongside performance-oriented optimization. To explore the impact of various optimization strategies on GPU performance, power, and energy consumption, we evaluate a well-known application running on different commercial GPU devices under different optimization strategies. To obtain more general performance and power consumption patterns for GPU-based acceleration, the evaluations cover three NVIDIA GPU generations (Fermi, Kepler, and Maxwell architectures) and a range of core and memory clock frequencies. We analyze how optimization affects GPU kernel execution and which GPU architectural factors most strongly affect performance and power/energy consumption. The paper also categorizes which optimization technique primarily improves which metric (i.e., performance, power, or energy efficiency). Furthermore, voltage frequency scaling (VFS) is applied to examine the effect of changing clock frequency on these metrics. In general, our work shows that effective GPU optimization strategies can improve application performance significantly without increasing power and energy consumption.
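The three metrics in this abstract are linked by a simple identity: energy is average power multiplied by execution time. An optimization that shortens runtime at roughly constant power therefore lowers energy as well. The sketch below illustrates this in Python. The `Run` class, the `compare` helper, and all numbers are hypothetical and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One kernel execution; all values here are illustrative, not measured."""
    label: str
    time_s: float        # execution time in seconds
    avg_power_w: float   # average board power in watts

    @property
    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) * time (seconds)
        return self.avg_power_w * self.time_s


def compare(baseline: Run, optimized: Run) -> dict:
    """Return speedup and the power/energy ratios of optimized vs. baseline."""
    return {
        "speedup": baseline.time_s / optimized.time_s,
        "power_ratio": optimized.avg_power_w / baseline.avg_power_w,
        "energy_ratio": optimized.energy_j / baseline.energy_j,
    }


# Hypothetical numbers: the optimized kernel runs 4x faster at the same power.
baseline = Run("unoptimized", time_s=2.0, avg_power_w=100.0)
optimized = Run("optimized", time_s=0.5, avg_power_w=100.0)
result = compare(baseline, optimized)
```

With these assumed inputs the energy ratio equals the inverse of the speedup, because the power ratio is 1. If an optimization also raised power draw, the energy ratio would be the power ratio divided by the speedup.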
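Abstract: A foreign function interface (FFI) is the main way for one programming language to call function libraries written in other languages. To address the large amount of manual coding that FFI techniques require, an automated foreign function interface generation (AFIG) method is proposed. The method uses source-code reverse analysis based on abstract syntax trees to precisely extract, from the library files being wrapped, a multi-language unified representation that describes function interface information. Based on this unified representation, code generators for different platforms use a multi-language conversion rule matrix to generate platform-specific FFI code fully automatically. To address the low efficiency of FFI code generation, a task aggregation strategy based on dependency analysis is designed: tasks with dependencies are aggregated into new tasks, which effectively eliminates blocking and deadlock among FFI code generation tasks running in parallel and thereby achieves scalability and load balancing on multi-core systems. Experimental results show that, compared with manual coding, AFIG reduces development code by 98.14% and test code by 41.95% in FFI development; compared with the existing SWIG (Simplified Wrapper and Interface Generator) method, it reduces development cost by 61.27% for the same tasks; and generation efficiency grows linearly as computing resources increase.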
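The abstract does not give the details of the task aggregation strategy. One plausible reading is that tasks connected by dependencies are merged into a single unit, so that the resulting units are mutually independent and can be handed to parallel workers without any unit waiting on another. The Python sketch below illustrates that idea with a union-find structure and the standard-library `graphlib` module (Python 3.9+). The function name `aggregate`, the task names, and the approach itself are assumptions for illustration, not the AFIG implementation.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def aggregate(tasks, deps):
    """Merge dependency-connected tasks into independent groups.

    tasks: list of task names.
    deps:  dict mapping a task to the set of tasks it depends on
           (assumed here: every name in deps also appears in tasks).
    Returns a list of groups; each group lists its tasks with
    prerequisites first.
    """
    parent = {t: t for t in tasks}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Union every task with each of its prerequisites.
    for task, prereqs in deps.items():
        for p in prereqs:
            parent[find(task)] = find(p)

    # Collect the members of each connected component.
    groups = defaultdict(list)
    for t in tasks:
        groups[find(t)].append(t)

    # Order each group so that prerequisites run before dependents.
    merged = []
    for members in groups.values():
        sub = {t: set(deps.get(t, ())) for t in members}
        merged.append(list(TopologicalSorter(sub).static_order()))
    return merged


# Hypothetical generation tasks: "draw" needs "point", which needs "types".
tasks = ["types", "point", "draw", "log"]
deps = {"point": {"types"}, "draw": {"point"}}
merged = aggregate(tasks, deps)
```

Each returned group could then be submitted as one job to a worker pool. Because no dependency crosses a group boundary, no worker blocks waiting for another. The trade-off is that a very large connected component becomes a single sequential job, which limits load balancing.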