
Optimization Method of Efficient Convolution Based on Raspberry Pi
Abstract: Convolutional neural networks (CNNs) have large parameter counts and heavy computational loads, which makes model inference slow on low-power edge devices such as the Raspberry Pi. An in-depth study and comparative analysis of existing open-source inference frameworks found that they are all general-purpose frameworks and are not optimized to the limit for Raspberry Pi devices. This paper therefore proposes a quantitative analysis method based on the RoofLine model and optimizes convolution inference for mobile network architectures such as Mobilenet along two dimensions: memory access and computation. First, computational graph optimization is applied, with operator fusion and memory rearrangement performed as inference preprocessing, which reduces the computation and memory-access overhead during inference. Second, based on the convolution parameter count and characteristics of each layer, a 9-grid blocking strategy and optimization at the NEON instruction pipeline level are proposed. Experiments show that, across different resolutions, the proposed method achieves more than a 3x improvement in inference speed over Tencent's open-source framework NCNN, Alibaba's MNN, and SenseTime's PPL.NN.
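To illustrate the kind of memory-versus-compute analysis a RoofLine model supports, the following is a minimal sketch and not the authors' implementation. It assumes float32 tensors, an idealized memory traffic model in which the input, weights, and output each cross memory exactly once, and placeholder hardware figures (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) that are not measured Raspberry Pi values. All function names are hypothetical.

```python
def conv_flops(h_out, w_out, c_in, c_out, k, groups=1):
    """FLOPs of a k x k convolution; one multiply-accumulate counts as 2 FLOPs."""
    return 2 * h_out * w_out * (c_in // groups) * c_out * k * k


def conv_bytes(h_in, w_in, h_out, w_out, c_in, c_out, k, groups=1, elem_bytes=4):
    """Idealized traffic: input, weights and output each move through memory once."""
    inputs = h_in * w_in * c_in
    weights = (c_in // groups) * c_out * k * k
    outputs = h_out * w_out * c_out
    return (inputs + weights + outputs) * elem_bytes


def roofline(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Return (arithmetic intensity, attainable GFLOP/s, limiting resource)."""
    intensity = flops / bytes_moved              # FLOPs per byte
    memory_roof = bandwidth_gbs * intensity      # GFLOP/s allowed by bandwidth
    attainable = min(peak_gflops, memory_roof)
    bound = "memory-bound" if memory_roof < peak_gflops else "compute-bound"
    return intensity, attainable, bound


# Placeholder hardware figures, chosen only for illustration (assumptions).
PEAK_GFLOPS = 24.0
BANDWIDTH_GBS = 4.0

# Depthwise 3x3 convolution, 112x112 feature map, 32 channels, stride 1, padded.
dw = roofline(
    conv_flops(112, 112, 32, 32, 3, groups=32),
    conv_bytes(112, 112, 112, 112, 32, 32, 3, groups=32),
    PEAK_GFLOPS, BANDWIDTH_GBS,
)

# Pointwise 1x1 convolution, 112x112 feature map, 32 -> 64 channels.
pw = roofline(
    conv_flops(112, 112, 32, 64, 1),
    conv_bytes(112, 112, 112, 112, 32, 64, 1),
    PEAK_GFLOPS, BANDWIDTH_GBS,
)

print("depthwise:", dw)
print("pointwise:", pw)
```

Under these assumed figures, the depthwise layer falls below the ridge point and is limited by memory access, while the pointwise layer is limited by computation. This is why the two layer types would call for different optimizations, for example memory layout changes for the former and instruction-level tuning for the latter.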
Authors: GUO Xiao-long; NIU Jin-yu; DU Yong-ping (School of Information Technology, Beijing University of Technology, Beijing 100124, China)
Source: Computer Technology and Development, 2023, No. 5, pp. 96-104 (9 pages)
Funding: National Key R&D Program of China (2018YFC1900804, 2019YFC1906002).
Keywords: deep learning model inference acceleration; computational graph optimization; operator fusion; convolution optimization; mobile inference framework
About the authors: GUO Xiao-long (b. 1983), male, master's student; research interest: deep learning model inference acceleration. Corresponding author: DU Yong-ping (b. 1977), female, Ph.D., professor; research interests: pattern recognition and intelligent information processing.
