Abstract
Convolutional neural networks (CNNs) have large parameter counts and heavy computational loads, which makes model inference slow on low-power edge devices such as the Raspberry Pi. An in-depth study and comparative analysis of existing open-source inference frameworks shows that they are all general-purpose frameworks and are not tuned specifically for Raspberry Pi devices. We therefore propose a quantitative analysis method based on the Roofline model and optimize convolutional inference for mobile network architectures such as MobileNet along two dimensions: memory access and computation. First, computational graph optimization is applied, with operator fusion and memory rearrangement used as inference preprocessing, to reduce the computation and memory-access overhead of inference. Second, based on the number of convolution parameters and the characteristics of each layer, a nine-grid blocking strategy and NEON instruction pipeline-level optimizations are proposed. Experiments show that, across different input resolutions, the proposed optimizations achieve more than a threefold speedup in inference over Tencent's open-source framework NCNN, Alibaba's MNN, and SenseTime's PPL.NN.
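To illustrate the kind of Roofline-based reasoning the abstract refers to, the following is a minimal sketch rather than the authors' implementation. It estimates the arithmetic intensity (FLOPs per byte) of a convolution layer and compares it with the ridge point of a device. The peak compute and bandwidth constants are hypothetical placeholders, not measured Raspberry Pi figures, and the layer shapes are MobileNet-like examples chosen only for illustration. The sketch also assumes "same" padding, 32-bit floats, and that each tensor is moved to or from memory exactly once.

```python
from dataclasses import dataclass

# Hypothetical device limits, for illustration only (not measured values).
PEAK_GFLOPS = 24.0      # assumed peak compute throughput, GFLOP/s
BANDWIDTH_GBS = 4.0     # assumed sustained memory bandwidth, GB/s


@dataclass
class ConvLayer:
    h_out: int               # output height
    w_out: int               # output width
    c_in: int                # input channels
    c_out: int               # output channels
    k: int                   # square kernel size
    stride: int = 1
    depthwise: bool = False  # depthwise conv: one filter per channel


def flops(layer: ConvLayer) -> int:
    """Floating-point operations, counting one multiply-add as 2 FLOPs."""
    macs_per_output = layer.k * layer.k * (1 if layer.depthwise else layer.c_in)
    return 2 * macs_per_output * layer.h_out * layer.w_out * layer.c_out


def bytes_moved(layer: ConvLayer, bytes_per_elem: int = 4) -> int:
    """Idealized memory traffic: input, weights and output each moved once."""
    h_in = layer.h_out * layer.stride  # assumes "same" padding
    w_in = layer.w_out * layer.stride
    inputs = h_in * w_in * layer.c_in
    weights = layer.k * layer.k * layer.c_out * (1 if layer.depthwise else layer.c_in)
    outputs = layer.h_out * layer.w_out * layer.c_out
    return (inputs + weights + outputs) * bytes_per_elem


def arithmetic_intensity(layer: ConvLayer) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops(layer) / bytes_moved(layer)


def attainable_gflops(layer: ConvLayer) -> float:
    """Roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(PEAK_GFLOPS, arithmetic_intensity(layer) * BANDWIDTH_GBS)


def bound(layer: ConvLayer) -> str:
    """Classify the layer against the ridge point of the roofline."""
    ridge = PEAK_GFLOPS / BANDWIDTH_GBS
    return "compute-bound" if arithmetic_intensity(layer) >= ridge else "memory-bound"


if __name__ == "__main__":
    # MobileNet-like example shapes (illustrative, not taken from the paper).
    depthwise = ConvLayer(h_out=112, w_out=112, c_in=32, c_out=32, k=3, depthwise=True)
    pointwise = ConvLayer(h_out=112, w_out=112, c_in=32, c_out=64, k=1)
    for name, layer in [("depthwise 3x3", depthwise), ("pointwise 1x1", pointwise)]:
        print(f"{name}: intensity={arithmetic_intensity(layer):.2f} FLOPs/byte, "
              f"attainable={attainable_gflops(layer):.2f} GFLOP/s, {bound(layer)}")
```

Under these assumed limits, a layer whose intensity falls below the ridge point is limited by memory traffic, so techniques such as operator fusion and memory rearrangement are the natural levers; a layer above it is limited by arithmetic throughput, where blocking and instruction-level tuning matter more.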
Authors
GUO Xiao-long (郭晓龙)
NIU Jin-yu (牛晋宇)
DU Yong-ping (杜永萍)
School of Information Technology, Beijing University of Technology, Beijing 100124, China
Source
Computer Technology and Development (《计算机技术与发展》)
2023, No. 5, pp. 96-104 (9 pages)
Funding
National Key Research and Development Program of China (2018YFC1900804, 2019YFC1906002).
Keywords
deep learning model inference acceleration
computational graph optimization
operator fusion
convolution optimization
mobile inference framework
About the authors
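GUO Xiao-long (b. 1983), male, master's student; research interest: deep learning model inference acceleration. Corresponding author: DU Yong-ping (b. 1977), female, Ph.D., professor; research interests: pattern recognition and intelligent information processing.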