Abstract
Existing action recognition methods do not consider temporal context comprehensively from both local and global perspectives. To address this, we propose an action recognition method based on a temporal context module. First, the intermediate feature map is split along the channel dimension, and a hyperparameter limits the proportion of channels used for temporal modeling, which reduces the computational cost of the model and enhances feature reuse. Then, a local branch and a temporal branch extract short-term and long-term temporal context from the video, respectively, to improve the model's temporal reasoning ability. The temporal context module can be integrated into any existing 2D convolutional neural network (CNN) in a plug-and-play manner, yielding a compact temporal context network at very small additional computational cost. With ImageNet pre-training, the method achieves 48.1% and 40.5% Top-1 accuracy on the Something-Something V1 and Diving-48 datasets, respectively. Extensive ablation and comparison experiments show that the method achieves a better trade-off between accuracy and computational cost.
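The channel split and the two-branch idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the branch operators or how their outputs are fused, so the neighbour averaging, the per-channel mean over all frames, the summation, and the names `split_channels`, `local_context`, `global_context`, `temporal_context_sketch`, and the default `ratio=0.25` are all assumptions made for illustration. Spatial dimensions are collapsed to one scalar per channel per frame to keep the example short.

```python
from typing import List

Frames = List[List[float]]  # shape [T][C]: one scalar per channel per frame


def split_channels(num_channels: int, ratio: float) -> int:
    """Return how many leading channels receive temporal modeling."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return int(num_channels * ratio)


def local_context(x: Frames) -> Frames:
    """Short-term context (assumed): average each frame with its neighbours."""
    t_len = len(x)
    out = []
    for t in range(t_len):
        window = x[max(0, t - 1): min(t_len, t + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def global_context(x: Frames) -> List[float]:
    """Long-term context (assumed): per-channel mean over all frames."""
    return [sum(col) / len(x) for col in zip(*x)]


def temporal_context_sketch(x: Frames, ratio: float = 0.25) -> Frames:
    """Apply both branches to a fraction of channels; pass the rest through."""
    k = split_channels(len(x[0]), ratio)
    if k == 0:
        return [list(frame) for frame in x]
    temporal_part = [frame[:k] for frame in x]
    local = local_context(temporal_part)
    glob = global_context(temporal_part)
    out = []
    for t, frame in enumerate(x):
        # Assumed fusion: sum of the short-term and long-term responses.
        mixed = [l + g for l, g in zip(local[t], glob)]
        # Channels outside the ratio are reused unchanged.
        out.append(mixed + list(frame[k:]))
    return out
```

With `ratio=0.5` on a four-channel input, only the first two channels are modified and the last two are copied through, which is the feature reuse the abstract refers to. A smaller ratio lowers the cost of temporal modeling in proportion.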
Authors
Zhou Xuan (周璇); Yi Jianping (易剑平) (School of Mechanical and Electrical Engineering, Xi'an Traffic Engineering Institute, Xi'an 710300, China; School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710600, China)
Source
Foreign Electronic Measurement Technology (《国外电子测量技术》)
Peking University Core Journal (北大核心)
2022, No. 10, pp. 72-79 (8 pages)
Funding
Supported by the Young and Middle-aged Fund Project of Xi'an Traffic Engineering Institute (2022KY-02).
Keywords
action recognition
temporal context
temporal reasoning
plug-and-play
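About the authors
Zhou Xuan, master's degree, teaching assistant; main research interests include human action recognition, human detection, pattern recognition, and image processing. E-mail: 1138845898@qq.com. Yi Jianping, master's student; main research interests include video understanding, image processing, and action recognition.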