Abstract
Temporal segment networks (TSN) are commonly used to model complete actions in video, but TSN cannot fully capture how an action changes over time. To exploit this change information along the temporal dimension, this paper proposes the Action-Related Network (ARN). ARN first uses a BN-Inception network to extract action features from the video, then concatenates the extracted video segment features with the features output by a Long Short-Term Memory (LSTM) module, and finally performs classification. In this way, ARN accounts for both the static and the dynamic information of an action. Experiments show that on the general-purpose HMDB-51 dataset, ARN achieves a recognition accuracy of 73.33%, which is 7% higher than that of TSN; when more action information is provided, ARN's accuracy exceeds TSN's by more than 10%. On the Something-Something V1 dataset, which contains more action changes, ARN achieves a recognition accuracy of 28.12%, which is 51% higher than that of TSN in relative terms. Finally, for several action categories of HMDB-51, the paper further analyzes how the recognition accuracy of ARN and of TSN changes when each uses more complete action information; the results show that ARN's per-category accuracy is more than 10 percentage points higher than TSN's. ARN thus makes fuller use of complete action information by relating action changes, which effectively improves recognition accuracy for actions that change over time.
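The pipeline described in the abstract (per-segment features, an LSTM over the segment sequence, concatenation, classification) can be illustrated with a small standard-library sketch. This is not the authors' implementation: random vectors stand in for BN-Inception features, the weights are untrained, and averaging the segment features before concatenation is an assumption borrowed from TSN-style consensus, since the abstract does not say how the segment features are pooled. All names (`TinyLSTM`, `arn_like_forward`, the dimensions) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single-layer LSTM without biases, using random (untrained) weights."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}

    def forward(self, seq):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in seq:
            z = x + h  # list concatenation of input and previous hidden state
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]
            g = [math.tanh(a) for a in matvec(self.W["g"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h  # hidden state after the last segment


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def arn_like_forward(segment_feats, lstm, W_cls):
    """Fuse static (pooled) and dynamic (LSTM) features, then classify."""
    n = len(segment_feats)
    # Assumption: static information is the mean of the segment features.
    static = [sum(col) / n for col in zip(*segment_feats)]
    # Dynamic information: LSTM output over the ordered segments.
    dynamic = lstm.forward(segment_feats)
    fused = static + dynamic  # concatenation
    return softmax(matvec(W_cls, fused))


rng = random.Random(0)
NUM_SEGMENTS, FEAT_DIM, HID_DIM, NUM_CLASSES = 3, 8, 4, 51  # 51 classes as in HMDB-51

# Stand-in for BN-Inception output: one random feature vector per segment.
segment_feats = [[rng.random() for _ in range(FEAT_DIM)] for _ in range(NUM_SEGMENTS)]
lstm = TinyLSTM(FEAT_DIM, HID_DIM, rng)
W_cls = rand_matrix(NUM_CLASSES, FEAT_DIM + HID_DIM, rng)

probs = arn_like_forward(segment_feats, lstm, W_cls)
print(len(probs), round(sum(probs), 6))
```

The point of the sketch is the fusion step: the pooled vector does not depend on segment order, while the LSTM output does, so the concatenated feature carries both kinds of information into the classifier.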
Authors
何鑫
许娟
金莹莹
HE Xin; XU Juan; JIN Ying-ying (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2020, No. 9, pp. 123-128 (6 pages)
Computer Science
Keywords
Action recognition
Action-related network
Deep learning
Computer vision
About the authors
HE Xin (何鑫), born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include deep learning and action recognition (hexin@nuaa.edu.cn). Corresponding author: XU Juan (许娟), born in 1981, associate professor, is a member of China Computer Federation. Her main research interests include quantum computing and quantum information, cloud computing, and deep learning (juanxu@nuaa.edu.cn).