Abstract
Scene recognition is increasingly in demand in applications such as human-computer interaction, content retrieval, and intelligent scene photography. Previous methods mostly rely on multi-feature fusion, combining object features, global layout features, and context features, to obtain diverse and complementary features. We argue that, because scene images vary more widely in shooting distance and viewing angle, scene recognition needs an adaptive receptive field more than other image recognition tasks do. In a traditional convolutional neural network, however, each layer has a fixed receptive field, so the receptive field cannot vary flexibly. In this paper, we propose a Multi-Scale Receptive Field Network to improve the receptive field of the network, and we add an attention mechanism to extract scene features that are more semantically discriminative. Extensive experiments on three standard scene recognition benchmarks show that the proposed method is effective and performs well compared with other state-of-the-art methods.
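To make the point about fixed receptive fields concrete, the sketch below computes the receptive field of a stack of convolutional layers using the standard recurrence r_l = r_(l-1) + (k_l - 1) * d_l * j_(l-1), where j is the product of the preceding strides. It is only an illustration: the abstract does not say how the proposed network obtains its multiple scales, so the use of dilation in parallel branches, the function name `receptive_field`, and the layer settings are assumptions made for this example, not the authors' architecture.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of conv/pool layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf = 1    # a single input pixel covers only itself
    jump = 1  # spacing, in input pixels, between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# A plain stack of three 3x3, stride-1 convolutions: one fixed receptive field.
plain_stack = receptive_field([(3, 1, 1)] * 3)

# Hypothetical parallel branches (an assumption, not the paper's design):
# one 3x3 convolution per branch, differing only in dilation rate.
branches = {d: receptive_field([(3, 1, d)]) for d in (1, 2, 3)}

print(plain_stack)  # 7
print(branches)     # {1: 3, 2: 5, 3: 7}
```

In the plain stack every output unit sees the same 7x7 window, whereas the three hypothetical branches cover 3x3, 5x5, and 7x7 windows at the same depth, which is the kind of flexibility a multi-scale design aims for.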
Authors
ZHANG Neng-huan (张能欢)
WANG Yong-bin (王永滨)
Affiliations: Collaborative Innovation Center, Communication University of China, Beijing 100024, China; School of Computer Science, Communication University of China, Beijing 100024, China
Source
Journal of Communication University of China: Science and Technology (《中国传媒大学学报(自然科学版)》)
2020, Issue 5, pp. 9-15 (7 pages)
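Funding
National Key Research and Development Program of China, "Research on Converged Media Technology Support and Service Models" (2019YFB1406201)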
Keywords
scene recognition
receptive field
multi-scale
attention mechanism
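About the author
ZHANG Neng-huan (张能欢; born 1990), female, Han ethnicity, from Lu'an, Anhui; doctoral student at Communication University of China. E-mail: nhzhang@cuc.edu.cn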