
Fine-Grained Image Classification Based on Inference Graph of Attention Network
Cited by: 2
Abstract: For fine-grained classification of scene images, this paper proposes a method based on an attention network inference graph that combines multimodal visual and textual information. First, the global visual feature, local visual features, and text features of a scene image are extracted; position information is embedded into the local visual features and the text features respectively, and the results are concatenated into new features, which serve as the nodes of a heterogeneous graph. Then, two meta-paths are designed to decompose the heterogeneous graph into two homogeneous graphs, each of which is fed into a two-level attention network inference graph with node-level attention and semantic-level attention. Finally, the output node features are fused with the global visual feature through multimodal fusion to obtain a richer fine-grained feature representation. The proposed model effectively combines multimodal fusion with graph attention networks, and it is highly competitive with current mainstream state-of-the-art methods on two scene-text fine-grained image datasets, Con-Text and Drink Bottle.
Authors: ZHENG Zhiwen (郑智文); GAN Jianhou (甘健侯); ZHOU Juxiang (周菊香); OUYANG Zhaoxiang (欧阳昭相); LU Zeguang (鹿泽光). Affiliations: Key Laboratory of Education Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Smart Education, Yunnan Normal University, Kunming 650500, Yunnan, China; School of Information, Dehong Teacher’s College, Dehong 678400, Yunnan, China; National Academy of Guoding Institute of Data Science, Beijing 100010, China
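Source: Journal of Applied Sciences (《应用科学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2022, No. 1, pp. 36-46 (11 pages)
Funding: National Natural Science Foundation of China (No. 62166050)
Keywords: scene image; multimodal; graph attention network; node-level attention; semantic-level attention
About the authors: Corresponding author: ZHOU Juxiang, doctoral student and associate researcher; research interests: computer vision and machine learning. E-mail: zjuxiang@ynnu.edu.cn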
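
The two-level attention described in the abstract can be illustrated with a small sketch. This record does not give the paper's equations, so the code below assumes a common formulation for attention over meta-path graphs: node-level attention computes a softmax over each node's meta-path neighbors, and semantic-level attention computes a softmax over the meta-paths themselves. Learned projection matrices are omitted for brevity. All names (`node_level_attention`, `semantic_level_attention`, `path_a`, `path_b`), the toy features, and the final concatenation with the global feature are hypothetical placeholders, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def node_level_attention(features, neighbors, attn_vec):
    """For one meta-path graph, replace each node feature with an
    attention-weighted sum over its neighbors (plus a self-loop)."""
    out = []
    for i, h_i in enumerate(features):
        nbrs = sorted(set(neighbors[i]) | {i})
        # h_i + features[j] is list concatenation: [h_i || h_j]
        scores = [leaky_relu(dot(attn_vec, h_i + features[j])) for j in nbrs]
        alphas = softmax(scores)
        z_i = [sum(a * features[j][d] for a, j in zip(alphas, nbrs))
               for d in range(len(h_i))]
        out.append(z_i)
    return out


def semantic_level_attention(per_path_embeddings, query_vec):
    """Score each meta-path by the mean of q . tanh(z) over its nodes,
    then combine the per-path embeddings with softmax weights."""
    path_scores = []
    for emb in per_path_embeddings:
        total = sum(dot(query_vec, [math.tanh(x) for x in z]) for z in emb)
        path_scores.append(total / len(emb))
    betas = softmax(path_scores)
    n, dim = len(per_path_embeddings[0]), len(per_path_embeddings[0][0])
    fused = [[sum(b * emb[i][d] for b, emb in zip(betas, per_path_embeddings))
              for d in range(dim)] for i in range(n)]
    return fused, betas


# Toy inputs (assumed): 4 nodes with 2-d features, two meta-path graphs.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
path_a = [[1], [0, 2], [1], []]      # hypothetical meta-path neighbors
path_b = [[3], [], [3], [0, 2]]      # hypothetical meta-path neighbors
attn_vec = [0.1, -0.2, 0.3, 0.4]     # stands in for a learned vector
query_vec = [0.5, 0.5]               # stands in for a learned vector

z_a = node_level_attention(features, path_a, attn_vec)
z_b = node_level_attention(features, path_b, attn_vec)
fused, betas = semantic_level_attention([z_a, z_b], query_vec)

# Placeholder fusion: mean-pool the nodes and concatenate with a global feature.
global_feature = [0.2, 0.8]
pooled = [sum(z[d] for z in fused) / len(fused) for d in range(2)]
final = global_feature + pooled
```

In a trained model, `attn_vec` and `query_vec` would be learned parameters, and the fusion step would follow whatever multimodal fusion operation the paper defines; the concatenation here only marks where that step sits in the pipeline.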
