Abstract
In recent years, the Vision Transformer (ViT) has achieved breakthrough progress in image recognition. Its self-attention mechanism can extract discriminative token information from different image patches, thereby improving image classification accuracy. Within image classification, fine-grained image classification is particularly difficult because inter-class feature differences are small while intra-class feature differences are large. Fine-grained data also tend to be small-scale and non-uniformly distributed, and the differences between classes are hard to discover. To address these characteristics, a fine-grained image classification model based on Bi-level Routing Attention (BRA) is proposed. The baseline backbone network uses a new vision Transformer with a multi-stage hierarchical architecture as the visual feature extractor, from which local information, global information, and multi-scale features are obtained. Feature enhancement and fusion modules are also introduced to improve the network's ability to learn key features. Experimental results show that the model achieves classification accuracies of 91.7% and 92.2% on the two fine-grained image datasets CUB-200-2011 and Stanford Dogs, respectively, outperforming several mainstream fine-grained image classification models.
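The abstract names Bi-level Routing Attention without describing its mechanics. As a rough illustration only, the sketch below follows the general BRA idea introduced with BiFormer (CVPR 2023): the feature map is split into regions, a coarse region-to-region affinity selects the top-k most relevant regions for each region, and fine-grained token-to-token attention is then computed only over keys and values gathered from those routed regions. This is not the authors' implementation. The single attention head, the random Q/K/V projections, the region count, the `top_k` value, and the function name `bi_level_routing_attention` are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_level_routing_attention(x, num_regions_side=2, top_k=2, seed=0):
    """Hypothetical single-head bi-level routing attention on an (H, W, C) map.

    Assumptions (illustrative only): random Q/K/V projections, one head,
    no extra local-context branch, H and W divisible by num_regions_side.
    """
    H, W, C = x.shape
    S = num_regions_side
    h, w = H // S, W // S            # tokens per region along each axis
    n = h * w                        # tokens per region
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

    # Partition the map into S*S non-overlapping regions: shape (S*S, n, C)
    regions = x.reshape(S, h, S, w, C).transpose(0, 2, 1, 3, 4).reshape(S * S, n, C)
    q, k, v = regions @ Wq, regions @ Wk, regions @ Wv

    # Level 1: coarse region-to-region routing with region-averaged queries/keys
    q_r, k_r = q.mean(axis=1), k.mean(axis=1)            # (S*S, C)
    affinity = q_r @ k_r.T                               # (S*S, S*S)
    idx = np.argsort(-affinity, axis=1)[:, :top_k]       # top-k routed regions per region

    # Level 2: token-to-token attention over keys/values gathered from routed regions
    k_g = k[idx].reshape(S * S, top_k * n, C)            # (S*S, top_k*n, C)
    v_g = v[idx].reshape(S * S, top_k * n, C)
    attn = softmax(q @ k_g.transpose(0, 2, 1) / np.sqrt(C), axis=-1)  # (S*S, n, top_k*n)
    out = attn @ v_g                                     # (S*S, n, C)

    # Undo the region partition to recover an (H, W, C) map
    out = out.reshape(S, S, h, w, C).transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return out, idx

# Usage: a toy 8x8 feature map with 16 channels, split into 2x2 regions
x = np.random.default_rng(1).standard_normal((8, 8, 16))
out, idx = bi_level_routing_attention(x, num_regions_side=2, top_k=2)
print(out.shape, idx.shape)  # (8, 8, 16) (4, 2)
```

Setting `top_k` to the total number of regions makes every query attend to every token, which recovers ordinary global self-attention. A smaller `top_k` restricts each region to its most relevant regions, which is what reduces the cost of the token-level step.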
Authors
SHEN Yu-qi (沈宇麒); CUI Yan (崔衍)
School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Computer Technology and Development (《计算机技术与发展》), 2024, No. 6, pp. 23-28 (6 pages)
Funding
China Postdoctoral Science Foundation (2020M671554).
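Author biographies
SHEN Yu-qi (1999-), male, master's student; research interests: artificial intelligence and computer vision. Corresponding author: CUI Yan (1982-), female, Ph.D., associate professor; research interests include pattern recognition and bioinformatics.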