Deep learning has achieved excellent results in various tasks in the field of computer vision,especially in fine-grained visual categorization.It aims to distinguish the subordinate categories of the label-level categ...Deep learning has achieved excellent results in various tasks in the field of computer vision,especially in fine-grained visual categorization.It aims to distinguish the subordinate categories of the label-level categories.Due to high intra-class variances and high inter-class similarity,the fine-grained visual categorization is extremely challenging.This paper first briefly introduces and analyzes the related public datasets.After that,some of the latest methods are reviewed.Based on the feature types,the feature processing methods,and the overall structure used in the model,we divide them into three types of methods:methods based on general convolutional neural network(CNN)and strong supervision of parts,methods based on single feature processing,and meth-ods based on multiple feature processing.Most methods of the first type have a relatively simple structure,which is the result of the initial research.The methods of the other two types include models that have special structures and training processes,which are helpful to obtain discriminative features.We conduct a specific analysis on several methods with high accuracy on pub-lic datasets.In addition,we support that the focus of the future research is to solve the demand of existing methods for the large amount of the data and the computing power.In terms of tech-nology,the extraction of the subtle feature information with the burgeoning vision transformer(ViT)network is also an important research direction.展开更多
In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hi...In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.展开更多
基金supported by the National Natural Science Foundation of China(61571453,61806218).
文摘Deep learning has achieved excellent results in various tasks in the field of computer vision,especially in fine-grained visual categorization.It aims to distinguish the subordinate categories of the label-level categories.Due to high intra-class variances and high inter-class similarity,the fine-grained visual categorization is extremely challenging.This paper first briefly introduces and analyzes the related public datasets.After that,some of the latest methods are reviewed.Based on the feature types,the feature processing methods,and the overall structure used in the model,we divide them into three types of methods:methods based on general convolutional neural network(CNN)and strong supervision of parts,methods based on single feature processing,and meth-ods based on multiple feature processing.Most methods of the first type have a relatively simple structure,which is the result of the initial research.The methods of the other two types include models that have special structures and training processes,which are helpful to obtain discriminative features.We conduct a specific analysis on several methods with high accuracy on pub-lic datasets.In addition,we support that the focus of the future research is to solve the demand of existing methods for the large amount of the data and the computing power.In terms of tech-nology,the extraction of the subtle feature information with the burgeoning vision transformer(ViT)network is also an important research direction.
基金Supported by the National Natural Science Foundation of China(61601176)。
文摘In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.