Funding: Supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant Nos. 2022D01B186 and 2022D01B05).
Abstract: The attention mechanism can extract salient features from images, which has proven effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules share two shortcomings. On the one hand, they mostly use global average pooling to generate context descriptors without highlighting the guiding role of salient information in descriptor generation, which limits the representational ability of the final attention mask. On the other hand, the design of most attention modules is complicated, which greatly increases the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which fuses global and local information through adaptive weighting to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed within the SR block to further process the generated intermediate masks, both to nonlinearize the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the widely used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets; notably, the mean Average Precision (mAP) on the Occluded-Duke dataset exceeds the state-of-the-art (SOTA) algorithm by 4.49%.
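The internal structure of the SR block is not detailed in the abstract, but the "Squeeze-Excitation" unit it adapts follows a well-known channel-recalibration pattern: global average pooling, a bottleneck MLP whose reduced channel count caps the extra computation, and a sigmoid gate. A minimal NumPy sketch under those assumptions (the weights `w1`/`w2` and reduction ratio `r` are illustrative, not the paper's):

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Standard squeeze-excitation channel recalibration on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights.
    The channel bottleneck (reduction ratio r) is what keeps the added
    computation small, as the abstract emphasizes.
    """
    # Squeeze: global average pooling produces one descriptor per channel.
    z = x.mean(axis=(1, 2))                      # (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating.
    h = np.maximum(w1 @ z, 0.0)                  # (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))          # (C,) attention mask in (0, 1)
    # Recalibrate: rescale each channel by its mask value.
    return x * s[:, None, None]

rng = np.random.default_rng(0)
c, r = 8, 2
x = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
y = squeeze_excitation(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the mask values lie strictly in (0, 1), the unit can only attenuate channels, never amplify them; the SR block's contribution is in how global and local statistics are fused before this gating step.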
Funding: Supported by the National Natural Science Foundation of China (61471154, 61876057) and the Fundamental Research Funds for the Central Universities (JZ2018YYPY0287).
Abstract: Person re-identification (re-id) involves matching a person across non-overlapping views with different poses, illuminations, and conditions. Visual attributes are understandable semantic information that helps mitigate issues such as illumination changes, viewpoint variations, and occlusions. This paper proposes an end-to-end deep learning framework for attribute-based person re-id. In the feature representation stage of the framework, an improved convolutional neural network (CNN) model is designed to leverage the information contained in automatically detected attributes and learned low-dimensional CNN features. Moreover, an attribute classifier is trained on separate data, and its responses are included in the training process of our person re-id model. The coupled clusters loss function is used in the training stage of the framework, which enhances the discriminability of both types of features. The combined features are mapped into Euclidean space, where the L2 distance between any two pedestrians determines whether they are the same person. Extensive experiments validate the superiority of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
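The matching step described above reduces to nearest-neighbor search under the L2 norm in the learned embedding space. A minimal sketch (the toy 2-D vectors stand in for the combined attribute + CNN features):

```python
import numpy as np

def l2_match(query, gallery):
    """Rank gallery embeddings by L2 distance to a query embedding.

    query: (D,) combined feature vector; gallery: (N, D) matrix of features.
    The nearest gallery entry is taken as the same pedestrian.
    """
    dists = np.linalg.norm(gallery - query, axis=1)  # (N,) Euclidean distances
    return np.argsort(dists)                         # indices, nearest first

gallery = np.array([[1.0, 0.0],    # identity 0
                    [0.0, 1.0],    # identity 1
                    [0.9, 0.1]])   # identity 2
query = np.array([1.0, 0.05])
ranking = l2_match(query, gallery)
print(ranking[0])  # 0 — the closest gallery identity
```

In practice a distance threshold on `dists[ranking[0]]` decides whether the top match is the same person or an unseen identity.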
Funding: Supported by the National Natural Science Foundation of China (61471154, 61876057) and the Key Research and Development Program of Anhui Province, Special Project of Strengthening Science and Technology Police (202004D07020012).
Abstract: Person re-identification is a prevalent technology deployed in intelligent surveillance. Remarkable achievements have been made by person re-identification methods that assume all person images have sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera makes the resolution of captured pedestrians inconsistent. When low-resolution (LR) images in the query set are matched against high-resolution (HR) images in the gallery set, performance degrades because critical pedestrian information is absent from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. First, we use the multi-resolution analysis principle of the wavelet transform to separately process the low-frequency and high-frequency regions of LR images, restoring their lost detail information. Then, we devise a residual knowledge constrained loss function that transfers knowledge between the LR and HR image streams to access pedestrian-invariant features at various resolutions. Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
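The low/high-frequency separation the abstract relies on can be illustrated with one level of the simplest (Haar) 2-D wavelet transform; the paper's wavelet choice is not specified, so this is only a sketch of the decomposition principle:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform on an even-sized grayscale image.

    Returns the low-frequency approximation (LL) and the three high-frequency
    detail bands (LH, HL, HH) that multi-resolution analysis treats separately.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # low-low: coarse structure
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # detail band (column differences)
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # detail band (row differences)
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2) — each band is half the resolution per axis
```

The LL band carries the coarse appearance that survives downsampling, while the detail bands hold the edges and textures that LR images lose; processing them separately is what lets the network target the missing detail.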
Funding: Supported by the National Key Research and Development Program (2023YFD1301801), the National Natural Science Foundation of China (32272931), the Shaanxi Province Agricultural Key Core Technology Project (2024NYGG005), and the Shaanxi Province Key R&D Program (2024NC-ZDCYL-05-12).
Abstract: Accurate and continuous identification of individual cattle has become crucial to precision farming in recent years. It is also a prerequisite for monitoring the individual feed intake and feeding time of beef cattle at medium to long distances across different cameras. However, beef cattle tend to move frequently and change their feeding position during feeding. Furthermore, large variations in head direction and complex environments (light, occlusion, and background) also make recognition difficult, particularly given the biological similarities among individual cattle. Among existing methods, the AlignedReID++ model exploits both global and local information for image matching; in particular, its dynamically matching local information (DMLI) algorithm automatically aligns horizontal local information in the local branch. In this research, the AlignedReID++ model was adopted and improved to achieve better performance in cattle re-identification (ReID). Initially, triplet attention (TA) modules were integrated into the bottlenecks of the ResNet-50 backbone, enhancing feature extraction through cross-dimensional interactions at minimal computational overhead. Although the TA modules increased the model size and floating point operations (FLOPs) of the AlignedReID++ baseline model by only 0.005 M and 0.05 G, the rank-1 accuracy and mean average precision (mAP) improved by 1.0 and 2.94 percentage points, respectively. Specifically, the rank-1 accuracy exceeded those of the convolutional block attention module (CBAM) and efficient channel attention (ECA) modules by 0.86 and 0.12 percentage points, respectively, although it was 0.94 percentage points lower than that of the squeeze-and-excitation (SE) modules. The mAP exceeded those of the SE, CBAM, and ECA modules by 0.22, 0.86, and 0.12 percentage points, respectively. Additionally, the cross-entropy loss function was replaced with the CosFace loss function in the global branch of the baseline model, and CosFace loss and hard triplet loss were jointly employed to train the model for better identification of similar individuals. AlignedReID++ with CosFace loss outperformed the baseline model by 0.24 and 0.92 percentage points in rank-1 accuracy and mAP, respectively, whereas AlignedReID++ with ArcFace loss exceeded it by 0.36 and 0.56 percentage points. The improved model with the TA modules and CosFace loss achieved a rank-1 accuracy of 94.42%, rank-5 accuracy of 98.78%, rank-10 accuracy of 99.34%, mAP of 63.90%, FLOPs of 5.45 G, frame rate of 5.64 frames per second (FPS), and model size of 23.78 M. Its rank-1 accuracy exceeded those of the baseline model, part-based convolutional baseline (PCB), multiple granularity network (MGN), and relation-aware global attention (RGA) by 1.84, 4.72, 0.76, and 5.36 percentage points, respectively, while its mAP surpassed theirs by 6.42, 5.86, 4.30, and 7.38 percentage points. Meanwhile, its rank-1 accuracy was 0.98 percentage points lower than that of TransReID, but its mAP was 3.90 percentage points higher. Moreover, the FLOPs of the improved model were only 0.05 G larger than those of the baseline model, while smaller than those of PCB, MGN, RGA, and TransReID by 0.68, 6.51, 25.4, and 16.55 G, respectively. The model size of the improved model was 23.78 M, smaller than those of the baseline model, PCB, MGN, RGA, and TransReID by 0.03, 2.33, 45.06, 14.53, and 62.85 M, respectively. The inference speed of the improved model on a CPU was lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. The t-SNE feature embedding visualization demonstrated that the global and local features achieved better intra-class compactness and inter-class variability. Therefore, the improved model can be expected to effectively re-identify beef cattle in the natural environment of a breeding farm, in order to monitor individual feed intake and feeding time.
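The CosFace (large-margin cosine) loss adopted in the global branch subtracts a fixed margin from the target-class cosine similarity before a scaled softmax, which is what tightens same-identity clusters. A minimal NumPy sketch; the scale `s=30.0` and margin `m=0.35` are common illustrative values, not the paper's settings:

```python
import numpy as np

def cosface_loss(features, weights, labels, s=30.0, m=0.35):
    """Large-margin cosine (CosFace) loss on a batch of embeddings.

    features: (B, D) embeddings; weights: (K, D) class centers; labels: (B,) ints.
    Subtracting m from the target-class cosine forces same-identity features
    to be closer to their center than the margin-free softmax would require.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                   # (B, K) cosine similarities
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - m)  # apply margin to target class
    # Numerically stable cross-entropy over the margin-adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
centers = rng.standard_normal((3, 8))
labels = np.array([0, 1, 2, 0])
loss = cosface_loss(feats, centers, labels)
print(loss > 0)  # True — cross-entropy of probabilities below 1 is positive
```

In the paper this term is combined with a hard triplet loss, so the total objective penalizes both misclassification against identity centers and violations of the relative-distance constraint within a batch.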