Abstract
                
3D shape recognition has drawn much attention in recent years, and view-based approaches currently perform best. However, current multi-view methods are almost all fully supervised, and their pretrained models are almost all based on ImageNet. Although ImageNet pretraining yields impressive results, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so building a second dataset of comparable scale is difficult. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representation into three fine-grained representations. At the pixel level, data augmentation is used to obtain representations within each view. At the view level, we strengthen spatially invariant features. Finally, at the shape level, we exploit global information through a novel extract-and-swap module. Experimental results demonstrate that the proposed method achieves significant gains in 3D object classification and retrieval tasks and generalizes to cross-dataset tasks.
                
                
    
    
    
    
            
Funding
This work was supported in part by the National Natural Science Foundation of China (No. 61976095) and the Science and Technology Planning Project of Guangdong Province, China (No. 2018B030323026).
            
    
    
    
Author biographies
Luequan Wang received the B.Sc. degree in automation from South China University of Technology, China in 2020. He is a master's student in automation science and engineering at South China University of Technology, China. His research interests include self-supervised learning, 3D vision and deep learning. E-mail: 875713197@qq.com ORCID iD: 0000-0001-9320-6873

Hongbin Xu received the M.Sc. degree from South China University of Technology, China in 2021. He is currently a Ph.D. candidate in automation science and engineering at South China University of Technology (SCUT), China. His research interests include 3D vision, multi-view stereo and self-supervised learning. E-mail: hongbinxu1013@gmail.com ORCID iD: 0000-0002-3455-1527

Wenxiong Kang received the M.Sc. degree from Northwestern Polytechnical University, China in 2003, and the Ph.D. degree in automation science and engineering from South China University of Technology, China in 2009. He is currently a professor with the School of Automation Science and Engineering, South China University of Technology, China. His research interests include biometric identification, image processing, pattern recognition and computer vision. E-mail: auwxkang@scut.edu.cn (Corresponding author) ORCID iD: 0000-0001-9023-7252