Abstract
Zero-shot object detection uses semantic embeddings as guidance: the visual features of unseen objects and the semantic embeddings of their categories are mapped into a shared space, and objects are classified by the distance between the two in that space. However, because the semantic information comes from a single source and the visual information lacks a reliable representation, background regions are easily confused with unseen objects, which makes it difficult to align visual and semantic information without discrepancy. To address this, a visual context module is used to capture the contextual information of visual features, and a semantic optimization module interactively fuses the textual context with the visual context. Increasing the diversity of the visual representation in this way lets the model perceive the discriminative semantics of the foreground and so perform zero-shot object detection effectively. Experiments on two splits of MS-COCO show improvements in accuracy and recall for both zero-shot object detection and generalized zero-shot object detection, supporting the effectiveness of the proposed method.
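The baseline idea in the first sentence of the abstract (project a region's visual feature into the semantic space, then pick the nearest class embedding) can be sketched as follows. This is only an illustrative sketch, not the paper's context or semantic optimization modules: the projection matrix `W`, the toy class embeddings, the function names, and the `background_threshold` rule are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project(W, x):
    """Map a visual feature x into the semantic space with a linear map W (list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def classify_region(visual_feature, W, class_embeddings, background_threshold=0.5):
    """Assign a region to the class whose embedding is most similar to its projection.

    The threshold-based background rule is a hypothetical stand-in; it is not
    taken from the paper.
    """
    z = project(W, visual_feature)
    scores = {name: cosine(z, emb) for name, emb in class_embeddings.items()}
    best = max(scores, key=scores.get)
    if scores[best] < background_threshold:
        return "background", scores
    return best, scores


# Toy data: 3-D visual features, 2-D semantic space, two unseen classes.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
class_embeddings = {"cat": [1.0, 0.0], "bus": [0.0, 1.0]}

label, scores = classify_region([0.9, 0.1, 0.3], W, class_embeddings)
print(label, scores)
```

In this toy setup a region whose projection is close to no class embedding falls back to "background", which is the kind of background/unseen-object confusion the abstract says a single semantic source handles poorly.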
Authors
DUAN Lijuan (段立娟)
YUAN Ying (袁蓥)
WANG Wenjian (王文健)
LIANG Fangfang (梁芳芳)
Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Trusted Computing, Beijing 100124, China; National Engineering Laboratory for Critical Technologies of Information Security Classified Protection, Beijing 100124, China; Faculty of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China; Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China
Source
Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 2, pp. 368-375 (8 pages)
Funding
National Natural Science Foundation of China (62176009, 62106065)
Scientific Research Program of Beijing Municipal Education Commission (KZ201910005008)
Keywords
object detection
zero-shot object detection
multi-modal
context perception
semantic optimization
About the authors
Corresponding author: DUAN Lijuan. E-mail: ljduan@bjut.edu.cn