Zero-shot object detection based on multi-modal joint semantic perception

Abstract: Zero-shot object detection uses semantic embeddings as guiding information: it maps the visual features of unseen objects and the category semantic embeddings into the same space, then classifies each object by its distance to the embeddings in that space. However, because the semantic information is obtained from a single source, the visual information lacks a reliable representation and background is easily confused with unseen objects, which makes it difficult to align visual and semantic information without discrepancy. To address this, the paper uses a visual context module to capture the context information of visual features, and a semantic optimization module to interactively fuse text context and visual context information. This increases the diversity of the visual representation and lets the model perceive the discriminative semantics of the foreground, so that zero-shot object detection is achieved effectively. Experiments on two splits of MS-COCO show improvements in accuracy and recall for both zero-shot object detection and generalized zero-shot object detection, demonstrating the effectiveness of the proposed method.
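The baseline idea described in the abstract (project a region's visual feature into the semantic space, then pick the nearest class embedding) can be illustrated with a minimal sketch. This is not the paper's model: the linear projection `W`, the toy embeddings, the explicit `background` vector, and the use of cosine similarity as the distance measure are all assumptions chosen for illustration only.

```python
import math

def project(feature, weights):
    """Linear map from visual space to semantic space (one output per weight row)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(feature, weights, class_embeddings):
    """Return the best-matching class name and all similarity scores."""
    projected = project(feature, weights)
    scores = {name: cosine(projected, emb) for name, emb in class_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy values: a 3-D visual feature mapped to a 2-D semantic space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
EMBEDDINGS = {
    "background": [-1.0, -1.0],  # assumed background embedding
    "cat": [1.0, 0.0],           # class seen during training
    "zebra": [0.0, 1.0],         # unseen class, known only by its embedding
}

region_feature = [0.1, 0.9, 0.5]
label, scores = classify(region_feature, W, EMBEDDINGS)
print(label, scores)
```

In this toy setup the unseen class is recognized only because its semantic embedding is available at test time. The confusion the abstract describes corresponds to cases where a projected region feature lands close to both the background vector and an unseen-class vector.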

Authors: DUAN Lijuan; YUAN Ying; WANG Wenjian; LIANG Fangfang (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Trusted Computing, Beijing 100124, China; National Engineering Laboratory for Critical Technologies of Information Security Classified Protection, Beijing 100124, China; Faculty of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China; Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China)

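Affiliations: Faculty of Information Technology, Beijing University of Technology; Beijing Key Laboratory of Trusted Computing; National Engineering Laboratory for Critical Technologies of Information Security Classified Protection; Faculty of Information Science and Technology, Hebei Agricultural University; Hebei Key Laboratory of Agricultural Big Data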

Source: Journal of Beijing University of Aeronautics and Astronautics (indexed in EI, CAS, CSCD, PKU Core), 2024, No. 2, pp. 368-375 (8 pages)

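Funding: National Natural Science Foundation of China (62176009, 62106065); Scientific Research Program of the Beijing Municipal Education Commission (KZ201910005008).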

Keywords: object detection; zero-shot object detection; multi-modal; context perception; semantic optimization

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

About the authors: Corresponding author: DUAN Lijuan. E-mail: ljduan@bjut.edu.cn.
