Abstract
To address the weak noise-filtering ability of the self-attention mechanism in existing visual question answering (VQA) models, this paper proposes a VQA model based on a multimodal gate self-attention (MGSA) mechanism. Within the self-attention module, features from the other modality serve as a channel adjustment gate that filters the output of self-attention learning over the target modality's features. The model combines this with a cross-modal dual guided-attention mechanism and stacked attention modules to jointly learn co-attention and deep attention. Finally, the visual and language features, which now carry rich attention results, are fused and passed through a classification network to obtain the prediction. Experiments on the public VQA-v2 dataset show that the model achieves overall accuracies of 70.76% and 71.12% on the Test-dev and Test-std subsets, respectively, outperforming current mainstream models. Performance comparisons among variant models verify the effectiveness of each module. The model has a strong ability to filter noisy information and effectively improves VQA performance.
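The gating idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's equations, so everything below is an assumption made for illustration: the gate is built by mean-pooling the other modality and applying one linear map followed by a sigmoid, the self-attention omits the learned query/key/value projections, and the names `channel_gate`, `gated_self_attention`, and `w_gate` are hypothetical rather than taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over the rows of x.

    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def channel_gate(other, w_gate):
    """Build one gate value in (0, 1) per channel from the other modality.

    Assumption: mean-pool the other modality's features, apply a single
    linear map (w_gate, shape d_other x d_target), then a sigmoid.
    """
    n = len(other)
    pooled = [sum(row[j] for row in other) / n for j in range(len(other[0]))]
    logits = [
        sum(p * w_gate[i][j] for i, p in enumerate(pooled))
        for j in range(len(w_gate[0]))
    ]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def gated_self_attention(target, other, w_gate):
    """Filter the target modality's self-attention output channel by channel."""
    gate = channel_gate(other, w_gate)
    attended = self_attention(target)
    return [[g * v for g, v in zip(gate, row)] for row in attended]


if __name__ == "__main__":
    # Toy features: 3 image regions and 2 question words, 2 channels each.
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    question = [[0.5, -0.5], [1.0, 2.0]]
    w_gate = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical gate weights
    for row in gated_self_attention(image, question, w_gate):
        print(row)
```

In this sketch the question features gate the image self-attention output; swapping the arguments gates the question features with the image instead. A gate value near 0 suppresses a channel and a value near 1 passes it through, which is one plausible reading of "filtering" the self-attention output.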
Authors
CHEN Qiaohong; LOU Yangbo; SUN Qi; JIA Yubo (School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source
Journal of Zhejiang Sci-Tech University (Natural Sciences)
2022, No. 3, pp. 413-423 (11 pages)
Funding
Zhejiang Sci-Tech University Training Fund for Young and Middle-aged Backbone Talent.
Keywords
visual question answering
multimodal
gate self-attention
dual guided-attention
feature fusion
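About the author
CHEN Qiaohong (b. 1978), female, from Linhai, Zhejiang; associate professor, Ph.D. Her research focuses on computer-aided design and machine learning.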