Abstract
[Objective] This paper reviews and synthesizes research on multimodal named entity recognition (MNER) to provide a reference for future studies. [Coverage] We searched the Web of Science, IEEE Xplore, ACM Digital Library, and CNKI databases using the terms “multimodal named entity recognition”, “multimodal information extraction”, and “multimodal knowledge graph”, and selected 83 representative papers. [Methods] We summarize MNER research from four aspects: concepts, feature representation, fusion strategies, and pre-trained models, and identify existing problems and future research directions. [Results] Current MNER research centers on modal feature representation and modal fusion and has made some progress in the social media domain. Methods for fine-grained multimodal feature extraction and semantic association mapping need further improvement to enhance model generalization and interpretability. [Limitations] Few papers take multimodal named entity recognition directly as their research topic, which limits the evidence supporting this review's findings. [Conclusions] We outline future trends for the pressing open problems in MNER, offering new ideas for broadening the application of multimodal learning to downstream tasks, breaking modal barriers, and bridging semantic gaps.
Authors
韩普
陈文祺
Han Pu; Chen Wenqi (School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; Provincial Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210023, China)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
PKU Core (北大核心)
2024, Issue 4, pp. 50-63 (14 pages)
Data Analysis and Knowledge Discovery
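Funding
National Social Science Fund of China (Grant No. 22BTQ096)
Qing Lan Project of Jiangsu Higher Education Institutions
Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX23_0930)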
Keywords
Multimodal Named Entity Recognition
Feature Representation
Multimodal Fusion
Multimodal Pre-training
Author information
Corresponding author: Han Pu, ORCID: 0000-0001-5867-4292, E-mail: hanpu@njupt.edu.cn.