Abstract
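Hashing-based cross-modal retrieval has attracted wide attention for its low storage cost and fast query speed. The core problem of cross-modal hashing is how to embed data from different modalities effectively into a shared semantic space. Most algorithms ignore the semantic discriminability of the feature representation when embedding multi-modal data into the shared space, so the resulting hash codes discriminate poorly between classes, which reduces the accuracy and robustness of nearest-neighbor search. This paper proposes a discriminative cross-modal hashing feature-representation learning algorithm based on coupled semantic correlation. The objective function of the model combines the idea of a linear discriminative classifier with cross-modal correlation maximization: introducing a linear classifier allows each modality to learn its own discriminative binary hash codes. At the same time, coupled hash representations are used to maximize the correlation between modalities in the embedded semantic spaces, which not only overcomes the drawbacks of projecting heterogeneous data into a single common embedded semantic space but also captures the semantic correlation between modalities. The algorithm is compared experimentally with recent related algorithms on three benchmark datasets: Wiki, LabelMe, and NUS-WIDE. The experimental results show that the proposed method has clear advantages in retrieval accuracy and computational efficiency.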
The volume of multimedia data on the network has grown exponentially in recent years, including multi-modal data such as video, pictures, audio, and text. Data from different modalities are often interrelated. For example, in WeChat Moments, voice clips and short videos often accompany published pictures. When searching for a topic, users expect rich and comprehensive retrieval results that include different kinds of media, so cross-modal retrieval between different modalities has become a research hotspot in the multimedia field. Hashing-based cross-modal retrieval methods have attracted much attention for their low storage cost and fast query speed. The core problem of cross-modal hashing is how to learn a shared semantic embedding space for data from different modalities efficiently. There are two categories of approaches to this problem. The first category comprises unsupervised methods, which learn the hash function from the underlying structure, distribution, and topology of the data in order to preserve the structure of the original data space. The second category comprises supervised methods, which incorporate semantic label information into hash learning. However, most algorithms neglect the semantic discriminability of the feature representation when embedding multi-modal data into the shared space, which weakens the ability of the hash codes to discriminate between classes and reduces the accuracy and robustness of nearest-neighbor search. In this paper, a linear discriminative cross-modal hashing learning algorithm with coupled semantic correlation is proposed, whose objective function integrates a linear discriminative classifier with maximization of the correlation between modalities. First, we use the linear classifier to model supervised hash learning, so that each modality learns its own discriminative binary hash codes with high classification performance. Second, we project data from the different modalities into their own embedding spaces to obtain their respective hash codes, and then maximize the correlations between modalities in the embedding spaces through a joint coupled-hashing representation. This not only overcomes the drawbacks of projecting heterogeneous data into a single common semantic embedding space but also captures the semantic relevance between modalities. In the experiments, three performance measures were used: the mean average precision (MAP) over ten runs, the precision-recall (PR) curve, which gives the retrieval accuracy at different recall rates, and the top-N precision, which shows how accuracy changes with the number of retrieved instances. To show the effectiveness of the algorithm, we compared it with six recent related algorithms on three benchmark datasets in two cross-modal retrieval tasks: 1) retrieving images with text queries; 2) retrieving text with image queries. The experimental results show that the proposed method has clear advantages in retrieval accuracy and computational efficiency. Additionally, the influence of the algorithm's parameters on its performance was investigated by varying one parameter while fixing the others. The investigation shows that the proposed method is insensitive to parameter variation over a wide range and still obtains good results.
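The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes single-label data, relevance defined as "same class label as the query", and ranking by Hamming distance between binary codes. These choices, the function names, and the toy data are illustrative assumptions and are not taken from the paper, whose exact protocol (for example, on multi-label data) may differ.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(relevance):
    """Average precision for one query, given 0/1 relevance flags in ranked order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def rank_database(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query (ties keep index order)."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming_distance(query_code, db_codes[i]))


def evaluate(query_codes, query_labels, db_codes, db_labels, top_n):
    """Return (MAP, mean top-N precision) over all queries."""
    aps, precisions = [], []
    for code, label in zip(query_codes, query_labels):
        order = rank_database(code, db_codes)
        # Assumption: an item is relevant if it shares the query's class label.
        relevance = [int(db_labels[i] == label) for i in order]
        aps.append(average_precision(relevance))
        precisions.append(sum(relevance[:top_n]) / top_n)
    return sum(aps) / len(aps), sum(precisions) / len(precisions)


# Toy example: 4-bit codes, e.g. text queries against an image database.
db_codes = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 1, 1)]
db_labels = [0, 0, 1, 1, 1]
query_codes = [(0, 0, 0, 0), (0, 1, 1, 1)]
query_labels = [0, 1]

map_score, top3_precision = evaluate(query_codes, query_labels,
                                     db_codes, db_labels, top_n=3)
print(f"MAP = {map_score:.4f}, top-3 precision = {top3_precision:.4f}")
```

Sweeping `top_n` from 1 to the database size gives the top-N precision curve. A precision-recall curve can be built from the same ranked relevance lists.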
Authors
严双咏
刘长红
江爱文
叶继华
王明文
YAN Shuang-Yong; LIU Chang-Hong; JIANG Ai-Wen; YE Ji-Hua; WANG Ming-Wen (School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022)
Source
《计算机学报》
EI
CSCD
Peking University Core Journals list (北大核心)
2019, Issue 1, pp. 164-175 (12 pages)
Chinese Journal of Computers
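Funding
National Natural Science Foundation of China (61662030, 61365002, 61462042, 61462045)
Natural Science Foundation of Jiangxi Province (20171BAB202016)
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ150350)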
Keywords
cross-modal retrieval
cross-modal hashing
linear classifier
semantic correlation
shared subspace
multi-modal
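Author biographies
YAN Shuang-Yong (严双咏), male, born in 1990, master's student. His main research interests are information retrieval and computer vision. E-mail: 13170884058@163.com. LIU Chang-Hong (刘长红), corresponding author, female, born in 1977, Ph.D., associate professor, member of the China Computer Federation (CCF). Her main research interests are computer vision, machine learning, and hyperspectral image processing. E-mail: liuch@jxnu.edu.cn. JIANG Ai-Wen (江爱文), male, born in 1984, Ph.D., associate professor, CCF member. His main research interests are pattern recognition, image analysis and retrieval, and machine learning. YE Ji-Hua (叶继华), male, born in 1966, M.S., professor, CCF member. His main research areas are data fusion, pattern recognition, and Internet of Things technology. WANG Ming-Wen (王明文), male, born in 1965, Ph.D., professor, CCF member. His main research areas are natural language processing and information retrieval.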