Emotion fake audio detection method based on deep emotion embedding and graph attention network
Abstract Emotion fake audio deceives listeners by altering the emotional state of speech, which poses a new challenge to existing fake audio detection models. This paper proposes an emotion fake audio detection method based on deep emotion embedding and a graph attention network, named Graph Attention Networks Using Deep Emotion Embedding (GADE), to improve the detection of emotion fake audio. GADE consists of two parts: a front end that extracts deep emotion embeddings and a back end based on a graph attention network. The front end uses a co-attention mechanism to combine traditional handcrafted features with deep features, extracting deep emotional information from the time domain and the frequency domain of speech separately. The back end fuses the time-domain and frequency-domain information, improving the model's detection performance on emotion fake audio. Comparative experiments against common fake audio detection models were conducted on the ASVspoof 2019, ASVspoof 2021, and EmoFake datasets. The results show that, compared with the existing advanced fake audio detection model AASIST, GADE improves detection performance on emotion fake audio by 22.8% when no emotion fake audio is used for training; after training with emotion fake audio, detection performance on emotion fake audio improves by 77.3%.
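To make the back-end idea more concrete, the following is a minimal sketch of a generic single-head graph attention layer in NumPy. It is not the GADE implementation, and the abstract does not give the layer's details. The function name `graph_attention_layer`, the node counts, the feature sizes, and the fully connected graph over time-domain and frequency-domain nodes are all assumptions made for illustration.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Leaky ReLU used to score node pairs before the softmax."""
    return np.where(x > 0, x, slope * x)


def graph_attention_layer(h, adj, W, a):
    """Generic single-head graph attention (illustrative, not GADE itself).

    h:   (N, F) node features
    adj: (N, N) 0/1 adjacency matrix that includes self-loops
    W:   (F, D) shared linear projection
    a:   (2 * D,) attention vector
    Returns the updated node features (N, D) and the attention weights (N, N).
    """
    z = h @ W                                   # project every node: (N, D)
    d = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = z @ a[:d]                             # (N,)
    dst = z @ a[d:]                             # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])  # pairwise scores: (N, N)
    e = np.where(adj > 0, e, -np.inf)           # ignore pairs without an edge
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                     # weighted sum over neighbours


# Hypothetical setup: 4 time-domain nodes and 3 frequency-domain nodes,
# each with an 8-dimensional embedding, joined in one fully connected graph.
rng = np.random.default_rng(0)
time_nodes = rng.normal(size=(4, 8))
freq_nodes = rng.normal(size=(3, 8))
h = np.vstack([time_nodes, freq_nodes])
adj = np.ones((7, 7))
W = rng.normal(size=(8, 5))
a = rng.normal(size=(10,))

out, alpha = graph_attention_layer(h, adj, W, a)
```

In this sketch each row of `alpha` holds one node's attention over all nodes, so a time-domain node can draw on frequency-domain nodes and vice versa, which is one plausible way to fuse the two domains.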
Authors ZHAO Yan (赵炎); LI Qing (李青); ZHOU Shuxia (周淑霞); QI Qiaoling (齐巧玲); LI Yingshuang (李英双); DONG Yongfeng (董永峰) (School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; Tianjin International Joint Center for Virtual Reality and Visual Computing, Tianjin 300401, China; Hebei Jiaotong Vocational and Technical College, Shijiazhuang, Hebei 050035, China; Hebei University Road Traffic Perception and Intelligent Application Technology Research and Development Center, Shijiazhuang, Hebei 050035, China; Handan Vocational College of Science and Technology, Handan, Hebei 056046, China)
Source Journal of Hebei University of Technology (《河北工业大学学报》), CAS, 2024, No. 6, pp. 35-43 (9 pages)
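Funding National Key Research and Development Program of China (2020AAA0140003); Hebei Province Higher Education Teaching Reform Research and Practice Project (2023GJJG056).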
Keywords fake audio detection; emotion fake audio; deep feature; graph attention network
About the authors First author: ZHAO Yan (born 1999), male, master's student. Corresponding author: DONG Yongfeng (born 1977), male, professor, dongyf@hebut.edu.cn.