
Single Channel Speech Enhancement Based on Dual-path Recurrent Neural Network (cited by: 12)
Abstract In recent years, the application of neural networks has significantly improved speech enhancement. However, for long speech sequences with strong temporal correlation, a single network structure is limited by its own modeling capacity and may be unable to improve enhancement further. To further improve neural-network-based speech enhancement, this paper applies a composite network structure called the dual-path recurrent neural network (DPRNN) to the speech enhancement task. The composite structure consists of a convolutional neural network (CNN) and long short-term memory (LSTM) networks; its core is the dual-path recurrent neural network block (DPRNN Block), which is composed of two LSTMs. DPRNN splits long speech sequences into overlapping chunks of frames and uses DPRNN Blocks to perform intra-chunk and inter-chunk computation on these chunks, thereby modeling the data both locally and globally. Experimental results show that, compared with single network structures, DPRNN achieves the best results under both trained (seen) and untrained (unseen) noise conditions.
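The chunking and dual-path idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the chunk length, the hop size, zero-padding of the last chunk, and the function names `segment`, `overlap_add`, `toy_recurrence`, and `dual_path_block` are all assumptions made for illustration, and a simple moving average stands in for each of the two LSTMs.

```python
import numpy as np

def segment(x, chunk_len, hop):
    """Split a (T, F) sequence into overlapping chunks of shape (S, chunk_len, F)."""
    assert 0 < hop <= chunk_len, "hop must not exceed chunk_len"
    T, F = x.shape
    n_chunks = max(1, int(np.ceil((T - chunk_len) / hop)) + 1)
    padded = np.zeros(((n_chunks - 1) * hop + chunk_len, F), dtype=float)
    padded[:T] = x  # zero-pad the tail so the last chunk is complete (assumption)
    return np.stack([padded[i * hop:i * hop + chunk_len] for i in range(n_chunks)])

def overlap_add(chunks, hop, length):
    """Merge (S, K, F) chunks back into a (length, F) sequence, averaging overlaps."""
    S, K, F = chunks.shape
    out = np.zeros(((S - 1) * hop + K, F))
    weight = np.zeros(((S - 1) * hop + K, 1))
    for i in range(S):
        out[i * hop:i * hop + K] += chunks[i]
        weight[i * hop:i * hop + K] += 1.0
    return (out / weight)[:length]

def toy_recurrence(batch, alpha=0.5):
    """Placeholder for an LSTM: a causal moving average along axis 1."""
    out = np.empty_like(batch, dtype=float)
    state = np.zeros_like(batch[:, 0], dtype=float)
    for t in range(batch.shape[1]):
        state = alpha * state + (1.0 - alpha) * batch[:, t]
        out[:, t] = state
    return out

def dual_path_block(chunks):
    """Intra-chunk pass (local modeling) followed by inter-chunk pass (global modeling)."""
    # Intra-chunk: each of the S chunks is treated as a sequence of K frames.
    intra = toy_recurrence(chunks)
    # Inter-chunk: swap axes so each of the K positions is a sequence over S chunks.
    return toy_recurrence(intra.transpose(1, 0, 2)).transpose(1, 0, 2)

# Usage: 10 frames with 2 features, chunk length 4, hop 2 (50% overlap).
x = np.arange(20, dtype=float).reshape(10, 2)
chunks = segment(x, chunk_len=4, hop=2)
processed = dual_path_block(chunks)
restored = overlap_add(chunks, hop=2, length=len(x))
print(chunks.shape, processed.shape, restored.shape)  # (4, 4, 2) (4, 4, 2) (10, 2)
```

Because each recurrence only sees a sequence as long as the chunk length or the number of chunks, neither pass has to process the full sequence in one go, which is the motivation the abstract gives for the dual-path design.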
Authors WANG Zhijie (王志杰); ZHANG Xueliang (张学良) (College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010000, China)
Source Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 10, pp. 1872-1879 (8 pages)
Funding National Natural Science Foundation of China (61876214).
Keywords speech enhancement; dual-path recurrent neural network; long short-term memory; convolutional neural network
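About the authors WANG Zhijie, male, born in 1993, from Ulanqab, Inner Mongolia. Master's student at the College of Computer Science, Inner Mongolia University; his main research interest is speech bandwidth extension. E-mail: wangzhijieshe@sina.com. ZHANG Xueliang, male, born in 1980, from Ordos, Inner Mongolia. Professor and doctoral supervisor at the College of Computer Science, Inner Mongolia University; holds a doctorate in engineering. His main research interests are speech signal processing, computational auditory scene analysis, speech synthesis, and speech separation. E-mail: cszxl@imu.edu.cn.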
