Abstract
Currently, in the reconstruction of normal speech from whispered speech based on neural networks, the spectral envelope of the whisper is often used to estimate the F0 characteristics of the normal speech. Such algorithms are limited in the accuracy of F0 prediction, the synthesized speech clearly lacks naturalness, and pitch distortion sometimes occurs. This paper proposes an acoustic feature fusion method that predicts the F0 of normal speech frame by frame using a bidirectional long short-term memory (BLSTM) deep network. First, the STRAIGHT model and related code are used to preprocess the whispered and normal speech corpora, extracting the Mel-scale frequency cepstral coefficients (MFCC), prosody, and spectral envelope of the whispered speech, and the F0 and spectral envelope of the normal speech. Second, BLSTM deep networks are used to establish a mapping between the spectral envelopes of whispered and normal speech, and a mapping from the MFCC, prosody, and spectral-envelope features of whispered speech to the F0 of normal speech. Finally, the F0 and spectral envelope of the corresponding normal speech are obtained from the MFCC, prosody, and spectral-envelope features of the whispered speech, and the normal speech is synthesized with the STRAIGHT model. Experimental results show that, compared with estimating F0 from the spectral envelope alone, the fused features introducing speech prosody and MFCC are a good complement to the F0 feature: they eliminate the pitch-distortion phenomenon, and the converted speech is closer to normal speech in prosody.
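The frame-level feature fusion described in the abstract can be sketched as simple per-frame concatenation of the three whisper feature streams before they are fed to the BLSTM F0 predictor. The sketch below is illustrative only; the function name, feature dimensions, and toy values are assumptions, not the authors' code.

```python
def fuse_frames(mfcc, prosody, envelope):
    """Concatenate per-frame feature vectors into one fused vector per frame.

    mfcc:     list of frames, each a list of MFCC coefficients
    prosody:  list of frames, each a list of prosodic features
    envelope: list of frames, each a list of spectral-envelope bins
    """
    # The three streams must be aligned frame by frame.
    assert len(mfcc) == len(prosody) == len(envelope), "frame counts must match"
    return [m + p + e for m, p, e in zip(mfcc, prosody, envelope)]

# Toy example: 2 frames, with 3 MFCCs, 1 prosodic value,
# and 4 spectral-envelope bins per frame.
mfcc = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
prosody = [[1.0], [1.1]]
envelope = [[9.0, 8.0, 7.0, 6.0], [5.0, 4.0, 3.0, 2.0]]
fused = fuse_frames(mfcc, prosody, envelope)
# Each fused frame now has 3 + 1 + 4 = 8 features.
```

Each fused frame vector would then serve as one time step of the BLSTM input sequence.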
Authors
PANG Cong; LIAN Hailun; ZHOU Jian; WANG Huabin; TAO Liang (Key Laboratory of Computational Intelligence and Signal Processing, Ministry of Education, Anhui University, Hefei 230039, China)
Source
Journal of Nanjing University of Aeronautics & Astronautics
Indexed in: EI, CAS, CSCD, Peking University Core Journal list (北大核心)
2020, No. 5, pp. 777-782 (6 pages)
Funding
National Natural Science Foundation of China (61301295)
Natural Science Foundation of Anhui Province (1708085MF151)
Natural Science Foundation of the Higher Education Institutions of Anhui Province (KJ2018A0018)
Anhui University Scientific Research Training Program (J10118520444)
Author Information
Corresponding author: ZHOU Jian, male, Associate Professor. E-mail: jzhou@ahu.edu.cn.