Abstract
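The performance of automatic speech recognition systems usually degrades significantly in noisy environments, which is a major obstacle to the wider application of speech recognition technology. Building on other researchers' work on Gammatone-based auditory features (GFCC features), this paper further analyzes how the GFCC and the Mel-frequency cepstral coefficients (MFCC) perform under different noise conditions. Five kinds of artificial and natural noise were selected for the comparison experiments: white noise, pink noise, brown noise, background-speaker noise, and car noise. By mixing in noise of different types and intensities, the characteristics and noise robustness of the auditory-based GFCC feature were studied systematically. In particular, sine-wave noise in different frequency bands was mixed with clean speech to analyze the noise robustness of the GFCC and the MFCC in each band. The study finds that, compared with the conventional MFCC, the GFCC is more robust to low-frequency noise but relatively sensitive to noise at middle and high frequencies. Because human speech is usually produced at lower frequencies (300-700 Hz), this property gives the GFCC good noise robustness in speech recognition tasks. The experimental results show that the GFCC achieves better recognition results than the MFCC in a variety of common noise environments, with a larger advantage at low signal-to-noise ratios.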
A particular difficulty of automatic speech recognition in real applications is the significant performance degradation in noisy environments. Building on research into gammatone-based auditory features (GFCCs) proposed by other researchers, this paper presents an additional comparative study of the GFCC and the Mel-frequency cepstral coefficients (MFCC) under various noise conditions. In particular, the behavior of the GFCC and MFCC features with noise in different frequency bands was analyzed by mixing the test speech with sine noises. The analysis shows that the GFCC is more robust against low-frequency noises than the MFCC, while it is more sensitive to noises at middle and high frequencies. This property is desirable for speech recognition since most of the information of human speech resides in the low frequency band of 300-700 Hz. Experimental results demonstrate that the GFCC exhibits significant advantages over the MFCC for various noise conditions, especially when the signal-to-noise ratio (SNR) is low.
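The band-wise analysis described in the abstract relies on mixing speech with sine noise at a controlled SNR. The following is a minimal sketch of that mixing step only, not the authors' experimental code. The function names (`sine_noise`, `mix_at_snr`, `measured_snr_db`), the 16 kHz sampling rate, and the use of average power over the whole signal to define SNR are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def sine_noise(freq_hz, n_samples, sample_rate=16000):
    """Unit-amplitude sine tone, used here as narrow-band noise.

    The 16 kHz default sampling rate is an assumption, not from the paper.
    """
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [math.sin(step * n) for n in range(n_samples)]


def average_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def measured_snr_db(speech, noise):
    """SNR in dB, defined as 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(average_power(speech) / average_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to reach `snr_db` relative to `speech`, then add it.

    Returns (noisy_signal, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = average_power(speech)
    p_noise = average_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # Required noise power is P_speech / 10^(snr_db / 10);
    # the amplitude gain is the square root of the power ratio.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = [gain * x for x in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


if __name__ == "__main__":
    # Placeholder "speech": a 440 Hz tone standing in for a real utterance.
    speech = sine_noise(440.0, 16000)
    for freq in (200.0, 1000.0, 4000.0):  # hypothetical test frequencies
        noise = sine_noise(freq, len(speech))
        noisy, scaled = mix_at_snr(speech, noise, snr_db=5.0)
        print(freq, len(noisy), round(measured_snr_db(speech, scaled), 3))
```

In an actual experiment the placeholder tone would be replaced by recorded utterances, and the noisy signal would be passed to the GFCC and MFCC front ends for recognition at each noise frequency and SNR.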
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2013, No. 8, pp. 1082-1086 (5 pages)
Journal of Tsinghua University (Science and Technology)
About the author
Li Yinguo (李银国, born 1955), male, Han, from Hubei, professor. E-mail: liyg@cqupt.edu.cn