Abstract
The phoneme is the smallest modeling unit in natural language, and the quality of a phoneme recognition model directly affects the performance of keyword retrieval and continuous speech recognition. This paper first conducts a series of comparative experiments on the amplitude feature MSRCC and the phase feature PSRCC, and finds that fusing the amplitude and phase features yields better recognition results. It then compares the strengths and weaknesses of several deep neural networks and applies them to phoneme recognition; simulation experiments show that the acoustic model based on BLSTM-CTC achieves better recognition performance than the other models.
Authors
Wu Dandan (吴丹丹)
Xia Xiuyu (夏秀渝)
College of Electronic and Information Engineering, Sichuan University, Chengdu 610065
Source
Modern Computer (《现代计算机》)
2022, Issue 10, pp. 32-38 (7 pages)
Keywords
phoneme recognition
deep neural network
acoustic characteristics
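About the authors
Wu Dandan (b. 1997), female, from Guangyuan, Sichuan; master's student; research interests: speech signal processing and speech recognition. Xia Xiuyu (b. 1970), female, from Chengdu, Sichuan; Ph.D., associate professor; research interest: speech signal processing.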