Abstract
This paper explores applying recurrent neural networks (RNNs) and the connectionist temporal classification (CTC) algorithm to acoustic modeling for Tibetan speech recognition, enabling end-to-end model training. Based on the relationship between the acoustic model's input and output, a time-domain convolution over the hidden-layer output sequence is introduced to reduce the number of time steps over which the hidden layers are unrolled, which improves training and decoding efficiency. Experimental results show that, compared with traditional acoustic models based on hidden Markov models (HMMs), the RNN model achieves better recognition performance on a Lhasa Tibetan phoneme recognition task, while the RNN acoustic model with time-domain convolution attains higher training and decoding efficiency at the same level of recognition performance.
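The step-reduction idea in the abstract can be illustrated with a strided convolution along the time axis of a hidden-layer output sequence. The sketch below is only an illustration under stated assumptions: the abstract does not give the kernel width, the stride, or where in the network the convolution sits, so the function name `temporal_conv_downsample`, the kernel width of 3, and the stride of 2 are all hypothetical choices, not the authors' configuration.

```python
import numpy as np


def temporal_conv_downsample(hidden, weights, bias, stride):
    """Strided 1-D convolution over the time axis (illustrative only).

    hidden:  (T, H) hidden-layer outputs, one row per time step
    weights: (K, H, H_out) kernel spanning K consecutive time steps
    bias:    (H_out,)
    Returns an array of shape (T_out, H_out), T_out = (T - K) // stride + 1.
    """
    T, H = hidden.shape
    K, H_in, H_out = weights.shape
    if H != H_in:
        raise ValueError("kernel input size must match hidden size")
    if T < K:
        raise ValueError("sequence is shorter than the kernel width")

    T_out = (T - K) // stride + 1
    out = np.empty((T_out, H_out))
    for t in range(T_out):
        # Window of K consecutive hidden vectors starting at t * stride.
        window = hidden[t * stride : t * stride + K]
        # Contract over both the time and feature axes of the window.
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return out


# Assumed sizes: 100 frames, 8 hidden units, kernel width 3, stride 2.
rng = np.random.default_rng(0)
hidden_seq = rng.standard_normal((100, 8))
kernel = rng.standard_normal((3, 8, 8)) * 0.1
reduced = temporal_conv_downsample(hidden_seq, kernel, np.zeros(8), stride=2)
print(hidden_seq.shape, "->", reduced.shape)  # (100, 8) -> (49, 8)
```

With these assumed settings, any layer placed after the convolution processes roughly half as many time steps. Because CTC needs at least as many output frames as target labels, the reduction factor in such a scheme cannot be chosen arbitrarily large.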
Authors
黄晓辉
李京
HUANG Xiaohui; LI Jing (College of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China; PLA University of Foreign Language, Luoyang, Henan 471003, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journals (北大核心)
2018, Issue 5, pp. 49-55 (7 pages)
Funding
National Key Research and Development Program of China (2016YFB0201402)
Keywords
recurrent neural network
Tibetan speech recognition
acoustic modeling
time domain convolution
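About the authors
HUANG Xiaohui (b. 1986), Ph.D., lecturer; main research areas: deep learning and natural language processing. E-mail: huangxia@mail.ustc.edu.cn. LI Jing (b. 1966), Ph.D., professor; main research areas: distributed algorithms and big data processing. E-mail: lj@ustc.edu.cn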