
Research of phoneme recognition based on the CNN-BGRU model (cited by: 1)
Abstract: A phoneme is the smallest phonetic unit in a language system, and in large-vocabulary speech recognition tasks phoneme recognition is not restricted by vocabulary or sentences. Phonemes are therefore chosen as the recognition unit, and a neural network model based on CNN-BGRU is built to classify phoneme spectrograms. First, the short-time Fourier transform is used to generate phoneme spectrograms as the model input. Second, the CNN-BGRU model is built: an improved VGGNet model extracts features from the phoneme spectrograms, and a bidirectional gated recurrent unit (BGRU) then represents their sequence information. Finally, a Softmax classifier classifies the phoneme spectrograms. In experiments on the TIMIT English speech dataset, phoneme spectrogram recognition reaches an accuracy of 98.6%, outperforming four other models: CNN (VGG16), CNN-RNN, CNN-BRNN, and CNN-BLSTM.
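The pipeline in the abstract (spectrogram, recurrent sequence model, Softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the improved VGGNet feature extractor is omitted, the weights are random and untrained, the input is a synthetic sine tone rather than TIMIT audio, and the frame length, hop size, hidden size, and class count are all assumed values chosen only for illustration.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log(magnitude + 1e-8)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_last_state(xs, params):
    """Run one GRU direction over the rows of xs and return the final hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)             # update gate
        r = sigmoid(Wr @ x + Ur @ h)             # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_cand           # one common GRU update convention
    return h

def init_gru(rng, n_in, n_hidden):
    """Random, untrained weights in the order (Wz, Uz, Wr, Ur, Wh, Uh)."""
    shapes = [(n_hidden, n_in), (n_hidden, n_hidden)] * 3
    return [rng.normal(scale=0.1, size=s) for s in shapes]

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def classify(spec, fwd, bwd, W_out):
    """Bidirectional GRU over spectrogram frames, then Softmax over classes."""
    h = np.concatenate([gru_last_state(spec, fwd),         # forward pass
                        gru_last_state(spec[::-1], bwd)])  # backward pass
    return softmax(W_out @ h)

# Synthetic stand-in for a phoneme segment: 0.1 s of a 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(int(0.1 * sample_rate)) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

spec = stft_spectrogram(signal)
n_hidden, n_classes = 16, 5  # assumed sizes, not taken from the paper
fwd = init_gru(rng, spec.shape[1], n_hidden)
bwd = init_gru(rng, spec.shape[1], n_hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, 2 * n_hidden))
probs = classify(spec, fwd, bwd, W_out)
print(spec.shape, probs.shape)
```

In the model described above, the recurrent layer would presumably consume CNN feature maps rather than raw spectrogram frames; feeding frames directly here only keeps the sketch short.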
Authors: 和丽华, 江涛, 潘文林, 杨皓然. English byline: HE Li-hua; YANG Hao-ran; JIANG Tao; PAN Wen-lin (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)
Source: Journal of Yunnan Minzu University: Natural Sciences Edition (《云南民族大学学报(自然科学版)》), CAS, 2020, No. 5, pp. 493-500 (8 pages)
Funding: National Natural Science Foundation of China (61363022).
Keywords: phoneme recognition; convolutional neural network; bidirectional gated recurrent unit (BGRU)
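About the authors: HE Li-hua (b. 1995), female, master's student; main research interest: intelligent computing. Corresponding author: JIANG Tao (b. 1973), male, Ph.D., professor, master's supervisor; main research interests: domain-specific modeling, and formalization and verification of modeling languages.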
