
Speech Recognition Based on Deep Neural Networks on Tibetan Corpus (基于深层神经网络的藏语识别)
Cited by: 14
Abstract: This paper is the first to address large-vocabulary continuous speech recognition of natural, conversational telephone speech in Tibetan. As a minority language, Tibetan poses one major difficulty for speech recognition: data sparsity. For acoustic modeling with deep neural networks (DNN), the paper proposes to cope with the sparse data by initializing the target Tibetan DNN with a DNN already trained on a large corpus of a resource-rich language and then optimizing it on the Tibetan data. In addition, because the phonetics of Tibetan is not yet well studied, hand-crafting a decision-tree question set is not feasible; the question set is therefore generated automatically in a data-driven manner and used to tie the states of tri-phone hidden Markov models (HMM), which reduces the number of model parameters to be estimated. On the test set, acoustic modeling with Gaussian mixture models (GMM) yields a Tibetan character accuracy of 30.86%. For DNN acoustic modeling, DNNs trained on three different resource-rich corpora are used as initial networks, and experiments on the same test set confirm the effectiveness of the approach, with the Tibetan character accuracy reaching 43.26%.
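The transfer strategy described in the abstract (initializing the Tibetan DNN with the hidden layers of a DNN already trained on a resource-rich language, then fine-tuning on the Tibetan data) can be sketched as follows. This is a minimal PyTorch illustration rather than the authors' implementation; the layer sizes, senone counts, checkpoint path, and the random stand-in for the Tibetan training data are assumptions made for the example.

import torch
import torch.nn as nn

class AcousticDNN(nn.Module):
    """Feed-forward DNN mapping spliced acoustic frames to tied tri-phone state (senone) scores."""
    def __init__(self, feat_dim, hidden_dim, num_hidden, num_senones):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        self.hidden = nn.Sequential(*layers)          # hidden stack, shared across languages
        self.output = nn.Linear(in_dim, num_senones)  # language-specific output layer

    def forward(self, x):
        return self.output(self.hidden(x))

# 1) DNN assumed to be already trained on a resource-rich language (all sizes illustrative).
source = AcousticDNN(feat_dim=440, hidden_dim=1024, num_hidden=5, num_senones=6000)
# In practice its weights would come from a trained checkpoint, e.g.
# source.load_state_dict(torch.load("source_language_dnn.pt"))  # hypothetical path

# 2) Target Tibetan DNN: copy the hidden layers, keep a freshly initialized output layer
#    sized to the Tibetan senone set obtained from data-driven state tying.
tibetan = AcousticDNN(feat_dim=440, hidden_dim=1024, num_hidden=5, num_senones=2500)
tibetan.hidden.load_state_dict(source.hidden.state_dict())

# 3) Fine-tune the whole network on the small Tibetan corpus with frame-level
#    cross-entropy against forced-alignment senone labels.
optimizer = torch.optim.SGD(tibetan.parameters(), lr=0.008, momentum=0.9)
criterion = nn.CrossEntropyLoss()
tibetan_train_loader = [(torch.randn(256, 440), torch.randint(0, 2500, (256,)))
                        for _ in range(10)]           # random stand-in data
for feats, senone_ids in tibetan_train_loader:
    optimizer.zero_grad()
    loss = criterion(tibetan(feats), senone_ids)
    loss.backward()
    optimizer.step()

Only the output layer is re-initialized because its size is tied to the language-specific senone inventory produced by the decision-tree state tying; the hidden layers act as a largely language-independent feature extractor, which is what makes this initialization helpful when the Tibetan data are sparse.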
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, Peking University Core), 2015, No. 3, pp. 209-213 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61273264)
Keywords: Tibetan, Continuous Speech Recognition, Data-Driven, Deep Neural Networks (DNN)
About the Authors: Yuan Shenglong (corresponding author), male, born in 1989, master's student; research interest: speech recognition. E-mail: slyuan@mail.ustc.edu.cn. Guo Wu, male, born in 1973, Ph.D., associate professor; research interests: speaker recognition and verification, speech recognition. Dai Lirong, male, born in 1962, Ph.D., professor; research interests: speech recognition, speech signal processing.
  • Related Literature

References (12)

  • 1 Yao Xu, Li Yonghong, Shan Guangrong, Yu Hongzhi. Research on a Tibetan Isolated-Word Speech Recognition System [J]. Journal of Northwest University for Nationalities (Natural Science Edition), 2009, 30(1): 29-36. Cited by: 10
  • 2 Han Qinghua, Yu Hongzhi. Research on HMM-Based Speaker-Independent Isolated-Word Speech Recognition for Amdo Tibetan [J]. Software Guide, 2010, 9(7): 173-175. Cited by: 9
  • 3 Li Guanyu, Meng Meng. Research on Acoustic Models for Large-Vocabulary Continuous Speech Recognition of Lhasa Tibetan [J]. Computer Engineering, 2012, 38(5): 189-191. Cited by: 16
  • 4 Dahl G E, Yu D, Deng L, et al. Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition. IEEE Trans on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
  • 5 Hinton G E, Osindero S, Teh Y W. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006, 18(7): 1527-1554.
  • 6 Beulen K, Ney H. Automatic Question Generation for Decision Tree Based State Tying // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Seattle, USA, 1998, II: 805-805.
  • 7 Singh R, Raj B, Stern R M. Automatic Clustering and Generation of Contextual Questions for Tied States in Hidden Markov Models // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Phoenix, USA, 1999, I: 117-120.
  • 8 Huang J T, Li J Y, Yu D, et al. Cross-Language Knowledge Transfer Using Multilingual Deep Neural Network with Shared Hidden Layers // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada, 2013: 7304-7308.
  • 9 Carreira-Perpinan M A, Hinton G E. On Contrastive Divergence Learning [EB/OL]. [2013-02-15]. www.docin.com/p-33657so63.html.
  • 10 Mohamed A, Dahl G E, Hinton G. Acoustic Modeling Using Deep Belief Networks. IEEE Trans on Audio, Speech, and Language Processing, 2012, 20(1): 14-22.

Secondary References (14)

Co-citing Literature (26)

Co-cited Literature (69)

  • 1 Deqing Zhuoma. A Survey of Research on Tibetan Speech Recognition [J]. Journal of Tibet University (Social Sciences Edition), 2010, 25(S1): 192-195. Cited by: 6
  • 2 Wang Guosheng. Properties of Kernel Functions and Their Construction Methods [J]. Computer Science, 2006, 33(6): 172-174. Cited by: 53
  • 3 Yan Binfeng, Zhu Xiaoyan, Zhang Zhijiang, Zhang Fan. Confidence Features and Decision Algorithms for Utterance Verification in Speech Recognition [J]. Journal of Software, 2006, 17(12): 2547-2553. Cited by: 3
  • 4 NASERSHARIF B, AKBARI A. SNR-dependent compression of enhanced Mel subband energies for compensation of noise effects on MFCC features [J]. Pattern Recognition Letters, 2011, 28(11): 1320-1326.
  • 5 POVEY D, KINGSBURY B, MANGU L, et al. fMPE: Discriminatively trained features for speech recognition [C]// Proceedings of the International Conference on Audio, Speech and Signal Processing. Piscataway, NJ, USA: IEEE, 2005: 961-964.
  • 6 ZHANG B, MATSOUKAS S, SCHWARTZ R. Recent progress on the discriminative region-dependent transform for speech feature extraction [C]// Proceedings of the Annual Conference of the International Speech Communication Association. Baixas, France: ISCA, 2006: 1495-1498.
  • 7 YAN Z, HUO Q, XU J, et al. Tied-state based discriminative training of context-expanded region-dependent feature transforms for LVCSR [C]// Proceedings of the International Conference on Audio, Speech and Signal Processing. Piscataway, NJ, USA: IEEE, 2013: 6940-6944.
  • 8 SAINATH T N, KINGSBURY B, RAMABHADRAN B. Auto-encoder bottleneck features using deep belief networks [C]// Proceedings of the International Conference on Audio, Speech and Signal Processing. Piscataway, NJ, USA: IEEE, 2012: 4153-4156.
  • 9 SAON G, KINGSBURY B. Discriminative feature-space transforms using deep neural networks [C]// Proceedings of the Annual Conference of the International Speech Communication Association. Baixas, France: ISCA, 2012: 14-17.
  • 10 PAULIK M. Lattice-based training of bottleneck feature extraction neural networks [C]// Proceedings of the Annual Conference of the International Speech Communication Association. Baixas, France: ISCA, 2013: 89-93.

Citing Literature (14)

Secondary Citing Literature (71)
