
Two-Stage Decision-Based Detection of Non-Lexical Audio Events in Spontaneous Vocalization

Cited by: 1
Abstract: In order to effectively utilize non-lexical audio events in the semantic analysis of conversational speech, the characteristic differences among audio events that frequently occur in spontaneous speech are analyzed, and a two-stage decision-based method for detecting non-lexical audio events in spontaneous speech is proposed. In this method, the signal characteristics of audio events are used to construct audio-event signal segments: a threshold decision is used to detect long applause (the first-stage decision), and statistical models are employed to detect the other audio events (the second-stage decision). Experimental results show that the average precision, recall and F1-measure of the proposed method for three non-lexical audio events (filled pause, laughter and applause) are 87.3%, 93.8% and 90.4%, respectively. Compared with data reported in the existing literature, the F1-measure is improved by 7.5% on average, and the proposed method determines the boundaries of non-lexical audio events more accurately.
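The two-stage pipeline described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the segment representation, the duration/energy criteria for applause, and the `classify_with_model` callback are all assumptions made for the sketch. The closing lines only check that the reported precision and recall are arithmetically consistent with the reported F1-measure.

```python
# Illustrative sketch (not the authors' code) of the two-stage decision:
# stage 1 flags long applause with a simple threshold test; stage 2 hands
# every remaining segment to a statistical classifier.

def detect_events(segments, applause_min_duration=1.0, classify_with_model=None):
    """segments: dicts with 'start'/'end' times in seconds plus optional
    signal features; classify_with_model: a stage-2 statistical classifier
    (e.g. an HMM/GMM scorer) that returns an event label for a segment."""
    results = []
    for seg in segments:
        duration = seg["end"] - seg["start"]
        # Stage 1: threshold decision for long applause (criteria assumed).
        if duration >= applause_min_duration and seg.get("flat_energy", False):
            results.append((seg, "applause"))
        # Stage 2: statistical-model decision for the other audio events.
        elif classify_with_model is not None:
            results.append((seg, classify_with_model(seg)))
        else:
            results.append((seg, "unknown"))
    return results

# Consistency check of the reported figures: F1 = 2PR / (P + R).
precision, recall = 0.873, 0.938
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.904, matching the reported F1-measure
```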
Source: Journal of South China University of Technology (Natural Science Edition), 2011, Issue 2, pp. 20-25 and 31 (7 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60972132); Natural Science Foundation of Guangdong Province (10451064101004651, 9351064101000003).
Keywords: non-lexical audio event; threshold decision; statistical model detection; spontaneous speech; speech processing
About the author: He Qianhua (b. 1965), male, professor and doctoral supervisor; his research interests include speech and audio signal processing and embedded systems. E-mail: eeqhhe@scut.edu.cn
References (12)

  • 1 Stouten F, Duchateau J, Martens J P, et al. Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation [J]. Speech Communication, 2006, 48(11): 1590-1606.
  • 2 Cai R, Lu L, Zhang H J, et al. Highlight sound effects detection in audio stream [C] // Proceedings of the IEEE International Conference on Multimedia and Expo. Baltimore: IEEE, 2003: 37-40.
  • 3 Kennedy L S, Ellis D P W. Laughter detection in meetings [C] // Proceedings of the NIST ICASSP Meeting Recognition Workshop. Montreal: National Institute of Standards and Technology, 2004: 118-121.
  • 4 Knox M T, Mirghafori N. Automatic laughter detection using neural networks [C] // Proceedings of InterSpeech. Antwerp: International Speech Communication Association, 2007: 2973-2976.
  • 5 Laskowski K, Schultz T. Detection of laughter-in-interaction in multichannel close-talk microphone recordings of meetings [C] // Proceedings of the 5th International Workshop on Machine Learning for Multimodal Interaction. Utrecht: Springer-Verlag, 2008: 149-160.
  • 6 Knox M T, Morgan N, Mirghafori N. Getting the last laugh: automatic laughter segmentation in meetings [C] // Proceedings of InterSpeech. Brisbane: International Speech Communication Association, 2008: 797-800.
  • 7 Garg G, Ward N. Detecting filled pauses in tutorial dialogs [R]. El Paso: Department of Computer Science, University of Texas at El Paso, 2006: 1-9.
  • 8 Audhkhasi K, Kandhway K, Deshmukh O D, et al. Formant-based technique for automatic filled-pause detection in spontaneous spoken English [C] // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Taipei: IEEE, 2009: 4857-4860.
  • 9 Li Y X, He Q H, Kwong S, et al. Characteristics-based effective applause detection for meeting speech [J]. Signal Processing, 2009, 89(8): 1625-1633.
  • 10 Carter A. Automatic acoustic laughter detection [D]. Staffordshire: Department of Electronic Engineering, Keele University, 2000.
