
Two-Stage Decision-Based Detection of Non-Lexical Audio Events in Spontaneous Vocalization

Cited by: 1
Abstract: In order to effectively utilize non-lexical audio events in the semantic analysis of conversational speech, the characteristic differences among audio events that frequently occur in spontaneous speech are analyzed, and a two-stage decision-based method for detecting non-lexical audio events in spontaneous speech is proposed. In this method, the signal characteristics of audio events are used to construct audio-event signal segments: a threshold decision is used to detect long applause (the first-stage decision), and statistical models are employed to detect the other audio events (the second-stage decision). Experimental results show that the average precision, recall and F1-measure of the proposed method for three non-lexical audio events (filled pause, laughter and applause) are 87.3%, 93.8% and 90.4%, respectively. Compared with data reported in the existing literature, the F1-measure is improved by 7.5% on average, and the proposed method determines the boundaries of non-lexical audio events more accurately.
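The two-stage pipeline described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the segment representation, the duration/energy criteria for applause, and the `classify_with_model` callback are all assumptions made for the sketch. The closing lines only check that the reported precision and recall are arithmetically consistent with the reported F1-measure.

```python
# Illustrative sketch (not the authors' code) of the two-stage decision:
# stage 1 flags long applause with a simple threshold test; stage 2 hands
# every remaining segment to a statistical classifier.

def detect_events(segments, applause_min_duration=1.0, classify_with_model=None):
    """segments: dicts with 'start'/'end' times in seconds plus optional
    signal features; classify_with_model: a stage-2 statistical classifier
    (e.g. an HMM/GMM scorer) that returns an event label for a segment."""
    results = []
    for seg in segments:
        duration = seg["end"] - seg["start"]
        # Stage 1: threshold decision for long applause (criteria assumed).
        if duration >= applause_min_duration and seg.get("flat_energy", False):
            results.append((seg, "applause"))
        # Stage 2: statistical-model decision for the other audio events.
        elif classify_with_model is not None:
            results.append((seg, classify_with_model(seg)))
        else:
            results.append((seg, "unknown"))
    return results

# Consistency check of the reported figures: F1 = 2PR / (P + R).
precision, recall = 0.873, 0.938
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.904, matching the reported F1-measure
```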
Source: Journal of South China University of Technology (Natural Science Edition), 2011, Issue 2, pp. 20-25 and 31 (7 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60972132); Natural Science Foundation of Guangdong Province (10451064101004651, 9351064101000003).
Keywords: non-lexical audio event; threshold decision; statistical model detection; spontaneous speech; speech processing
About the author: He Qianhua (b. 1965), male, professor and doctoral supervisor; his research interests include speech and audio signal processing and embedded systems. E-mail: eeqhhe@scut.edu.cn
References (12)

  • 1 Stouten F, Duchateau J, Martens J P, et al. Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation [J]. Speech Communication, 2006, 48(11): 1590-1606.
  • 2 Cai R, Lu L, Zhang H J, et al. Highlight sound effects detection in audio stream [C] // Proceedings of the IEEE International Conference on Multimedia and Expo. Baltimore: IEEE, 2003: 37-40.
  • 3 Kennedy L S, Ellis D P W. Laughter detection in meetings [C] // Proceedings of the NIST ICASSP Meeting Recognition Workshop. Montreal: National Institute of Standards and Technology, 2004: 118-121.
  • 4 Knox M T, Mirghafori N. Automatic laughter detection using neural networks [C] // Proceedings of InterSpeech. Antwerp: International Speech Communication Association, 2007: 2973-2976.
  • 5 Laskowski K, Schultz T. Detection of laughter-in-interaction in multichannel close-talk microphone recordings of meetings [C] // Proceedings of the 5th International Workshop on Machine Learning for Multimodal Interaction. Utrecht: Springer-Verlag, 2008: 149-160.
  • 6 Knox M T, Morgan N, Mirghafori N. Getting the last laugh: automatic laughter segmentation in meetings [C] // Proceedings of InterSpeech. Brisbane: International Speech Communication Association, 2008: 797-800.
  • 7 Garg G, Ward N. Detecting filled pauses in tutorial dialogs [R]. El Paso: Department of Computer Science, University of Texas at El Paso, 2006: 1-9.
  • 8 Audhkhasi K, Kandhway K, Deshmukh O D, et al. Formant-based technique for automatic filled-pause detection in spontaneous spoken English [C] // Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Taipei: IEEE, 2009: 4857-4860.
  • 9 Li Y X, He Q H, Kwong S, et al. Characteristics-based effective applause detection for meeting speech [J]. Signal Processing, 2009, 89(8): 1625-1633.
  • 10 Carter A. Automatic acoustic laughter detection [D]. Staffordshire: Department of Electronic Engineering, Keele University, 2000.
