Abstract

To make effective use of non-lexical audio events in the semantic analysis of conversational speech, the characteristic differences among the audio events that occur frequently in spontaneous speech are analyzed, and a two-stage decision method for detecting non-lexical audio events in spontaneous speech is proposed. The method uses the signal characteristics of audio events to construct audio-event signal segments, applies a threshold decision to detect long applause (the first-stage decision), and employs statistical models to detect the other audio events (the second-stage decision). Experimental results show that the average precision, recall and F1-measure of the proposed method for three non-lexical audio events (filled pause, laughter and applause) are 87.3%, 93.8% and 90.4%, respectively. Compared with results reported in the existing literature, the F1-measure is improved by 7.5% on average, and the proposed method determines the boundaries of non-lexical audio events more precisely.
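The two-stage decision can be pictured with a minimal sketch. The abstract does not specify the features, thresholds or statistical models, so everything named here is an assumption made for illustration only: the `Segment` type, the scalar `feature`, the constants `MIN_APPLAUSE_SEC` and `APPLAUSE_FEATURE_THRESHOLD`, and the one-dimensional Gaussian models in `MODELS`.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A candidate audio-event segment (hypothetical representation)."""
    start: float    # segment start time in seconds
    end: float      # segment end time in seconds
    feature: float  # hypothetical scalar signal feature for the segment

    @property
    def duration(self) -> float:
        return self.end - self.start


# Illustrative values only; not taken from the paper.
MIN_APPLAUSE_SEC = 2.0
APPLAUSE_FEATURE_THRESHOLD = 0.5

# Hypothetical (mean, std) Gaussian models standing in for the
# statistical models used in the second-stage decision.
MODELS = {
    "filled_pause": (0.10, 0.05),
    "laughter": (0.30, 0.08),
    "applause": (0.55, 0.10),
}


def gaussian_log_likelihood(x: float, mean: float, std: float) -> float:
    """Log-density of a one-dimensional Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def classify(segment: Segment) -> str:
    # Stage 1: threshold decision that catches long applause.
    if (segment.duration >= MIN_APPLAUSE_SEC
            and segment.feature >= APPLAUSE_FEATURE_THRESHOLD):
        return "applause"
    # Stage 2: choose the event whose model gives the highest likelihood.
    return max(
        MODELS,
        key=lambda label: gaussian_log_likelihood(segment.feature, *MODELS[label]),
    )
```

With these assumed values, a long segment with a high feature value is labelled by the first stage alone, while shorter segments fall through to the model comparison. The reported figures are also mutually consistent: the harmonic mean of 87.3% precision and 93.8% recall is 2(0.873)(0.938)/(0.873 + 0.938) ≈ 0.904.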
    
    
    
    
Source
    Journal of South China University of Technology (Natural Science Edition)
    Indexed in: EI, CAS, CSCD, PKU Core Journals
    2011, Issue 2, pp. 20-25, 31 (7 pages)
     
            
Funding
    National Natural Science Foundation of China (60972132)
    Natural Science Foundation of Guangdong Province (10451064101004651, 9351064101000003)
            
    
Keywords
    non-lexical audio event
    threshold decision
    statistical model detection
    spontaneous speech
    speech processing
                
     
    
    
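About the author
He Qianhua (b. 1965), male, professor and doctoral supervisor; his research focuses on speech and audio signal processing and embedded systems. E-mail: eeqhhe@scutedu.cn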