Funding: Supported by the KERI Primary Research Program through the Korea Research Council for Industrial Science & Technology, funded by the Ministry of Science, ICT and Future Planning (No. 15-12-N0101-46).
Abstract: A novel technique is proposed to improve the performance of voice activity detection (VAD) by using deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow a Gaussian probability density function (PDF). The proposed algorithm computes the likelihood ratio from the input signal and uses DBN learning to classify voice activity. Experiments show that the proposed algorithm yields improved results in various noise environments compared to conventional VAD algorithms. Furthermore, the DBN-based algorithm decreases the probability of detection error by [0.7, 2.6] compared to the support vector machine based algorithm.
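The abstract does not give the likelihood ratio in closed form. As a minimal sketch, assuming zero-mean complex Gaussian models for the speech and noise spectral components in each frequency bin (the usual statistical-model VAD setup, not confirmed as the authors' exact formulation), the per-bin log likelihood ratio can be written in terms of the a priori SNR `xi` and the a posteriori SNR `gamma`. The function names and the variance inputs are hypothetical; how the variances are estimated and how the values are fed to the DBN are outside this sketch.

```python
import math


def log_likelihood_ratio(noisy_power, noise_var, speech_var):
    """Per-bin log LR of 'speech present' vs 'speech absent'.

    Assumes zero-mean complex Gaussian spectral components, so that
    LR = 1 / (1 + xi) * exp(gamma * xi / (1 + xi)).
    """
    xi = speech_var / noise_var        # a priori SNR (assumed known or estimated)
    gamma = noisy_power / noise_var    # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def frame_log_lr(noisy_powers, noise_vars, speech_vars):
    """Average the per-bin log LRs over all bins of one frame."""
    terms = [
        log_likelihood_ratio(p, n, s)
        for p, n, s in zip(noisy_powers, noise_vars, speech_vars)
    ]
    return sum(terms) / len(terms)
```

A conventional detector would compare `frame_log_lr` against a threshold; in the approach described above, LR values of this kind would instead serve as input features to the classifier.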
Funding: Project supported by the Inha University Research Grant; Project (10031764) supported by the Strategic Technology Development Program of the Ministry of Knowledge Economy, Korea.
Abstract: In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) is proposed for speech enhancement. The proposed method employs local SAP (LSAP) based on the TE of noisy speech, rather than conventional LSAP, as the feature parameter for VAD in each frequency subband. Results show that the TE operator enhances the ability to discriminate speech from noise and further suppresses noise components. Therefore, TE-based LSAP provides a better representation of LSAP, resulting in improved VAD for estimating noise power in a speech enhancement algorithm. In addition, the presented method uses TE-based global SAP (GSAP), derived in each frame, as the weighting parameter for modifying the adopted TE operator and improving its performance. The proposed algorithm was evaluated by objective and subjective quality tests under various environments and was shown to produce better results than the conventional method.
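For reference, the standard discrete-time Teager energy operator is psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below implements only that textbook operator; the function name is hypothetical, and the subband decomposition, the SAP computation, and the GSAP-weighted modification of the operator mentioned in the abstract are not reproduced here.

```python
def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for interior samples only, so the output is two
    samples shorter than the input (empty for inputs shorter than 3).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure sinusoid A*cos(W*n + phi), this operator returns the constant A^2 * sin^2(W), which is why it responds to both amplitude and frequency and can help separate speech from noise.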
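Abstract: High order statistics (HOS) are applied in the frequency domain, and a voice activity detection (VAD) algorithm based on a new HOS feature of the magnitude spectrum is proposed. The algorithm uses adjacent frames to obtain the statistics of the current frame, constructs an independent zero-mean Gaussian random sequence from the magnitude spectrum, and obtains the HOS feature by computing the normalized skewness of this sequence. The new feature exploits prior knowledge of the long-term stationarity and disorder of noise, borrows the speech production model to analyze the noise model, and, under reasonable assumptions, extracts the Gaussian information hidden in the magnitude spectrum. Whereas traditional HOS features can only be used for detection in Gaussian or quasi-Gaussian white noise, the magnitude-spectrum HOS feature therefore extends to all stationary random noise, including colored noise. The new feature also shows several favorable properties: for example, its value tends to zero for stationary noise, and it exhibits negative peaks in noise segments between speech and at the end of speech. These properties can be used to build a highly robust VAD algorithm that handles different noise types, different signal-to-noise ratios, and random entry points. The paper describes the principle and properties of the new feature in detail and combines it with a decision rule to construct a simple VAD algorithm. Experimental results show that, for stationary noise, the VAD algorithm based on magnitude-spectrum HOS outperforms algorithms based on traditional features in the combined measure of detection accuracy and robustness.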
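The abstract does not specify how the zero-mean Gaussian sequence is built from the magnitude spectrum, so the following sketch covers only the final step: the normalized skewness of a sequence, defined as the third central moment divided by the second central moment raised to the power 3/2. The function name and the zero-variance convention are assumptions made for illustration.

```python
def normalized_skewness(values):
    """Normalized skewness m3 / m2**1.5 using population (biased) moments.

    Returns 0.0 for a constant sequence (zero variance), which is an
    assumed convention for this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    if m2 == 0.0:
        return 0.0
    return m3 / m2 ** 1.5
```

A sequence that is symmetric about its mean gives a value of zero, consistent with the property stated above that the feature tends to zero for stationary noise once the sequence is Gaussian.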