Funding: National Natural Science Foundation of China (NSFC) (No. 61671075); Major Program of the National Natural Science Foundation of China (No. 61631003).
Abstract: To address the musical noise introduced by classical spectral subtraction, a short-time modulation (STM) domain spectral subtraction method has been successfully applied to single-channel speech enhancement. However, because voice activity detection (VAD) is inaccurate, the residual musical noise and the enhancement performance still need to be improved, especially in low signal-to-noise ratio (SNR) scenarios. To address this issue, an improved frame-iterative spectral subtraction in the STM domain (IMModSSub) is proposed. More specifically, exploiting the inter-frame correlation, noise subtraction is applied directly to each frame of the noisy signal in the STM domain. Then, the noisy signal is classified into speech or silence frames based on a predefined threshold on the segmental SNR. From these classification results, a corresponding mask function is developed for the noisy speech after noise subtraction. Finally, exploiting the increased sparsity of the speech signal in the modulation domain, the orthogonal matching pursuit (OMP) technique is applied to the speech frames to improve speech quality and intelligibility. The effectiveness of the proposed method is evaluated with three types of noise: white noise, pink noise, and HF channel noise. The results show that the proposed method outperforms some established baselines at lower SNRs (-5 to +5 dB).
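The classification and masking steps described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' IMModSSub method: it works on plain time-domain frames rather than in the STM domain, and the function names, the 0 dB threshold, and the 0.1 silence gain are all hypothetical choices made for the example.

```python
import math

def segmental_snr_db(noisy_power, noise_power, eps=1e-12):
    """Per-frame SNR estimate in dB from the frame power and a noise-power estimate."""
    # Power subtraction; floor at eps so the logarithm stays defined.
    speech_power = max(noisy_power - noise_power, eps)
    return 10.0 * math.log10(speech_power / max(noise_power, eps))

def classify_frames(frame_powers, noise_power, threshold_db=0.0):
    """Label each frame True (speech) or False (silence) by thresholding its SNR."""
    return [segmental_snr_db(p, noise_power) > threshold_db for p in frame_powers]

def apply_mask(frames, is_speech, silence_gain=0.1):
    """Keep speech frames unchanged and attenuate silence frames (assumed mask shape)."""
    return [
        [sample * (1.0 if keep else silence_gain) for sample in frame]
        for frame, keep in zip(frames, is_speech)
    ]

# Two toy frames: a weak one at the noise level and a strong one.
frames = [[0.1, -0.1, 0.1, -0.1], [1.0, -1.0, 1.0, -1.0]]
powers = [sum(s * s for s in f) / len(f) for f in frames]
labels = classify_frames(powers, noise_power=0.01)
masked = apply_mask(frames, labels)
```

In this toy input the first frame has the same power as the assumed noise estimate, so it is labelled silence and attenuated, while the second frame is kept as speech.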
Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 61671075) and the Major Program of the National Natural Science Foundation of China (No. 61631003).
Abstract: To overcome the limitations of conventional speech enhancement methods, such as inaccurate voice activity detection (VAD) and noise estimation, a novel speech enhancement algorithm based on approximate message passing (AMP) is adopted. AMP exploits the difference in sparsity between speech and noise to remove or mute the noise in the corrupted speech, and it is used here to reconstruct the clean speech efficiently. More specifically, the prior probability distribution of the speech sparsity coefficients is characterized by a Gaussian model, and the hyper-parameters of the prior model are learned by the expectation-maximization (EM) algorithm. The k-nearest neighbor (k-NN) algorithm is used to learn the sparsity, based on the fact that the speech coefficients of adjacent frames are correlated. Computational simulations are used to validate the proposed algorithm, which achieves better speech enhancement performance than four baseline methods, namely Wiener filtering, subspace pursuit (SP), distributed sparsity adaptive matching pursuit (DSAMP), and expectation-maximization Gaussian-model approximate message passing (EM-GAMP), under different compression ratios and a wide range of signal-to-noise ratios (SNRs).
Funding: Partially supported by the National Natural Science Foundation of China (Nos. 11590772, 11590770) and the Pre-research Project for Equipment of General Information System (No. JZX2017-0994/Y306).
Abstract: This paper presents a deep neural network (DNN)-based speech enhancement algorithm that uses soft audible noise masking for single-channel wind noise reduction. To reduce the low-frequency residual noise, a psychoacoustic model is adopted to calculate the masking threshold from the estimated clean speech spectrum. The gain for noise suppression is obtained from soft audible noise masking by comparing the estimated wind noise spectrum with the masking threshold. To deal with abruptly time-varying noisy signals, two separate DNN models are used to estimate the spectra of the clean speech and wind noise components. Experimental results on subjective and objective quality tests show that the proposed algorithm achieves better performance than the conventional DNN-based wind noise reduction method.
Funding: Supported by the Education Foundation of Anhui Province (No. 2002kj003).
Abstract: A single-channel speech enhancement method based on de-noising with the stationary wavelet transform is proposed. The approach processes the multi-resolution wavelet coefficients individually and then reconstructs the recovered signal. The time-invariance of the stationary wavelet transform is particularly useful in speech de-noising. Experimental results show that the proposed de-noising algorithm can achieve an excellent balance between suppressing noise effectively and preserving as many characteristics of the original signal as possible. The algorithm offers superior performance in speech noise suppression.
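As a rough illustration of the idea in this abstract, the sketch below applies one level of an undecimated (stationary) Haar transform with circular boundaries, soft-thresholds the detail coefficients, and reconstructs the signal. The Haar filter, the single decomposition level, the soft-threshold rule, and all function names are assumptions made for the example; the abstract does not specify the wavelet, the number of levels, or the thresholding scheme.

```python
import math

def swt_haar(x):
    """One-level undecimated Haar transform; outputs have the same length as x."""
    n = len(x)
    approx = [(x[i] + x[(i + 1) % n]) / 2.0 for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) / 2.0 for i in range(n)]
    return approx, detail

def iswt_haar(approx, detail):
    """Inverse transform: average the two redundant reconstructions of each sample."""
    n = len(approx)
    return [
        0.5 * ((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1]))
        for i in range(n)
    ]

def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero by threshold, zeroing small values."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def denoise(x, threshold):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = swt_haar(x)
    shrunk = [soft_threshold(d, threshold) for d in detail]
    return iswt_haar(approx, shrunk)

signal = [1.0, 2.0, 3.0, 4.0]
unchanged = denoise(signal, threshold=0.0)   # zero threshold reproduces the input
smoothed = denoise(signal, threshold=10.0)   # large threshold removes all detail
```

Because no downsampling takes place, every coefficient stays aligned with a sample position, which is the time-invariance property the abstract refers to.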
Funding: Supported by the National Key Basic Research Program of China (2013CB329302); the National Natural Science Foundation of China (61271426, 10925419, 90920302, 61072124, 11074275, 11161140319, 91120001); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); the National "863" Program (2012AA012503); the CAS Priority Deployment Project (KGZD-EW-103-2); and the Jiangxi Provincial Department of Education Science and Technology Project (GJJ13426).
Abstract: This paper presents a noise estimator that models the log-power sequence with a hidden Markov model (HMM). The smoothing factor of the estimator is motivated by the speech presence probability in each frequency band. The HMM has a speech state and a nonspeech state, each consisting of a single Gaussian function. The mean of the nonspeech state is the estimate of the noise log-power. To make the estimator run online, an HMM parameter update method based on a first-order recursive process is used, so that the noise signal is tracked as the HMM is sequentially updated. For the sake of reliability, some constraints are imposed on the HMM. The proposed algorithm is compared with conventional methods such as minimum statistics (MS) and improved minima controlled recursive averaging (IMCRA). The experimental results confirm its promising performance.
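The first-order recursive update mentioned in this abstract can be illustrated with a generic recursion whose smoothing factor grows with the speech presence probability. This sketch is not the paper's HMM update: the speech presence probabilities are simply supplied as inputs, and the mapping from probability to smoothing factor, the `alpha_min` value, and the function names are assumptions made for the example.

```python
def update_noise_log_power(noise_log_power, frame_log_power, speech_prob, alpha_min=0.9):
    """One recursive step: the more likely speech is present, the less the estimate moves."""
    # Assumed mapping: alpha rises linearly from alpha_min (no speech) to 1.0 (speech).
    alpha = alpha_min + (1.0 - alpha_min) * speech_prob
    return alpha * noise_log_power + (1.0 - alpha) * frame_log_power

def track_noise(frame_log_powers, speech_probs, initial, alpha_min=0.9):
    """Run the recursion over a sequence of frames for one frequency band."""
    estimate = initial
    history = []
    for log_power, prob in zip(frame_log_powers, speech_probs):
        estimate = update_noise_log_power(estimate, log_power, prob, alpha_min)
        history.append(estimate)
    return history

# Toy band: noise floor rises from -30 dB to -20 dB, with one loud speech frame.
log_powers = [-20.0, 0.0, -20.0]
probs = [0.0, 1.0, 0.0]
trace = track_noise(log_powers, probs, initial=-30.0)
```

With a speech probability of 1 the estimate is frozen, so the loud middle frame does not leak into the noise estimate, while the two noise-only frames pull it towards the new floor.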
Funding: Supported by the NCS under Grant No. NSC 102-2221-E-468-004.
Abstract: This study proposes a post-processor that improves the harmonic structure of vowels in enhanced speech, thereby improving speech quality. First, a speech enhancement algorithm is employed to reduce the background noise in the noisy speech. The enhanced speech is then post-processed by a hybrid-median filter to reduce the musical effect of the residual noise. Because the harmonic spectra are affected by both the background noise and the speech enhancement process, the quality of vowels deteriorates. A harmonic regeneration method is therefore developed to improve the quality of the post-processed speech. Experimental results show that the proposed method can improve the quality of post-processed speech by adequately regenerating the harmonic spectra.
Abstract: Speech and noise are both typical non-stationary signals; speech, however, is also short-time stationary. At present, denoising methods are usually carried out in the frequency domain because of the short-time stationarity of the speech signal. In this article, an improved speech denoising algorithm based on the discrete fractional Fourier transform (DFRFT) is presented. The algorithm combines linear optimal filtering and median filtering. Simulations show that, compared with Wiener filtering, it removes noise more readily, improves the signal-to-noise ratio (SNR), and enhances the original speech signal.
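The median filtering stage named in this abstract can be sketched on a plain sequence as below. The DFRFT and the linear optimal filter are not included, and the window width, the shrinking-window edge handling, and the function name are assumptions made for the example rather than details taken from the paper.

```python
from statistics import median

def median_filter(x, width=3):
    """Running median with an odd window; the window shrinks at the sequence edges."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(x)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(median(x[lo:hi]))
    return out

# A flat sequence with one impulsive outlier.
noisy = [1.0, 1.0, 9.0, 1.0, 1.0]
cleaned = median_filter(noisy, width=3)
```

A median filter removes isolated impulses such as the outlier above without smearing them into neighbouring samples, which is why it is often paired with a linear filter.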