Perceptual auditory filter banks such as Bark-scale filter bank are widely used as front-end processing in speech recognition systems.However,the problem of the design of optimized filter banks that provide higher acc...Perceptual auditory filter banks such as Bark-scale filter bank are widely used as front-end processing in speech recognition systems.However,the problem of the design of optimized filter banks that provide higher accuracy in recognition tasks is still open.Owing to spectral analysis in feature extraction,an adaptive bands filter bank (ABFB) is presented.The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes genetic algorithm (GA) to optimize the design parameters.The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop.The deployment of ABFB together with zero-crossing peak amplitude (ZCPA) feature as a front process for radial basis function (RBF) system shows significant improvement in robustness compared with the Bark-scale filter bank.In ABFB,several sub-bands are still more concentrated toward lower frequency but their exact locations are determined by the performance rather than the perceptual criteria.For the ease of optimization,only symmetrical bands are considered here,which still provide satisfactory results.展开更多
To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estima...To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estimate the noise spectrum and the partial masking effect which is one of the psychoacoustic properties was introduced to reinforce speech.The performance evaluation was performed by comparing the PESQ(perceptual evaluation of speech quality) and segSNR(segmental signal to noise ratio) by the proposed algorithm with the conventional algorithm.As a result,average PESQ by the proposed algorithm was higher than the average PESQ by the conventional noise reduction algorithm and segSNR was higher as much as 3.2 dB in average than that of the noise reduction algorithm.展开更多
In order to overcome defects of the classical hidden Markov model (HMM), Markov family model (MFM), a new statistical model was proposed. Markov family model was applied to speech recognition and natural language proc...In order to overcome defects of the classical hidden Markov model (HMM), Markov family model (MFM), a new statistical model was proposed. Markov family model was applied to speech recognition and natural language processing. The speaker independently continuous speech recognition experiments and the part-of-speech tagging experiments show that Markov family model has higher performance than hidden Markov model. The precision is enhanced from 94.642% to 96.214% in the part-of-speech tagging experiments, and the work rate is reduced by 11.9% in the speech recognition experiments with respect to HMM baseline system.展开更多
Enhanced speech based on the traditional wavelet threshold function had auditory oscillation distortion and the low signal-to-noise ratio (SNR). In order to solve these problems, a new continuous differentiable thresh...Enhanced speech based on the traditional wavelet threshold function had auditory oscillation distortion and the low signal-to-noise ratio (SNR). In order to solve these problems, a new continuous differentiable threshold function for speech enhancement was presented. Firstly, the function adopted narrow threshold areas, preserved the smaller signal speech, and improved the speech quality; secondly, based on the properties of the continuous differentiable and non-fixed deviation, each area function was attained gradually by using the method of mathematical derivation. It ensured that enhanced speech was continuous and smooth; it removed the auditory oscillation distortion; finally, combined with the Bark wavelet packets, it further improved human auditory perception. Experimental results show that the segmental SNR and PESQ (perceptual evaluation of speech quality) of the enhanced speech using this method increase effectively, compared with the existing speech enhancement algorithms based on wavelet threshold.展开更多
Support vector machine(SVM)has a good application prospect for speech recognition problems;still optimum parameter selection is a vital issue for it.To improve the learning ability of SVM,a method for searching the op...Support vector machine(SVM)has a good application prospect for speech recognition problems;still optimum parameter selection is a vital issue for it.To improve the learning ability of SVM,a method for searching the optimal parameters based on integration of predator prey optimization(PPO)and Hooke-Jeeves method has been proposed.In PPO technique,population consists of prey and predator particles.The prey particles search the optimum solution and predator always attacks the global best prey particle.The solution obtained by PPO is further improved by applying Hooke-Jeeves method.Proposed method is applied to recognize isolated words in a Hindi speech database and also to recognize words in a benchmark database TI-20 in clean and noisy environment.A recognition rate of 81.5%for Hindi database and 92.2%for TI-20 database has been achieved using proposed technique.展开更多
A single-channel speech enhancement method of noisy speech signals at very low signal-to-noise ratios is presented, which is based on masking properties of the human auditory system and power spectral density estimati...A single-channel speech enhancement method of noisy speech signals at very low signal-to-noise ratios is presented, which is based on masking properties of the human auditory system and power spectral density estimation of non stationary noise. It allows for an automatic adaptation in time and frequency of the parametric enhancement system, and finds the best tradeoff among the amount of noise reduction, the speech distortion, and the level of musical residual noise based on a criterion correlated with perception and SNR. This leads to a significant reduction of the unnatural structure of the residual noise. The results with several noise types show that the enhanced speech is more pleasant to a human listener.展开更多
In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) was proposed for speech enhancement. The proposed method employs local SAP (...In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) was proposed for speech enhancement. The proposed method employs local SAP (LSAP) based on the TE of noisy speech as a feature parameter for voice activity detection (VAD) in each frequency subband, rather than conventional LSAP. Results show that the TE operator can enhance the abiTity to discriminate speech and noise and further suppress noise components. Therefore, TE-based LSAP provides a better representation of LSAP, resulting in improved VAD for estimating noise power in a speech enhancement algorithm. In addition, the presented method utilizes TE-based global SAP (GSAP) derived in each frame as the weighting parameter for modifying the adopted TE operator and improving its performance. The proposed algorithm was evaluated by objective and subjective quality tests under various environments, and was shown to produce better results than the conventional method.展开更多
Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach was described, and then applied to speech and emotion recognition. Three kinds of experiments were ...Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach was described, and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first kind consists of isolated word recognition experiments in neutral (non-emotional) speech. The results show that the ZCMT approach effectively improves the recognition accuracy by 3.47% in average compared with the Teager energy operator (TEO). Thus, ZCMT feature can be considered as a noise-robust feature for speech recognition. The second kind consists of mono-lingual emotion recognition experiments by using the Taiyuan University of Technology (TYUT) and the Berlin databases. As the average recognition rate of ZCMT approach is 82.19%, the results indicate that the ZCMT features can characterize speech emotions in an effective way. The third kind consists of cross-lingual experiments with three languages. As the accuracy of ZCMT approach only reduced by 1.45%, the results indicate that the ZCMT features can characterize emotions in a language independent way.展开更多
An improved speech absence probability estimation was proposed using environmental noise classification for speech enhancement.A relevant noise estimation approach,known as the speech presence uncertainty tracking met...An improved speech absence probability estimation was proposed using environmental noise classification for speech enhancement.A relevant noise estimation approach,known as the speech presence uncertainty tracking method,requires seeking the "a priori" probability of speech absence that is derived by applying microphone input signal and the noise signal based on the estimated value of the "a posteriori" signal-to-noise ratio(SNR).To overcome this problem,first,the optimal values in terms of the perceived speech quality of a variety of noise types are derived.Second,the estimated optimal values are assigned according to the determined noise type which is classified by a real-time noise classification algorithm based on the Gaussian mixture model(GMM).The proposed algorithm estimates the speech absence probability using a noise classification algorithm which is based on GMM to apply the optimal parameter of each noise type,unlike the conventional approach which uses a fixed threshold and smoothing parameter.The performance of the proposed method was evaluated by objective tests,such as the perceptual evaluation of speech quality(PESQ) and composite measure.Performance was then evaluated by a subjective test,namely,mean opinion scores(MOS) under various noise environments.The proposed method show better results than existing methods.展开更多
目的 比较韶音后挂式骨导助听器对不同类型听力损失患者的听力干预短期效果,探讨其临床应用前景。方法 55例听力损失患者(年龄18~82岁;传导性听力损失9例,感音神经性听力损失15例,混合性听力损失31例;左右耳0.5、1、2、4 kHz四个频率的...目的 比较韶音后挂式骨导助听器对不同类型听力损失患者的听力干预短期效果,探讨其临床应用前景。方法 55例听力损失患者(年龄18~82岁;传导性听力损失9例,感音神经性听力损失15例,混合性听力损失31例;左右耳0.5、1、2、4 kHz四个频率的骨导纯音听阈均≤60 dB HL)配戴韶音后挂式骨导助听器,分别于配戴助听器前和配戴第14±2 d行声场总体听阈、单音节识别率及安静环境语句识别阈测试,比较配戴助听器前后的结果差异。并于配戴第14±2 d使用IOI-HA问卷对助听器使用效果进行评估。结果 患者配戴后挂式骨导式助听器后声场四个频率平均听阈(39.3±4.9 dB HL)较配戴前(56.5±8.2 dB HL)显著改善,差异有统计学意义(P<0.001)。患者助听前单音节识别率(给声强度:患者助听前双音节言语识别阈减5 dB)为29.8%±11.4%,配戴第14±2 d为72.4%±14.4%,配戴后单音节识别率显著提高,差异有统计学意义(P<0.001)。患者语句识别阈由配戴前的48.6±9.7 dB HL降至34.3±5.6 dB HL,差异有统计学意义(P<0.001)。配戴14±2 d时IOI-HA问卷评估总分平均值为29.0±3.8分。结论 后挂式骨导助听器可显著提高传导性、0.5~4 kHz骨导纯音听阈不超过60 dB HL的混合性及感音神经性听力损失患者的听力及言语识别能力。展开更多
基金Project(61072087) supported by the National Natural Science Foundation of ChinaProject(20093048) supported by Shanxi ProvincialGraduate Innovation Fund of China
文摘Perceptual auditory filter banks such as Bark-scale filter bank are widely used as front-end processing in speech recognition systems.However,the problem of the design of optimized filter banks that provide higher accuracy in recognition tasks is still open.Owing to spectral analysis in feature extraction,an adaptive bands filter bank (ABFB) is presented.The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes genetic algorithm (GA) to optimize the design parameters.The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop.The deployment of ABFB together with zero-crossing peak amplitude (ZCPA) feature as a front process for radial basis function (RBF) system shows significant improvement in robustness compared with the Bark-scale filter bank.In ABFB,several sub-bands are still more concentrated toward lower frequency but their exact locations are determined by the performance rather than the perceptual criteria.For the ease of optimization,only symmetrical bands are considered here,which still provide satisfactory results.
文摘To enhance the speech quality that is degraded by environmental noise,an algorithm was proposed to reduce the noise and reinforce the speech.The minima controlled recursive averaging(MCRA) algorithm was used to estimate the noise spectrum and the partial masking effect which is one of the psychoacoustic properties was introduced to reinforce speech.The performance evaluation was performed by comparing the PESQ(perceptual evaluation of speech quality) and segSNR(segmental signal to noise ratio) by the proposed algorithm with the conventional algorithm.As a result,average PESQ by the proposed algorithm was higher than the average PESQ by the conventional noise reduction algorithm and segSNR was higher as much as 3.2 dB in average than that of the noise reduction algorithm.
基金Project(60763001)supported by the National Natural Science Foundation of ChinaProjects(2009GZS0027,2010GZS0072)supported by the Natural Science Foundation of Jiangxi Province,China
文摘In order to overcome defects of the classical hidden Markov model (HMM), Markov family model (MFM), a new statistical model was proposed. Markov family model was applied to speech recognition and natural language processing. The speaker independently continuous speech recognition experiments and the part-of-speech tagging experiments show that Markov family model has higher performance than hidden Markov model. The precision is enhanced from 94.642% to 96.214% in the part-of-speech tagging experiments, and the work rate is reduced by 11.9% in the speech recognition experiments with respect to HMM baseline system.
基金Project(61072087) supported by the National Natural Science Foundation of ChinaProject(2011-035) supported by Shanxi Province Scholarship Foundation, China+2 种基金Project(20120010) supported by Universities High-tech Foundation Projects, ChinaProject (2013021016-1) supported by the Youth Science and Technology Foundation of Shanxi Province, ChinaProjects(2013011016-1, 2012011014-1) supported by the Natural Science Foundation of Shanxi Province, China
文摘Enhanced speech based on the traditional wavelet threshold function had auditory oscillation distortion and the low signal-to-noise ratio (SNR). In order to solve these problems, a new continuous differentiable threshold function for speech enhancement was presented. Firstly, the function adopted narrow threshold areas, preserved the smaller signal speech, and improved the speech quality; secondly, based on the properties of the continuous differentiable and non-fixed deviation, each area function was attained gradually by using the method of mathematical derivation. It ensured that enhanced speech was continuous and smooth; it removed the auditory oscillation distortion; finally, combined with the Bark wavelet packets, it further improved human auditory perception. Experimental results show that the segmental SNR and PESQ (perceptual evaluation of speech quality) of the enhanced speech using this method increase effectively, compared with the existing speech enhancement algorithms based on wavelet threshold.
文摘Support vector machine(SVM)has a good application prospect for speech recognition problems;still optimum parameter selection is a vital issue for it.To improve the learning ability of SVM,a method for searching the optimal parameters based on integration of predator prey optimization(PPO)and Hooke-Jeeves method has been proposed.In PPO technique,population consists of prey and predator particles.The prey particles search the optimum solution and predator always attacks the global best prey particle.The solution obtained by PPO is further improved by applying Hooke-Jeeves method.Proposed method is applied to recognize isolated words in a Hindi speech database and also to recognize words in a benchmark database TI-20 in clean and noisy environment.A recognition rate of 81.5%for Hindi database and 92.2%for TI-20 database has been achieved using proposed technique.
文摘A single-channel speech enhancement method of noisy speech signals at very low signal-to-noise ratios is presented, which is based on masking properties of the human auditory system and power spectral density estimation of non stationary noise. It allows for an automatic adaptation in time and frequency of the parametric enhancement system, and finds the best tradeoff among the amount of noise reduction, the speech distortion, and the level of musical residual noise based on a criterion correlated with perception and SNR. This leads to a significant reduction of the unnatural structure of the residual noise. The results with several noise types show that the enhanced speech is more pleasant to a human listener.
基金Supported by National High Technology Research and Development Program of China (863 Program) (2008AA040201), National Natural Science Foundation of China (90920302), National Science and Technology Pillar Program of China (2009BAH41B01), National Natural Science Foundation of China and Research Grants Council of Hong Kong (60931160443) The authors thank Michael T. Johnson in the Depart- ment of Electrical Engineering, Marquette University in USA for the experiments suggestion and helping to improve the English writing.
基金Project supported by Inha University Research GrantProject(10031764) supported by the Strategic Technology Development Program of Ministry of Knowledge Economy, Korea
文摘In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) was proposed for speech enhancement. The proposed method employs local SAP (LSAP) based on the TE of noisy speech as a feature parameter for voice activity detection (VAD) in each frequency subband, rather than conventional LSAP. Results show that the TE operator can enhance the abiTity to discriminate speech and noise and further suppress noise components. Therefore, TE-based LSAP provides a better representation of LSAP, resulting in improved VAD for estimating noise power in a speech enhancement algorithm. In addition, the presented method utilizes TE-based global SAP (GSAP) derived in each frame as the weighting parameter for modifying the adopted TE operator and improving its performance. The proposed algorithm was evaluated by objective and subjective quality tests under various environments, and was shown to produce better results than the conventional method.
基金Project(61072087)supported by the National Natural Science Foundation of ChinaProject(2010011020-1)supported by the Natural Scientific Foundation of Shanxi Province,ChinaProject(20093010)supported by Graduate Innovation Fundation of Shanxi Province,China
文摘Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach was described, and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first kind consists of isolated word recognition experiments in neutral (non-emotional) speech. The results show that the ZCMT approach effectively improves the recognition accuracy by 3.47% in average compared with the Teager energy operator (TEO). Thus, ZCMT feature can be considered as a noise-robust feature for speech recognition. The second kind consists of mono-lingual emotion recognition experiments by using the Taiyuan University of Technology (TYUT) and the Berlin databases. As the average recognition rate of ZCMT approach is 82.19%, the results indicate that the ZCMT features can characterize speech emotions in an effective way. The third kind consists of cross-lingual experiments with three languages. As the accuracy of ZCMT approach only reduced by 1.45%, the results indicate that the ZCMT features can characterize emotions in a language independent way.
基金Project supported by an Inha University Research GrantProject(10031764) supported by the Strategic Technology Development Program of Ministry of Knowledge Economy,Korea
文摘An improved speech absence probability estimation was proposed using environmental noise classification for speech enhancement.A relevant noise estimation approach,known as the speech presence uncertainty tracking method,requires seeking the "a priori" probability of speech absence that is derived by applying microphone input signal and the noise signal based on the estimated value of the "a posteriori" signal-to-noise ratio(SNR).To overcome this problem,first,the optimal values in terms of the perceived speech quality of a variety of noise types are derived.Second,the estimated optimal values are assigned according to the determined noise type which is classified by a real-time noise classification algorithm based on the Gaussian mixture model(GMM).The proposed algorithm estimates the speech absence probability using a noise classification algorithm which is based on GMM to apply the optimal parameter of each noise type,unlike the conventional approach which uses a fixed threshold and smoothing parameter.The performance of the proposed method was evaluated by objective tests,such as the perceptual evaluation of speech quality(PESQ) and composite measure.Performance was then evaluated by a subjective test,namely,mean opinion scores(MOS) under various noise environments.The proposed method show better results than existing methods.
文摘目的 比较韶音后挂式骨导助听器对不同类型听力损失患者的听力干预短期效果,探讨其临床应用前景。方法 55例听力损失患者(年龄18~82岁;传导性听力损失9例,感音神经性听力损失15例,混合性听力损失31例;左右耳0.5、1、2、4 kHz四个频率的骨导纯音听阈均≤60 dB HL)配戴韶音后挂式骨导助听器,分别于配戴助听器前和配戴第14±2 d行声场总体听阈、单音节识别率及安静环境语句识别阈测试,比较配戴助听器前后的结果差异。并于配戴第14±2 d使用IOI-HA问卷对助听器使用效果进行评估。结果 患者配戴后挂式骨导式助听器后声场四个频率平均听阈(39.3±4.9 dB HL)较配戴前(56.5±8.2 dB HL)显著改善,差异有统计学意义(P<0.001)。患者助听前单音节识别率(给声强度:患者助听前双音节言语识别阈减5 dB)为29.8%±11.4%,配戴第14±2 d为72.4%±14.4%,配戴后单音节识别率显著提高,差异有统计学意义(P<0.001)。患者语句识别阈由配戴前的48.6±9.7 dB HL降至34.3±5.6 dB HL,差异有统计学意义(P<0.001)。配戴14±2 d时IOI-HA问卷评估总分平均值为29.0±3.8分。结论 后挂式骨导助听器可显著提高传导性、0.5~4 kHz骨导纯音听阈不超过60 dB HL的混合性及感音神经性听力损失患者的听力及言语识别能力。