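Objective: To compare speech-evoked auditory brainstem responses (speech-ABR) to the initial consonant, the final (vowel), and the tone of a monosyllable in quiet and in noise, and to study the effect of noise on the phonemes of a monosyllable. Methods: Forty normal-hearing subjects (20 male, 20 female) whose native language was Mandarin Chinese were recruited. The speech-ABR stimulus was a 260 ms synthesized speech syllable /mi/ carrying the third tone, presented at 70 dB SPL. Response waveforms were recorded from the right ear in quiet and in noise (signal-to-noise ratio, SNR = -10 dB). Latency changes of the onset response (OR), the consonant-to-vowel transition response, and the frequency following response (FFR) were compared, as was the pitch-tracking correlation coefficient r in quiet and in noise. Statistical analysis was performed with SPSS 18.0; differences between the two conditions were analyzed with paired t-tests, and P<0.05 was considered statistically significant. Results: The speech-ABR waveform evoked by the 260 ms /mi/ consisted mainly of an onset response with a latency within 10 ms, a frequency following response with a latency of 80-220 ms, and a final offset response, together with a consonant-to-vowel transition response with a latency of 10-80 ms. The onset response was evoked by the consonant; the transition response was evoked by the consonant-to-vowel transition information; the frequency following response, evoked by the vowel of /mi/, comprised 15 waves. Paired t-tests comparing quiet and noise showed that the mean latency of the onset response peak (consonant portion) was prolonged by 0.85±0.17 ms (P=0.000), that of the transition response peaks by 0.75±0.15 ms (P=0.000), and that of the frequency following response peaks by 0.38±0.10 ms (P=0.000); all differences were statistically significant. The mean pitch-tracking correlation coefficient r was 0.84±0.08 in quiet and 0.74±0.12 in noise, a statistically significant difference (P=0.000). Conclusion: In noise, the latencies of the waveforms corresponding to the consonant and the vowel of the test syllable both changed and the pitch-tracking coefficient decreased, suggesting that all three phonemic components are affected by noise. Compared with earlier subjective speech recognition score tests and evoked potential tests, speech-ABR is an objective method for evaluating how speech sounds are disturbed by noise.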
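As a rough illustration of the paired t-test named in the Methods, the sketch below computes the t statistic from per-subject latency differences between noise and quiet. The latency values are made-up placeholders, not data from the study, and `paired_t` is a hypothetical helper; the study itself reports using SPSS 18.0.

```python
import math
from statistics import mean, stdev

def paired_t(quiet, noise):
    """Return (mean difference, SD of differences, t statistic, degrees of freedom)."""
    if len(quiet) != len(noise) or len(quiet) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # One difference per subject: noise latency minus quiet latency.
    diffs = [n - q for q, n in zip(quiet, noise)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_mean / (d_sd / math.sqrt(len(diffs)))
    return d_mean, d_sd, t, len(diffs) - 1

# Hypothetical onset-response latencies in ms for five subjects (illustration only).
quiet_ms = [7.0, 7.2, 6.9, 7.1, 7.3]
noise_ms = [7.9, 8.0, 7.6, 8.1, 8.0]

d_mean, d_sd, t_stat, dof = paired_t(quiet_ms, noise_ms)
print(f"mean delay = {d_mean:.2f} ms, SD = {d_sd:.2f} ms, t({dof}) = {t_stat:.2f}")
```

The P value would then be read from a Student t distribution with the returned degrees of freedom; that step is left out here because the Python standard library has no t-distribution function.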
Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach is described and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first consists of isolated word recognition experiments on neutral (non-emotional) speech. The results show that the ZCMT approach improves recognition accuracy by 3.47% on average compared with the Teager energy operator (TEO). Thus, the ZCMT feature can be considered a noise-robust feature for speech recognition. The second consists of mono-lingual emotion recognition experiments using the Taiyuan University of Technology (TYUT) and Berlin databases. With an average recognition rate of 82.19% for the ZCMT approach, the results indicate that ZCMT features can characterize speech emotions effectively. The third consists of cross-lingual experiments with three languages. As the accuracy of the ZCMT approach is reduced by only 1.45%, the results indicate that ZCMT features can characterize emotions in a language-independent way.
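For readers unfamiliar with the Teager energy operator that the ZCMT feature builds on, here is a minimal sketch of the standard discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It covers only the operator itself; the zero-crossing detection and auditory-model stages of ZCMT are not described in the abstract and are not reproduced here. The function name and the test tone are illustrative assumptions.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n) the operator is constant and equals (A*sin(w))^2,
# so it tracks both amplitude and frequency of the tone.
A, w = 2.0, 0.3  # hypothetical amplitude and normalized angular frequency
tone = [A * math.cos(w * n) for n in range(50)]
psi = teager_energy(tone)
expected = (A * math.sin(w)) ** 2

print(f"expected {expected:.6f}, first value {psi[0]:.6f}")
```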
To overcome the defects of the classical hidden Markov model (HMM), the Markov family model (MFM), a new statistical model, was proposed. The Markov family model was applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model performs better than the hidden Markov model. Precision rises from 94.642% to 96.214% in the part-of-speech tagging experiments, and the work rate is reduced by 11.9% in the speech recognition experiments relative to the HMM baseline system.
In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) is proposed for speech enhancement. The proposed method employs local SAP (LSAP) based on the TE of noisy speech, rather than conventional LSAP, as a feature parameter for VAD in each frequency subband. Results show that the TE operator can enhance the ability to discriminate speech from noise and further suppress noise components. Therefore, TE-based LSAP provides a better representation of LSAP, resulting in improved VAD for estimating noise power in a speech enhancement algorithm. In addition, the presented method uses TE-based global SAP (GSAP), derived in each frame, as the weighting parameter for modifying the adopted TE operator and improving its performance. The proposed algorithm was evaluated by objective and subjective quality tests under various environments and was shown to produce better results than the conventional method.
An improved speech absence probability estimation using environmental noise classification is proposed for speech enhancement. A relevant noise estimation approach, known as the speech presence uncertainty tracking method, requires seeking the "a priori" probability of speech absence, which is derived from the microphone input signal and the noise signal based on the estimated value of the "a posteriori" signal-to-noise ratio (SNR). To overcome this problem, first, the optimal values in terms of perceived speech quality are derived for a variety of noise types. Second, the estimated optimal values are assigned according to the noise type determined by a real-time noise classification algorithm based on the Gaussian mixture model (GMM). The proposed algorithm thus estimates the speech absence probability by applying the optimal parameters for each noise type, unlike the conventional approach, which uses a fixed threshold and smoothing parameter. The performance of the proposed method was evaluated by objective tests, such as the perceptual evaluation of speech quality (PESQ) and a composite measure, and then by a subjective test, namely mean opinion scores (MOS), under various noise environments. The proposed method shows better results than existing methods.
The article reviews child-directed speech and foreigner talk, both individually and in comparison. It compares the features and functions of the two registers, along with some of their similarities and differences. The two registers should be thought of as dynamic, changing in accordance with various situational factors, rather than as static, fixed sets of features.
This paper describes male-female differences in speech behavior in the following respects: their different attitudes towards public speaking and private speaking; their different attitudes towards public details and private details; their different purposes in talking about troubles; and their different attitudes towards asking for information. The paper then explains these male-female differences in speech behavior from social and anthropological points of view.
Perceptual auditory filter banks such as the Bark-scale filter bank are widely used for front-end processing in speech recognition systems. However, the design of optimized filter banks that provide higher accuracy in recognition tasks remains an open problem. Owing to spectral analysis in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and uses a genetic algorithm (GA) to optimize the design parameters. The optimization is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. Deploying ABFB together with the zero-crossing peak amplitude (ZCPA) feature as the front end of a radial basis function (RBF) system shows a significant improvement in robustness compared with the Bark-scale filter bank. In ABFB, several sub-bands are still concentrated toward lower frequencies, but their exact locations are determined by performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered here, and these still provide satisfactory results.
To enhance speech quality degraded by environmental noise, an algorithm was proposed to reduce the noise and reinforce the speech. The minima controlled recursive averaging (MCRA) algorithm was used to estimate the noise spectrum, and the partial masking effect, one of the psychoacoustic properties, was introduced to reinforce the speech. Performance was evaluated by comparing the PESQ (perceptual evaluation of speech quality) and segSNR (segmental signal-to-noise ratio) of the proposed algorithm with those of the conventional algorithm. The average PESQ of the proposed algorithm was higher than that of the conventional noise reduction algorithm, and its segSNR was higher by as much as 3.2 dB on average.
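The segSNR figure quoted above is a frame-averaged SNR. The sketch below shows one common way to compute it, assuming non-overlapping frames and per-frame clamping to [-10, 35] dB; the frame length, the clamping limits, and the function name are assumptions for illustration, not settings taken from the paper.

```python
import math

def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, with each frame's value clamped to [lo, hi]."""
    n = min(len(clean), len(processed))
    scores = []
    for start in range(0, n - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        sig = sum(v * v for v in s)                      # clean-frame energy
        err = sum((a - b) ** 2 for a, b in zip(s, y))    # error energy
        if sig == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        scores.append(max(lo, min(hi, snr)))
    if not scores:
        raise ValueError("no usable frames")
    return sum(scores) / len(scores)

# Toy signals: the "processed" signal is the clean one attenuated by 10 %,
# so the error is 0.1 * clean and every frame has an SNR of 20 dB.
clean = [math.sin(0.05 * n) for n in range(1024)]
processed = [0.9 * v for v in clean]
print(f"segSNR = {segmental_snr(clean, processed):.2f} dB")
```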
Speech enhanced with the traditional wavelet threshold function suffers from auditory oscillation distortion and a low signal-to-noise ratio (SNR). To solve these problems, a new continuous differentiable threshold function for speech enhancement was presented. First, the function adopts narrow threshold areas, preserving the smaller speech signal components and improving speech quality. Second, based on the properties of continuous differentiability and non-fixed deviation, the function for each area was obtained step by step through mathematical derivation; this ensures that the enhanced speech is continuous and smooth and removes the auditory oscillation distortion. Finally, combined with Bark wavelet packets, the function further improves human auditory perception. Experimental results show that the segmental SNR and PESQ (perceptual evaluation of speech quality) of speech enhanced with this method increase effectively compared with existing speech enhancement algorithms based on wavelet thresholding.
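To make the shortcomings of the traditional threshold functions concrete, the sketch below implements the two classical ones, hard and soft thresholding, applied to a single wavelet coefficient. Hard thresholding jumps at the threshold (a discontinuity), while soft thresholding shrinks every surviving coefficient by the same amount (a fixed deviation). The paper's new continuous differentiable function is not given in the abstract and is not reproduced here; the threshold value is an arbitrary illustration.

```python
def hard_threshold(w, t):
    """Keep coefficients whose magnitude exceeds t unchanged; zero the rest."""
    return w if abs(w) > t else 0.0

def soft_threshold(w, t):
    """Zero small coefficients and shrink the others toward zero by t."""
    if abs(w) <= t:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    return sign * (abs(w) - t)

t = 1.0  # hypothetical threshold
for w in (0.5, 1.01, 3.0, -3.0):
    print(f"w={w:+.2f}  hard={hard_threshold(w, t):+.2f}  soft={soft_threshold(w, t):+.2f}")
```

Just above the threshold, the hard rule returns about 1.01 while the soft rule returns about 0.01; for large coefficients the soft rule is always off by exactly `t`. These are the two behaviors a continuous function with non-fixed deviation is meant to avoid.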
Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced, and silence); the consistency between the degraded speech signal and the pre-trained reference model for each class is then calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN), and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
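The two figures of merit quoted above, the correlation coefficient and the root-mean-square MOS error, can be sketched as follows. The MOS values are invented placeholders and the helper names are hypothetical; nothing here reproduces the paper's mapping models.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical subjective MOS and objective predictions (illustration only).
subjective = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted = [2.0, 2.4, 3.3, 3.6, 4.5]
print(f"r = {pearson_r(subjective, predicted):.3f}, RMSE = {rmse(subjective, predicted):.3f}")
```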
A novel approach is proposed for improving adaptive feedback cancellation using a variable step-size affine projection algorithm (VSS-APA) based on global speech absence probability (GSAP). The variable step size of the proposed VSS-APA is adjusted according to the GSAP of the current frame, and the weight vector of the adaptive filter is updated according to the probability of speech absence. The performance of acoustic feedback cancellation is evaluated using normalized misalignment. Experimental results demonstrate that the proposed approach performs better than the normalized least mean square (NLMS) and constant step-size affine projection algorithms.
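As background for the comparison above, this is a minimal sketch of the NLMS baseline identifying an unknown FIR path, followed by the normalized misalignment used as the performance measure. It is not the proposed VSS-APA, whose GSAP-driven step-size rule is not given in the abstract. The path coefficients, step size, filter length, and signal are all illustrative assumptions.

```python
import random

def nlms(x, d, taps=4, mu=0.5, eps=1e-8):
    """Adapt an FIR filter with NLMS so its output tracks d; return the final weights."""
    w = [0.0] * taps
    buf = [0.0] * taps  # input history, most recent sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # filter output
        e = dn - y                                   # a priori error
        norm = eps + sum(b * b for b in buf)         # input energy (regularized)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w

random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]  # hypothetical feedback path
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Desired signal: the input passed through the unknown path (noise-free toy case).
d = [sum(h * x[n - k] for k, h in enumerate(true_path) if n - k >= 0)
     for n in range(len(x))]

w = nlms(x, d)
# Normalized misalignment: ||true_path - w||^2 / ||true_path||^2 (smaller is better).
misalignment = (sum((a - b) ** 2 for a, b in zip(true_path, w))
                / sum(h * h for h in true_path))
print(f"normalized misalignment = {misalignment:.3e}")
```

A fixed `mu` trades convergence speed against steady-state error; varying the step size per frame, as the abstract describes, is one way to ease that trade-off.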
The support vector machine (SVM) has good application prospects for speech recognition problems; still, optimum parameter selection is a vital issue for it. To improve the learning ability of SVM, a method for searching the optimal parameters based on the integration of predator-prey optimization (PPO) and the Hooke-Jeeves method has been proposed. In the PPO technique, the population consists of prey and predator particles. The prey particles search for the optimum solution, and the predator always attacks the global best prey particle. The solution obtained by PPO is further improved by applying the Hooke-Jeeves method. The proposed method is applied to recognize isolated words in a Hindi speech database and also to recognize words in the benchmark database TI-20 in clean and noisy environments. Recognition rates of 81.5% for the Hindi database and 92.2% for the TI-20 database have been achieved using the proposed technique.
An English speech is a discourse delivered at an assembly or on a formal occasion. As a variety of the English language, the English speech has a unique presentation of its own. This paper, as its title indicates, aims to analyze and probe the linguistic and rhetorical features of famous English speeches with a view to improving the ability of Chinese learners of English to appreciate English speeches.
A single-channel speech enhancement method for noisy speech signals at very low signal-to-noise ratios is presented, based on the masking properties of the human auditory system and power spectral density estimation of nonstationary noise. It allows automatic adaptation of the parametric enhancement system in time and frequency, and finds the best tradeoff among the amount of noise reduction, the speech distortion, and the level of musical residual noise based on a criterion correlated with perception and SNR. This leads to a significant reduction of the unnatural structure of the residual noise. Results with several noise types show that the enhanced speech is more pleasant to a human listener.
Funding (ZCMT feature extraction study): Project (61072087) supported by the National Natural Science Foundation of China; Project (2010011020-1) supported by the Natural Scientific Foundation of Shanxi Province, China; Project (20093010) supported by the Graduate Innovation Foundation of Shanxi Province, China.
Funding (Markov family model study): Project (60763001) supported by the National Natural Science Foundation of China; Projects (2009GZS0027, 2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China.
Funding (Teager energy based VAD study): Project supported by an Inha University Research Grant; Project (10031764) supported by the Strategic Technology Development Program of the Ministry of Knowledge Economy, Korea.
Funding (noise-classification speech absence probability study): Project supported by an Inha University Research Grant; Project (10031764) supported by the Strategic Technology Development Program of the Ministry of Knowledge Economy, Korea.
Funding (adaptive bands filter bank study): Project (61072087) supported by the National Natural Science Foundation of China; Project (20093048) supported by the Shanxi Provincial Graduate Innovation Fund of China.
Funding (wavelet threshold function study): Project (61072087) supported by the National Natural Science Foundation of China; Project (2011-035) supported by Shanxi Province Scholarship Foundation, China; Project (20120010) supported by Universities High-tech Foundation Projects, China; Project (2013021016-1) supported by the Youth Science and Technology Foundation of Shanxi Province, China; Projects (2013011016-1, 2012011014-1) supported by the Natural Science Foundation of Shanxi Province, China.
Funding (output-based speech quality assessment study): Projects (61001188, 1161140319) supported by the National Natural Science Foundation of China; Project (2012ZX03001034) supported by the National Science and Technology Major Project; Project (YETP1202) supported by the Beijing Higher Education Young Elite Teacher Project, China.
Funding (VSS-APA feedback cancellation study): Project (2010-0020163) supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education.