Funding: Project (61072087) supported by the National Natural Science Foundation of China; Project (20093048) supported by the Shanxi Provincial Graduate Innovation Fund of China.
Abstract: Perceptual auditory filter banks such as the Bark-scale filter bank are widely used for front-end processing in speech recognition systems. However, the design of optimized filter banks that provide higher accuracy in recognition tasks is still an open problem. Based on spectral analysis in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and uses a genetic algorithm (GA) to optimize the design parameters. The optimization is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. Deploying the ABFB together with the zero-crossing peak amplitude (ZCPA) feature as the front end of a radial basis function (RBF) system shows a significant improvement in robustness compared with the Bark-scale filter bank. In the ABFB, several sub-bands are still concentrated toward lower frequencies, but their exact locations are determined by recognition performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered here, and these still provide satisfactory results.
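The optimization loop described in this abstract can be illustrated with a minimal genetic-algorithm sketch. It is not the authors' implementation: the number of bands, the frequency limits, the GA operators, and above all `toy_fitness` are assumptions made for illustration. In the paper the fitness would be the recognition performance of the back-end network, which is replaced here by a hypothetical stand-in so that the example runs on its own.

```python
import random

F_MAX = 4000.0   # assumed upper frequency limit in Hz (not given in the abstract)
N_BANDS = 4      # assumed number of sub-bands (not given in the abstract)

def random_bank(rng):
    # One candidate filter bank: a (center frequency, bandwidth) pair per band.
    # Each band is taken to be symmetric about its center.
    return [(rng.uniform(100.0, F_MAX - 100.0), rng.uniform(50.0, 800.0))
            for _ in range(N_BANDS)]

def toy_fitness(bank):
    # Hypothetical placeholder for back-end recognition accuracy:
    # it rewards centers close to arbitrary target frequencies.
    targets = [300.0, 800.0, 1500.0, 2800.0]
    centers = sorted(c for c, _ in bank)
    return -sum(abs(c - t) for c, t in zip(centers, targets))

def mutate(bank, rng, sigma=100.0):
    # Gaussian perturbation of every parameter, clipped to the allowed range.
    out = []
    for c, bw in bank:
        c = min(max(c + rng.gauss(0.0, sigma), 100.0), F_MAX - 100.0)
        bw = min(max(bw + rng.gauss(0.0, sigma / 2), 50.0), 800.0)
        out.append((c, bw))
    return out

def crossover(a, b, rng):
    # Single-point crossover at a band boundary.
    cut = rng.randrange(1, N_BANDS)
    return a[:cut] + b[cut:]

def optimize(fitness, generations=60, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_bank(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children  # elitism: the best bank always survives
    return max(pop, key=fitness), history

best_bank, history = optimize(toy_fitness)
```

To mirror the evaluation loop in the abstract, `toy_fitness` would be swapped for a function that extracts features with the candidate bank, runs the recognizer, and returns its accuracy.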
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Projects (2009GZS0027, 2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China.
Abstract: To overcome the defects of the classical hidden Markov model (HMM), a new statistical model, the Markov family model (MFM), was proposed. The Markov family model was applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model performs better than the hidden Markov model. Precision increases from 94.642% to 96.214% in the part-of-speech tagging experiments, and the word error rate is reduced by 11.9% in the speech recognition experiments relative to the HMM baseline system.
Abstract: The support vector machine (SVM) has good application prospects for speech recognition problems, but optimum parameter selection remains a vital issue. To improve the learning ability of the SVM, a method for searching for the optimal parameters based on the integration of predator-prey optimization (PPO) and the Hooke-Jeeves method has been proposed. In the PPO technique, the population consists of prey and predator particles. The prey particles search for the optimum solution, and the predator always attacks the global best prey particle. The solution obtained by PPO is further improved by applying the Hooke-Jeeves method. The proposed method is applied to recognize isolated words in a Hindi speech database and to recognize words in the benchmark database TI-20 in clean and noisy environments. Recognition rates of 81.5% for the Hindi database and 92.2% for the TI-20 database have been achieved using the proposed technique.
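The local refinement step mentioned in this abstract, the Hooke-Jeeves method, can be sketched as a generic pattern search. This is a minimal illustration, not the authors' code: the PPO stage is omitted, the starting point stands in for the solution PPO would supply, and the quadratic objective is a hypothetical placeholder for an SVM validation error evaluated over its parameters.

```python
def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6, max_iter=10000):
    """Minimize f by Hooke-Jeeves pattern search starting from x0."""

    def explore(start, f_start, h):
        # Exploratory move: try +h and -h along each coordinate in turn,
        # keeping any change that lowers the objective.
        x, fx = list(start), f_start
        for i in range(len(x)):
            for delta in (h, -h):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), f(x0)
    h = step
    for _ in range(max_iter):
        if h < tol:
            break
        x, fx = explore(base, f_base, h)
        if fx < f_base:
            # Pattern moves: keep jumping along the improving direction
            # for as long as exploration around the jump point helps.
            while True:
                pattern = [2 * xi - bi for xi, bi in zip(x, base)]
                base, f_base = x, fx
                xp, f_xp = explore(pattern, f(pattern), h)
                if f_xp < f_base:
                    x, fx = xp, f_xp
                else:
                    break
        else:
            h *= shrink  # no improvement around the base point: refine the step
    return base, f_base

# Hypothetical objective standing in for a validation error over two parameters.
def objective(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best_point, best_value = hooke_jeeves(objective, [0.0, 0.0])
```

Because the search is derivative-free, it suits objectives such as cross-validated error, where gradients with respect to the SVM parameters are not available.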
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2008AA040201), the National Natural Science Foundation of China (90920302), the National Science and Technology Pillar Program of China (2009BAH41B01), and the National Natural Science Foundation of China and Research Grants Council of Hong Kong (60931160443). The authors thank Michael T. Johnson of the Department of Electrical Engineering, Marquette University, USA, for suggestions on the experiments and for help in improving the English writing.
Funding: Project (61072087) supported by the National Natural Science Foundation of China; Project (2010011020-1) supported by the Natural Science Foundation of Shanxi Province, China; Project (20093010) supported by the Graduate Innovation Foundation of Shanxi Province, China.
Abstract: Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach is described and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first consists of isolated word recognition experiments on neutral (non-emotional) speech. The results show that the ZCMT approach improves recognition accuracy by 3.47% on average compared with the Teager energy operator (TEO). Thus, the ZCMT feature can be considered a noise-robust feature for speech recognition. The second consists of mono-lingual emotion recognition experiments using the Taiyuan University of Technology (TYUT) and Berlin databases. The average recognition rate of the ZCMT approach is 82.19%, which indicates that ZCMT features can characterize speech emotions effectively. The third consists of cross-lingual experiments with three languages. The accuracy of the ZCMT approach is reduced by only 1.45%, which indicates that ZCMT features can characterize emotions in a language-independent way.
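For readers unfamiliar with the Teager energy operator named in this abstract, the sketch below computes its standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The abstract does not say how ZCMT combines zero-crossings with this operator, so only the operator itself is shown; the test tone and its parameters are assumptions chosen for illustration.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Illustrative input: a pure tone A*cos(omega*n) with assumed amplitude and frequency.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]

psi = teager_energy(tone)

# For a pure tone the operator is constant and equals A^2 * sin^2(omega),
# so it tracks both the amplitude and the frequency of the oscillation.
expected = A ** 2 * math.sin(omega) ** 2
```

The constant output for a pure tone follows from the identity cos²(ωn) − cos(ωn − ω)·cos(ωn + ω) = sin²(ω).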
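Abstract: Objective: To compare the short-term effects of hearing intervention with the 韶音 behind-the-head bone-conduction hearing aid in patients with different types of hearing loss, and to explore its prospects for clinical application. Methods: Fifty-five patients with hearing loss (aged 18 to 82 years; 9 with conductive, 15 with sensorineural, and 31 with mixed hearing loss; bone-conduction pure-tone thresholds ≤60 dB HL at each of 0.5, 1, 2, and 4 kHz in both ears) were fitted with the 韶音 behind-the-head bone-conduction hearing aid. Overall sound-field hearing thresholds, monosyllable recognition scores, and sentence recognition thresholds in quiet were measured before fitting and on day 14±2 of use, and the results before and after fitting were compared. On day 14±2 of use, the IOI-HA questionnaire was also used to evaluate the outcome of hearing aid use. Results: After fitting, the mean sound-field threshold across the four frequencies (39.3±4.9 dB HL) was significantly better than before fitting (56.5±8.2 dB HL), and the difference was statistically significant (P<0.001). The unaided monosyllable recognition score (presentation level: the patient's unaided disyllabic speech recognition threshold minus 5 dB) was 29.8%±11.4%, compared with 72.4%±14.4% on day 14±2 of use; the score improved significantly after fitting, and the difference was statistically significant (P<0.001). The sentence recognition threshold decreased from 48.6±9.7 dB HL before fitting to 34.3±5.6 dB HL, and the difference was statistically significant (P<0.001). The mean total IOI-HA score on day 14±2 of use was 29.0±3.8. Conclusion: The behind-the-head bone-conduction hearing aid can significantly improve hearing and speech recognition in patients with conductive hearing loss and in patients with mixed or sensorineural hearing loss whose bone-conduction pure-tone thresholds at 0.5 to 4 kHz do not exceed 60 dB HL.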