Advances in speaker segmentation and clustering (cited by: 7)
Abstract: Speaker segmentation and clustering is a research direction in speech signal processing that has emerged in recent years. It determines the start and end times of each speaker's turns in continuous multi-speaker audio and labels every speech segment with the corresponding speaker. The task plays an important role in automatic speech recognition (ASR), multi-speaker recognition, and content-based audio analysis. Organized by how segmentation and clustering are carried out, this paper reviews the mainstream algorithms, techniques, and representative systems proposed in China and abroad over the past decade, from the perspectives of asynchronous and synchronous strategies. It compares the results of representative systems in recent NIST Rich Transcription (RT) evaluations, then discusses the remaining open problems and the future prospects of the field.
Authors: Ma Yong (马勇), Bao Changchun (鲍长春)
Source: Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 9, pp. 1190-1199 (10 pages)
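Funding: Key Project of the Science and Technology Development Program of the Beijing Municipal Education Commission (KZ201110005005); National Natural Science Foundation of China (61072089)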
Keywords: speaker segmentation and clustering; asynchronous strategy; synchronous strategy; Bayesian information criterion
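About the authors: Ma Yong, male, born in 1977 in Xinyi, Jiangsu, is a Ph.D. candidate at Beijing University of Technology. His research interests are speech signal processing and pattern recognition. E-mail: may773@emals.bjut.edu.cn
Bao Changchun, male, born in 1965 in Chifeng, Inner Mongolia, holds a Ph.D. and is a professor and doctoral supervisor at Beijing University of Technology. He is a member of the IEEE Signal Processing Society and of the International Speech Communication Association (ISCA), a council member of the Chinese Institute of Electronics and of the Acoustical Society of China, a committee member of the Signal Processing Society, deputy chair of the editorial board of the Journal on Communications (《通信学报》), and an editorial board member of the Journal of Signal Processing (《信号处理》) and the Journal of Data Acquisition and Processing (《数据采集与处理》). His research interests are speech and audio signal processing. E-mail: baochch@bjut.edu.cn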