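Abstract: The identity-vector (i-vector) method, one of the mainstream approaches in speaker verification, learns a total variability space to obtain an effective low-dimensional speaker feature, the i-vector. When the development set is insufficient, however, the learned total variability space model has large errors. In addition, there is no effective way to check whether the total variability space has learned redundant information because its preset dimensionality is too high. To address this, the paper introduces Bayesian principal component analysis (BPCA) into the learning of the total variability space. BPCA brings more prior information into the total variability space, which supplements the information contained in the development data and, under the constraint of that prior, weakens the influence of ineffective dimensions. Experimental results show that, when the development data are insufficient, the BPCA method effectively improves the recognition performance of the speaker verification system compared with the conventional method of learning the total variability space.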
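For orientation, the relations below sketch how a Bayesian PCA prior can suppress unneeded dimensions. They follow the standard i-vector model and the automatic relevance determination (ARD) prior from Bishop's Bayesian PCA. The abstract does not give the paper's own equations, so the symbols here (M, m, T, w, alpha_i, d, q) are assumptions and the paper's exact formulation may differ.

```latex
% i-vector model: utterance supervector M, UBM mean supervector m,
% total variability matrix T with columns t_1, ..., t_q, latent i-vector w
M = m + T w, \qquad w \sim \mathcal{N}(0, I)

% ARD prior: one precision hyperparameter \alpha_i per column t_i of T,
% where d is the dimension of each column
p(T \mid \alpha) = \prod_{i=1}^{q} \left( \frac{\alpha_i}{2\pi} \right)^{d/2}
    \exp\!\left( -\tfrac{1}{2} \alpha_i \lVert t_i \rVert^2 \right)

% Hyperparameter re-estimation inside EM: a column whose norm shrinks
% receives a large \alpha_i and is pushed further toward zero
\alpha_i = \frac{d}{\lVert t_i \rVert^2}
```

Under this prior, columns of T that the development data do not support are driven toward zero, which is one way to read the abstract's claim that the prior weakens ineffective dimensions.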
Funding: Supported by the National Natural Science Foundation of China (61304097), the Projects of Major International (Regional) Joint Research Program NSFC (61120106010), and the Foundation for Innovation Research Groups of the National Natural Science Foundation of China (61321002).
Abstract: A hierarchical particle filter (HPF) framework based on multi-feature fusion is proposed. The HPF uses different feature information to avoid the tracking failures that a single-feature tracker suffers in a complicated environment. In this approach, the Harris algorithm is used to detect the corner points of the object, and a corner matching algorithm based on singular value decomposition is used to compute the first-order weights and concentrate the particles in the high-likelihood area. The local binary pattern (LBP) operator is then used to build the observation model of the target from color and texture features, from which the second-order weights of the particles and the accurate location of the target are obtained. Moreover, a backstepping controller is proposed to complete the whole tracking system. Simulations and experiments show that the HPF algorithm with the backstepping controller achieves stable and accurate tracking with good robustness in complex environments.
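The two-stage weighting described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Harris/SVD corner matching and the LBP color-texture model are replaced by plain Gaussian likelihoods around hypothetical measurements (`coarse_meas`, `fine_meas`), and all function names and noise parameters are assumptions chosen for illustration.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)

def gaussian_likelihood(particles, measurement, sigma):
    """Stand-in observation model (assumption): isotropic Gaussian."""
    sq_dist = np.sum((particles - measurement) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / sigma ** 2) + 1e-300  # avoid all-zero weights

def hierarchical_step(particles, coarse_meas, fine_meas, rng,
                      motion_sigma=2.0, coarse_sigma=5.0, fine_sigma=1.0):
    """One predict/update cycle with first- and second-order weights."""
    # Predict: random-walk motion model (assumption).
    particles = particles + rng.normal(0.0, motion_sigma, particles.shape)

    # Stage 1: first-order weights from the coarse cue, then resample so
    # that particles concentrate in the high-likelihood area.
    w1 = gaussian_likelihood(particles, coarse_meas, coarse_sigma)
    w1 /= w1.sum()
    particles = particles[systematic_resample(w1, rng)]

    # Stage 2: second-order weights from the finer cue give the estimate.
    w2 = gaussian_likelihood(particles, fine_meas, fine_sigma)
    w2 /= w2.sum()
    estimate = w2 @ particles
    return particles, w2, estimate

rng = np.random.default_rng(0)
target = np.array([40.0, 60.0])
particles = rng.uniform(0.0, 100.0, size=(2000, 2))
particles, weights, estimate = hierarchical_step(
    particles, target + 1.0, target, rng)
print("estimated position:", estimate)
```

Resampling between the two stages is what makes the filter hierarchical in this sketch: the expensive second cue is only evaluated on particles that the first cue already considers plausible.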
Funding: Supported by the National Natural Science Foundation of China (62073036, 62076031) and the Beijing Natural Science Foundation (4242049).
Abstract: In challenging situations such as low illumination, rain, and background clutter, the stability of the thermal infrared (TIR) spectrum can help the red, green, blue (RGB) visible spectrum improve tracking performance. However, high-level image information and modality-specific features have not been sufficiently studied. The proposed correlation filter uses a fused saliency content map to improve filter training and extracts different features for each modality. The fused content map is introduced into the spatial regularization term of the correlation filter to highlight the training samples in the content region. Furthermore, the fused content map avoids the incompleteness of the content region caused by challenging situations. Additionally, different features are extracted according to the modality characteristics and are fused by the designed response-level fusion strategy. The alternating direction method of multipliers (ADMM) algorithm is used to solve the tracker training efficiently. Experiments on large-scale benchmark datasets show the effectiveness of the proposed tracker compared with state-of-the-art traditional trackers and deep-learning-based trackers.
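As a rough illustration of response-level fusion, the sketch below trains one basic correlation filter per modality and sums their response maps. It deliberately swaps in the simple closed-form ridge-regression filter in the Fourier domain. The saliency-based spatial regularization and the ADMM solver from the abstract are not reproduced, and the function names and the equal fusion weight are assumptions.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form single-channel filter: G * conj(F) / (|F|^2 + lam)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def respond(filt, patch):
    """Response map of a trained filter on a new patch."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

def fused_peak(filt_rgb, filt_tir, patch_rgb, patch_tir, weight_rgb=0.5):
    """Fuse the two response maps (fixed weight is an assumption)."""
    fused = (weight_rgb * respond(filt_rgb, patch_rgb)
             + (1.0 - weight_rgb) * respond(filt_tir, patch_tir))
    return np.unravel_index(np.argmax(fused), fused.shape)

rng = np.random.default_rng(0)
rgb = rng.random((64, 64))   # synthetic stand-in for a grayscale RGB patch
tir = rng.random((64, 64))   # synthetic stand-in for a TIR patch
label = gaussian_label(rgb.shape)
filt_rgb = train_filter(rgb, label)
filt_tir = train_filter(tir, label)

# Simulate target motion with a circular shift of both modalities.
shift = (5, -3)
peak = fused_peak(filt_rgb, filt_tir,
                  np.roll(rgb, shift, axis=(0, 1)),
                  np.roll(tir, shift, axis=(0, 1)))
print("fused response peak:", peak)
```

Because a circular shift of the input multiplies its spectrum by a phase ramp, the response peak moves by the same offset, which is how the tracker reads off the new target position.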
Abstract: In an audio stream containing multiple speakers, speaker diarization helps determine "who spoke when". It is an unsupervised task, since no prior information about the speakers is available. It labels the speech signal according to the identity of the speaker; that is, the input audio stream is partitioned into homogeneous segments. In this work, we present a novel speaker diarization system that uses the tangent weighted Mel frequency cepstral coefficient (TMFCC) as the feature parameter and the Lion algorithm to cluster the voice-activity-detected audio streams into particular speaker groups. The two main tasks of speaker indexing, i.e., speaker segmentation and speaker clustering, are thereby improved. The TMFCC makes more effective use of both low-energy and high-energy frames, improving the performance of the proposed system. Experiments are carried out on audio signals from the ELSDSR corpus with three, four, and five speakers. The system is evaluated using tracking distance and tracking time as metrics, and the experimental results show that the speaker diarization system with TMFCC parameterization and Lion-based clustering outperforms existing diarization systems, reaching 95% tracking accuracy.