When labeled data are scarce, conventional methods struggle with radar emitter recognition. To address this problem, an optimized cooperative semi-supervised learning method for radar emitter recognition based on a small amount of labeled data is developed. First, the small labeled set is randomly sampled using the bootstrap method, and the loss functions of three common deep learning networks are improved by combining a uniform distribution with the cross-entropy function to reduce the overconfidence of softmax classification. Next, the sampled datasets are used to train the three improved networks, which gives the initial model. The unlabeled data are then preliminarily screened through dynamic time warping (DTW) and fed into the initial model for judgment. If the judgments of two or more networks agree, the unlabeled sample receives that label and is added to the labeled dataset. Finally, the three networks are trained on the resulting labeled dataset to build the final model. Simulation results show that the semi-supervised method can exploit a small amount of labeled data and essentially reach the recognition accuracy obtained with labeled data.
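The three building blocks named in this abstract (a cross-entropy loss mixed with a uniform distribution, DTW screening, and two-of-three agreement labeling) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the mixing weight `eps`, the label-smoothing form of the loss, the absolute-difference DTW cost, and all function names are assumptions made here for illustration.

```python
import numpy as np

def smoothed_cross_entropy(logits, labels, eps=0.1):
    """Cross-entropy against a mix of the one-hot target and a uniform
    distribution (label smoothing; eps is an assumed mixing weight)."""
    logits = np.asarray(logits, dtype=float)
    k = logits.shape[1]
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    targets = (1.0 - eps) * np.eye(k)[labels] + eps / k
    return float(-(targets * log_probs).sum(axis=1).mean())

def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost (an assumption)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def agree_two_of_three(pred_a, pred_b, pred_c):
    """Return (mask, label). mask is True where at least two of the three
    predictions agree; label holds the agreed class, or -1 otherwise."""
    pred_a, pred_b, pred_c = map(np.asarray, (pred_a, pred_b, pred_c))
    ab, ac, bc = pred_a == pred_b, pred_a == pred_c, pred_b == pred_c
    mask = ab | ac | bc
    # If a matches b or c the agreed label is a; otherwise b matches c.
    label = np.where(ab | ac, pred_a, pred_b)
    return mask, np.where(mask, label, -1)
```

In use, only the samples where `mask` is True would be moved into the labeled set together with their agreed `label`. How the DTW distance is thresholded during screening is not stated in the abstract and is left open here.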
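In topic detection on short texts such as Weibo posts, the SBERT model has limited feature-extraction ability, and in the clustering stage the single-pass clustering algorithm suffers from small clusters and low efficiency. To improve both, a Weibo hot-topic detection method based on semi-supervised SBERT with SinglePass clustering (Semi-SBERT-SP) is proposed. The SBERT model is combined with semi-supervised training to improve its feature extraction on short texts. In the clustering stage, a time window and dimensionality reduction are introduced to improve efficiency, and a merging layer is added to handle the small clusters the algorithm produces. In the topic representation layer, a word-contribution metric that incorporates word popularity is proposed for extracting keywords from topic clusters. Experimental results show that the method outperforms the compared models and methods on three metrics (accuracy, F1, and mutual information) and can effectively detect hot topics in Weibo.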
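The single-pass clustering step with a time window can be sketched as below. This is an illustrative version under stated assumptions, not the Semi-SBERT-SP implementation: the embeddings are taken as given (any nonzero sentence vectors), similarity is cosine similarity to a cluster's mean direction, and `sim_threshold`, `window`, and the function name are hypothetical. The dimensionality reduction and merging layer described in the abstract are omitted.

```python
import numpy as np

def single_pass(embeddings, timestamps, sim_threshold=0.8, window=3600.0):
    """Process items in time order; attach each to the most similar cluster
    seen within `window`, or open a new cluster if none is similar enough."""
    embeddings = np.asarray(embeddings, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters = []  # each cluster: running vector sum, last update time, members
    labels = np.empty(len(unit), dtype=int)
    for idx in np.argsort(timestamps, kind="stable"):
        t = timestamps[idx]
        best, best_sim = -1, sim_threshold
        for cid, c in enumerate(clusters):
            if t - c["last_seen"] > window:
                continue  # outside the time window: skip the comparison
            centroid = c["sum"] / np.linalg.norm(c["sum"])
            sim = float(unit[idx] @ centroid)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best < 0:
            clusters.append({"sum": unit[idx].copy(), "last_seen": t,
                             "members": [int(idx)]})
            best = len(clusters) - 1
        else:
            clusters[best]["sum"] += unit[idx]
            clusters[best]["last_seen"] = t
            clusters[best]["members"].append(int(idx))
        labels[idx] = best
    return labels, clusters
```

Skipping clusters that have not been updated within the window avoids comparing each new post against every historical cluster, which is where the efficiency gain in this sketch comes from.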
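Tri-training exploits unlabeled data for classification and can effectively improve a classifier's generalization ability, but it is prone to mislabeling unlabeled data, which introduces training noise. A Tri-training algorithm based on density peaks clustering (Tri-training with density peaks clustering, DPC-TT) is proposed. Using cluster centers and local density, density peaks clustering can select the samples that best reflect the spatial structure of the data. DPC-TT applies density peaks clustering to obtain the cluster centers of the training data and the local density of each sample; samples within the cutoff distance of a cluster center are regarded as having good spatial structure and are marked as core data. Updating the classifiers with the core data reduces training noise during iteration and thus improves classifier performance. Experimental results show that DPC-TT achieves better classification performance than the standard Tri-training algorithm and its improved variants.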
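The core-data selection described in this abstract can be sketched with a small density peaks routine. This is a minimal illustration and not the DPC-TT code: it assumes the cutoff-kernel definition of local density, picks centers as the samples with the largest product of density and separation, and the function name and the parameters `d_c` and `n_centers` are hypothetical.

```python
import numpy as np

def dpc_core_samples(X, d_c, n_centers):
    """Return (centers, core_mask, rho, delta) for the samples in X.

    rho   : local density, the number of other samples closer than d_c
    delta : distance to the nearest sample of higher density
    core  : samples lying within d_c of any selected cluster center
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the sample itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # The densest sample has no denser neighbour: use its largest distance.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    # Cluster centers combine high density with large separation.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_centers]
    core_mask = (dist[:, centers] <= d_c).any(axis=1)
    return centers, core_mask, rho, delta
```

In a Tri-training loop, one plausible use is to keep only the newly pseudo-labeled samples whose `core_mask` entry is True when updating each classifier, so that samples far from any cluster center do not contribute label noise.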