Data processing of small samples is an important and valuable research problem in the electronic equipment test. Because it is difficult and complex to determine the probability distribution of small samples, it is di...Data processing of small samples is an important and valuable research problem in the electronic equipment test. Because it is difficult and complex to determine the probability distribution of small samples, it is difficult to use the traditional probability theory to process the samples and assess the degree of uncertainty. Using the grey relational theory and the norm theory, the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples, is proposed in this article. The definitions of the grey distance information quantity of a sample and the average grey distance information quantity of the samples, with their characteristics and algorithms, are introduced. The correlative problems, including the algorithm of estimated value, the standard deviation, and the acceptance and rejection criteria of the samples and estimated results, are also proposed. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare the different samples. Several examples are given to demonstrate the application of the proposed approach. The examples show that the proposed approach, which has no demand for the probability distribution of small samples, is feasible and effective.展开更多
现有的网络安全态势评估方法没有考虑到工业控制系统(industrial control system,ICS)网络安全需求的特殊性,无法实现准确的评估。此外,ICS传输大量异构数据,容易受到网络攻击,现有的分类方法无法有效处理多类别不平衡数据。针对该问题...现有的网络安全态势评估方法没有考虑到工业控制系统(industrial control system,ICS)网络安全需求的特殊性,无法实现准确的评估。此外,ICS传输大量异构数据,容易受到网络攻击,现有的分类方法无法有效处理多类别不平衡数据。针对该问题,本文首先分析了工控系统的特点,提出了基于层次分析法的工控系统安全态势量化评估方法,该方法可以更准确地反映ICS网络安全状况;然后针对多攻击类型数据不平衡问题,提出了平均欠过采样方法,以平衡数据并且不会导致数据量过大;最后基于极限梯度提升(extreme gradient boosting,XGBoost)算法构建了ICS网络态势评估分类器,实验表明,本文设计的分类模型相较于传统分类算法支持向量机、K近邻以及随机森林可以实现更好的精度。展开更多
针对句子分类任务常面临着训练数据不足,而且文本语言具有离散性,在语义保留的条件下进行数据增强具有一定困难,语义一致性和多样性难以平衡的问题,本文提出一种惩罚生成式预训练语言模型的数据增强方法(punishing generative pre-train...针对句子分类任务常面临着训练数据不足,而且文本语言具有离散性,在语义保留的条件下进行数据增强具有一定困难,语义一致性和多样性难以平衡的问题,本文提出一种惩罚生成式预训练语言模型的数据增强方法(punishing generative pre-trained transformer for data augmentation,PunishGPT-DA)。设计了惩罚项和超参数α,与负对数似然损失函数共同作用微调GPT-2(generative pre-training 2.0),鼓励模型关注那些预测概率较小但仍然合理的输出;使用基于双向编码器表征模型(bidirectional encoder representation from transformers,BERT)的过滤器过滤语义偏差较大的生成样本。本文方法实现了对训练集16倍扩充,与GPT-2相比,在意图识别、问题分类以及情感分析3个任务上的准确率分别提升了1.1%、4.9%和8.7%。实验结果表明,本文提出的方法能够同时有效地控制一致性和多样性需求,提升下游任务模型的训练性能。展开更多
文摘Data processing of small samples is an important and valuable research problem in the electronic equipment test. Because it is difficult and complex to determine the probability distribution of small samples, it is difficult to use the traditional probability theory to process the samples and assess the degree of uncertainty. Using the grey relational theory and the norm theory, the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples, is proposed in this article. The definitions of the grey distance information quantity of a sample and the average grey distance information quantity of the samples, with their characteristics and algorithms, are introduced. The correlative problems, including the algorithm of estimated value, the standard deviation, and the acceptance and rejection criteria of the samples and estimated results, are also proposed. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare the different samples. Several examples are given to demonstrate the application of the proposed approach. The examples show that the proposed approach, which has no demand for the probability distribution of small samples, is feasible and effective.
文摘现有的网络安全态势评估方法没有考虑到工业控制系统(industrial control system,ICS)网络安全需求的特殊性,无法实现准确的评估。此外,ICS传输大量异构数据,容易受到网络攻击,现有的分类方法无法有效处理多类别不平衡数据。针对该问题,本文首先分析了工控系统的特点,提出了基于层次分析法的工控系统安全态势量化评估方法,该方法可以更准确地反映ICS网络安全状况;然后针对多攻击类型数据不平衡问题,提出了平均欠过采样方法,以平衡数据并且不会导致数据量过大;最后基于极限梯度提升(extreme gradient boosting,XGBoost)算法构建了ICS网络态势评估分类器,实验表明,本文设计的分类模型相较于传统分类算法支持向量机、K近邻以及随机森林可以实现更好的精度。
文摘针对句子分类任务常面临着训练数据不足,而且文本语言具有离散性,在语义保留的条件下进行数据增强具有一定困难,语义一致性和多样性难以平衡的问题,本文提出一种惩罚生成式预训练语言模型的数据增强方法(punishing generative pre-trained transformer for data augmentation,PunishGPT-DA)。设计了惩罚项和超参数α,与负对数似然损失函数共同作用微调GPT-2(generative pre-training 2.0),鼓励模型关注那些预测概率较小但仍然合理的输出;使用基于双向编码器表征模型(bidirectional encoder representation from transformers,BERT)的过滤器过滤语义偏差较大的生成样本。本文方法实现了对训练集16倍扩充,与GPT-2相比,在意图识别、问题分类以及情感分析3个任务上的准确率分别提升了1.1%、4.9%和8.7%。实验结果表明,本文提出的方法能够同时有效地控制一致性和多样性需求,提升下游任务模型的训练性能。