A method is proposed that uses input-output clustering to reduce the number of samples in large data sets. The method clusters the output data into groups and then clusters the input data in accordance with those output groups. A set of prototypes is then selected from the clustered input data, and the inessential data can be discarded from the data set. Because only the prototypes are used, the method can reduce the effect of outliers. The method is applied to data-set reduction in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained with the original data sets are compared with those of models trained with the corresponding instance-reduced data sets. In the experiments, the proposed method gives good results on the reduction and reconstruction of both the synthetic and the real-world data sets. The numbers of instances in the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world automobile miles-per-gallon and 1990 California (CA) census data sets are 46% and 57%, respectively. The electrocardiogram (ECG) data set reaches a reduction rate of 96% because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained from the corresponding original data sets. The regression performance is therefore preserved while only a fraction of the data is needed in the training process.
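The input-output clustering steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the clustering algorithm or the prototype-selection rule, so plain k-means and "the sample nearest to each input-cluster centroid" are assumptions, as are the function names `kmeans` and `reduce_instances` and the parameters `k_out` and `k_in`.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def reduce_instances(X, y, k_out=3, k_in=2):
    """Return sorted indices of the samples kept as prototypes."""
    # Step 1: cluster the output values into groups.
    out_labels = kmeans([(v,) for v in y], k_out)
    keep = []
    for g in set(out_labels):
        idx = [i for i, l in enumerate(out_labels) if l == g]
        # Step 2: cluster the inputs that belong to this output group.
        in_labels = kmeans([X[i] for i in idx], k_in)
        for c in set(in_labels):
            members = [i for i, l in zip(idx, in_labels) if l == c]
            centroid = tuple(
                sum(col) / len(members) for col in zip(*(X[i] for i in members))
            )
            # Step 3 (assumed rule): keep the real sample closest to the centroid.
            keep.append(min(members, key=lambda i: sqdist(X[i], centroid)))
    return sorted(keep)


# Toy regression data: y = sin(x) sampled on a regular grid.
X = [(i / 10.0,) for i in range(100)]
y = [math.sin(x[0]) for x in X]
kept = reduce_instances(X, y)
print(len(kept), "prototypes kept out of", len(X))
```

The reduced training set would then be `[(X[i], y[i]) for i in kept]`, which could be passed to any regression model in place of the full data. With `k_out` output groups and `k_in` input clusters per group, at most `k_out * k_in` prototypes are retained, so these two parameters control the reduction rate in this sketch.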
A new fuzzy support vector machine algorithm with dual membership values, based on spectral clustering, is proposed to overcome a shortcoming of the standard support vector machine algorithm, which in binary classification divides the training data into two mutually exclusive classes and ignores the possibility of an "overlapping" region between them. The proposed method handles sample "overlap" efficiently with spectral clustering, mitigates over-fitting, and greatly improves data mining efficiency. Simulation provides clear evidence for the new method.
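For orientation, one common way to write a fuzzy SVM in which every sample carries two membership values is shown below. The abstract does not state its formulation, so this is only an assumed illustration: the symbols $m_i^{+}$ and $m_i^{-}$ (memberships of sample $x_i$ in the positive and negative class, which in the proposed method would presumably come from spectral clustering), the slack variables $\xi_i$ and $\eta_i$, the feature map $\phi$, and the penalty $C$ are all notation introduced here.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\eta}\quad
  & \tfrac{1}{2}\lVert w \rVert^{2}
    + C \sum_{i=1}^{n} \bigl( m_i^{+}\,\xi_i + m_i^{-}\,\eta_i \bigr) \\
\text{s.t.}\quad
  & w^{\top}\phi(x_i) + b \;\ge\; 1 - \xi_i, \\
  & w^{\top}\phi(x_i) + b \;\le\; -1 + \eta_i, \\
  & \xi_i \ge 0,\quad \eta_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

Under this form, a sample in the overlapping region has both memberships well above zero and so contributes to the penalty on both sides of the margin, whereas a sample with $m_i^{+} = 1$ and $m_i^{-} = 0$ is penalized only as a positive example, as in the standard soft-margin SVM.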
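Robust optimization is an important tool for handling the output uncertainty of wind power and other renewable energy sources, and it is widely used in the optimal scheduling of microgrids. Traditional uncertainty sets are not compact enough to characterize wind power uncertainty accurately, and the data they enclose may contain outliers, which makes the scheduling results overly conservative. To address these problems, a two-stage robust optimal scheduling method for microgrids based on a data-driven uncertainty set is proposed. First, a conditional normal copula (CNC) model is built from historical wind power data, and the day-ahead wind power forecast is fed into the CNC model to generate wind power samples for the next day. Then, a data-driven uncertainty set that accounts for the temporal correlation of wind power is constructed using support vector clustering (SVC) and dimension decomposition. This uncertainty set characterizes wind power uncertainty more accurately and excludes outliers in the wind power data, so it reduces the conservativeness of robust optimization while remaining resistant to outliers. Next, a two-stage robust optimal scheduling model based on this uncertainty set is proposed and solved with the column-and-constraint generation (C&CG) algorithm. Finally, simulations show that, compared with traditional uncertainty sets, the constructed uncertainty set is less conservative and is robust to outliers in the wind power data.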
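The two-stage robust scheduling model mentioned in this abstract is typically written in the generic min-max-min form below. The abstract gives no equations, so the notation is an assumption made for illustration: $x$ stands for the first-stage (day-ahead) decisions, $u$ for a wind power realization, $\mathcal{U}$ for the data-driven uncertainty set, $y$ for the second-stage recourse decisions, and $\Omega(x,u)$ for their feasible region.

```latex
\min_{x \in X} \; c^{\top} x
  \;+\; \max_{u \in \mathcal{U}} \; \min_{y \in \Omega(x,u)} \; d^{\top} y
```

In this form, a tighter set $\mathcal{U}$ that excludes outliers shrinks the inner maximization and therefore lowers the worst-case cost, which is the sense in which the schedule becomes less conservative. A C&CG scheme would alternate between a master problem over $x$ with a finite list of scenarios and a subproblem that searches $\mathcal{U}$ for the worst-case $u$ to add to that list.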
Funding (input-output clustering instance-reduction method): supported by the Chiang Mai University Research Fund under contract number T-M5744.
Funding (fuzzy support vector machine with dual membership values): supported by the National Natural Science Foundation of China (70831001, 70821061).