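Abstract: Because single nucleotides in human DNA sequences are polymorphic, mining anomalies in DNA sequences is an important research topic in the post-genomic era. Building on an analysis of existing DNA sequence data mining methods, this paper exploits a property of manifold learning, namely that the distances between different low-dimensional embedding vectors differ, and proposes a manifold-learning-based DNA sequence data mining method, 5D locally linear embedding (5DLLE). Experimental results show that, compared with the hidden Markov model (HMM) and the support vector machine (SVM), the proposed 5DLLE method has certain advantages for DNA sequence data mining: it achieves a higher average recognition rate and requires relatively less computation time.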
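The abstract gives no algorithmic detail beyond the use of distances between low-dimensional embedding vectors, so the following is only a minimal sketch under stated assumptions. It applies standard locally linear embedding (LLE), not the authors' 5DLLE variant, to a hypothetical one-hot encoding of sliding DNA windows, and ranks windows by a hypothetical anomaly score (mean distance to their nearest neighbors in the embedding). The function names, window length `w`, neighbor count `k`, and regularization constant `reg` are all illustrative choices.

```python
import numpy as np

BASES = "ACGT"  # assumption: only unambiguous bases occur in the input


def encode_windows(seq: str, w: int) -> np.ndarray:
    """Hypothetical encoding: one-hot each base, flatten each length-w window."""
    idx = np.array([BASES.index(b) for b in seq])
    onehot = np.eye(len(BASES))[idx]  # shape (len(seq), 4)
    n = len(seq) - w + 1
    return np.stack([onehot[i:i + w].ravel() for i in range(n)])  # (n, 4*w)


def lle_weights(X: np.ndarray, k: int, reg: float = 1e-3) -> np.ndarray:
    """Standard LLE steps 1-2: k nearest neighbors and reconstruction weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]  # neighbors centered on x_i, shape (k, dim)
        C = Z @ Z.T  # local Gram matrix, shape (k, k)
        # Regularize: C is singular when k > dim or when neighbors coincide.
        C += (reg * np.trace(C) + 1e-9) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()  # weights for row i sum to 1
    return W


def lle_embed(X: np.ndarray, k: int, d: int, reg: float = 1e-3) -> np.ndarray:
    """Standard LLE step 3: bottom eigenvectors of (I - W)^T (I - W)."""
    W = lle_weights(X, k, reg)
    I_W = np.eye(X.shape[0]) - W
    M = I_W.T @ I_W
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, 1:d + 1]  # skip the bottom (constant) eigenvector


def anomaly_scores(Y: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical score: mean distance to the k nearest points in the embedding."""
    dist = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = "".join(rng.choice(list(BASES), 80))
    X = encode_windows(seq, w=5)
    Y = lle_embed(X, k=8, d=2)
    scores = anomaly_scores(Y, k=5)
    print("most unusual window starts at", int(np.argmax(scores)))
```

Windows with the largest scores are candidates for closer inspection; whether such a score matches the recognition criterion used for 5DLLE is not stated in the abstract.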
Funding: Project (2011AA040603) supported by the National High Technology Research & Development Program of China; Project (201202226) supported by the Natural Science Foundation of Liaoning Province, China
Abstract: The detection of outliers and change points in time series has become a research focus in time series data mining, since it can be used for fraud detection, rare event discovery, event/trend change detection, and so on. In most previous work, outlier detection and change point detection have not been explicitly related, and change point detection has not considered the influence of outliers. In this work, a unified detection framework is presented that handles both. The framework is based on the change point detection method of ALARCON-AQUINO and BARRIA and adopts a two-stage detection scheme to separate outliers from change points. It has two advantages. First, the unified structure for outlier and change point detection reduces computational complexity and simplifies the detection procedure. Second, detecting outliers before change points prevents outliers from influencing change point detection, which improves its accuracy. Simulation experiments with the proposed method on both model data and real application data achieved 100% detection accuracy. Comparisons between a traditional detection method and the proposed method further show that the unified detection structure is more accurate when the time series is contaminated by outliers.
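As a rough illustration of the two-stage ordering (outliers first, then change points on the cleaned series), the sketch below substitutes simple, generic detectors for the method of ALARCON-AQUINO and BARRIA, whose details the abstract does not give. Stage 1 flags isolated spikes by their robust (MAD-based) z-score against a rolling median and replaces them with that median; stage 2 runs binary segmentation with a standardized CUSUM mean-shift statistic. All function names, window sizes, and thresholds are hypothetical.

```python
import numpy as np


def rolling_median(x: np.ndarray, h: int) -> np.ndarray:
    """Median of the centered window x[i-h : i+h+1], truncated at the ends."""
    return np.array([np.median(x[max(0, i - h): i + h + 1]) for i in range(len(x))])


def detect_outliers(x: np.ndarray, h: int = 3, thr: float = 6.0):
    """Stage 1: flag isolated spikes by robust z-score against a rolling median."""
    med = rolling_median(x, h)
    r = x - med
    scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # MAD-based sigma
    mask = np.abs(r) > thr * scale
    cleaned = np.where(mask, med, x)  # replace flagged points by the local median
    return np.flatnonzero(mask), cleaned


def best_split(x: np.ndarray, sigma: float, min_size: int):
    """Return (t, stat) maximizing the standardized CUSUM mean-shift statistic."""
    n = len(x)
    csum = np.cumsum(x)
    best_t, best_stat = None, 0.0
    for t in range(min_size, n - min_size + 1):
        left = csum[t - 1] / t  # mean of x[:t]
        right = (csum[-1] - csum[t - 1]) / (n - t)  # mean of x[t:]
        stat = np.sqrt(t * (n - t) / n) * abs(left - right) / sigma
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


def detect_change_points(x: np.ndarray, thr: float = 5.0, min_size: int = 5):
    """Stage 2: binary segmentation on the (already cleaned) series."""
    dx = np.diff(x)
    # Noise sigma from first differences (robust to a few level shifts).
    sigma = 1.4826 * np.median(np.abs(dx - np.median(dx))) / np.sqrt(2) + 1e-12
    cps = []

    def recurse(a: int, b: int) -> None:
        if b - a < 2 * min_size:
            return
        t, stat = best_split(x[a:b], sigma, min_size)
        if t is None or stat < thr:
            return
        cps.append(a + t)  # index of the first point after the change
        recurse(a, a + t)
        recurse(a + t, b)

    recurse(0, len(x))
    return sorted(cps)


def two_stage_detect(x):
    """Outliers first, then change points on the outlier-free series."""
    outliers, cleaned = detect_outliers(np.asarray(x, dtype=float))
    return outliers, detect_change_points(cleaned)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.concatenate([np.zeros(50), np.full(50, 2.0)]) + rng.normal(0.0, 0.1, 100)
    x[20] += 5.0  # an isolated spike
    outliers, cps = two_stage_detect(x)
    print("outliers:", outliers.tolist(), "change points:", cps)
```

Running stage 2 on the cleaned series rather than the raw one reflects the ordering described in the abstract: a large spike left in place would shift the segment means around it and could be reported as one or two spurious change points.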