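Abstract: Distinguishing (contrast) pattern mining is an important branch of sequential pattern mining. Distinguishing patterns with density constraints help biologists discover how special factors are distributed in biological sequences. To this end, the paper proposes the MPDG (Mining distinguishing sequence Patterns based on Density and Gap constraint) algorithm, which uses the Nettree structure to mine distinguishing patterns that satisfy the density and gap constraints. With only a single scan of the sequence database, the algorithm can compute the supports of all super-patterns of the current pattern, which improves mining efficiency. Finally, experiments on real protein datasets verify the effectiveness of MPDG.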
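The abstract does not define the constraints formally, so the following is only a minimal sketch of the gap-constrained part, under stated assumptions: a pattern matches a sequence when the number of characters skipped between consecutive matched positions lies in `[min_gap, max_gap]`, support is the fraction of sequences in a class that contain a match, and a pattern is distinguishing when its support is at least `alpha` in the positive class and at most `beta` in the negative class. The function names and thresholds are hypothetical, the density constraint is omitted because the abstract does not specify it, and a plain position-by-position check stands in for the Nettree structure that MPDG actually uses.

```python
from typing import List


def occurs_with_gap(seq: str, pattern: str, min_gap: int, max_gap: int) -> bool:
    """True if `pattern` occurs in `seq` with every gap in [min_gap, max_gap].

    Assumption: the gap is the number of characters skipped between two
    consecutive matched positions.
    """
    if not pattern:
        return True
    # reachable[i] is True if the prefix matched so far can end at seq[i].
    reachable = [ch == pattern[0] for ch in seq]
    for symbol in pattern[1:]:
        nxt = [False] * len(seq)
        for i, ok in enumerate(reachable):
            if not ok:
                continue
            # The next position j must satisfy min_gap <= j - i - 1 <= max_gap.
            for j in range(i + 1 + min_gap, min(len(seq), i + 2 + max_gap)):
                if seq[j] == symbol:
                    nxt[j] = True
        reachable = nxt
    return any(reachable)


def support(db: List[str], pattern: str, min_gap: int, max_gap: int) -> float:
    """Fraction of sequences in `db` that contain `pattern` under the gap constraint."""
    if not db:
        return 0.0
    hits = sum(occurs_with_gap(s, pattern, min_gap, max_gap) for s in db)
    return hits / len(db)


def is_distinguishing(pos_db: List[str], neg_db: List[str], pattern: str,
                      min_gap: int, max_gap: int,
                      alpha: float, beta: float) -> bool:
    """Assumed criterion: frequent in the positive class, infrequent in the negative."""
    return (support(pos_db, pattern, min_gap, max_gap) >= alpha
            and support(neg_db, pattern, min_gap, max_gap) <= beta)
```

This brute-force check rescans every sequence for every candidate pattern, which is the cost that the single-scan Nettree approach described in the abstract is meant to avoid.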
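Abstract: Sequential pattern mining discovers patterns of interest to users in sequence data. Distinguishing (contrast) pattern mining is one such class of methods; it finds characteristic information in sequence databases of two or more classes and is widely applied in daily life and industrial production. As data volumes keep growing, mining efficiency becomes especially important, yet current distinguishing pattern mining is still too slow. To quickly mine distinguishing patterns that satisfy the density and gap constraints, the paper proposes an approximate algorithm, ADMD (Approximately Distinguishing Patterns Mining Based on Density Constraint), which tolerates the loss of a small fraction of patterns during mining in exchange for a large gain in speed. The algorithm uses the Nettree structure to compute pattern supports, generates candidate patterns by pattern concatenation, and applies a predictive pruning strategy to avoid generating large numbers of redundant patterns. Because pruning may also remove some non-redundant patterns, the mining result is not complete, which is why the algorithm is approximate. Building on ADMD, the ADMD-k algorithm introduces a parameter k into the pruning strategy; adjusting k changes the degree of pruning and thus balances mining efficiency against accuracy. Finally, on real protein datasets, the proposed algorithms are compared with other algorithms in terms of the number of distinguishing patterns mined and the mining speed. The results show that with k=1.5 the proposed algorithm takes less than 13% of the original running time while still mining more than 99% of the patterns, giving a close approximation at high speed.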
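The abstract names pattern concatenation as the candidate generation step but does not give the rule. One common way to realize it, shown here purely as an assumption, is a suffix/prefix join: two length-m patterns p and q produce a length-(m+1) candidate when p without its first symbol equals q without its last symbol. The function name `join_candidates` is hypothetical, and the predictive pruning and the parameter k of ADMD-k are not modelled because the abstract does not describe them.

```python
from typing import Dict, List


def join_candidates(patterns: List[str]) -> List[str]:
    """Join length-m patterns into length-(m+1) candidates.

    Assumed rule: p and q are joined into p + q[-1] when p[1:] == q[:-1].
    """
    # Index patterns by their prefix (everything except the last symbol).
    by_prefix: Dict[str, List[str]] = {}
    for q in patterns:
        by_prefix.setdefault(q[:-1], []).append(q)

    candidates: List[str] = []
    for p in patterns:
        # Look up every q whose prefix equals the suffix of p.
        for q in by_prefix.get(p[1:], []):
            candidates.append(p + q[-1])
    return candidates


if __name__ == "__main__":
    print(join_candidates(["AC", "CG", "CT", "GA"]))
    # prints ['ACG', 'ACT', 'CGA', 'GAC']
```

Compared with appending every alphabet symbol to every surviving pattern, the join only proposes candidates whose length-m prefix and suffix both survived the previous level, so fewer candidates need their support computed.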
Funding: Project (107021) supported by the Key Foundation of Chinese Ministry of Education; Project (2009643013) supported by China Scholarship Fund
Abstract: In order to identify the safety status pattern of coal mines accurately and quickly, a new safety status pattern recognition method based on the extension neural network (ENN) was proposed, and the design of the network structure, the rationale of the recognition algorithm and the performance of the proposed method were discussed in detail. The safety status pattern recognition problem of coal mines can be regarded as a classification problem whose features are defined over a range, so the ENN is well suited to this problem. The ENN-based recognition method uses a novel extension distance to measure the similarity between the object to be recognized and the class centers. To demonstrate the effectiveness of the proposed method, a real-world application to geological safety status pattern recognition in coal mines was tested. Comparative experiments with the existing method and other traditional ANN-based methods were conducted. The experimental results show that the proposed ENN-based recognition method can identify the safety status pattern of coal mines accurately, with a shorter learning time and a simpler structure. The results also confirm that the proposed method performs better in recognition accuracy, generalization ability and fault tolerance, which are very useful for recognizing the safety status pattern during coal production.
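The abstract does not give the formula for the extension distance. As a hedged illustration, the sketch below uses the form commonly given in the ENN literature, where each class k is described per feature j by a lower bound, an upper bound and a center z:

ED_k = Σ_j [ (|x_j − z_kj| − (w^U_kj − w^L_kj)/2) / |(w^U_kj − w^L_kj)/2| + 1 ]

The function names, the class layout and the sample feature ranges are hypothetical, and the weight-learning phase of the network is not shown. The sketch assumes every upper bound is strictly greater than its lower bound.

```python
from typing import Dict, Sequence, Tuple

# (lower bounds, upper bounds, centers), one entry per feature.
ClassRange = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def extension_distance(x: Sequence[float], lower: Sequence[float],
                       upper: Sequence[float], center: Sequence[float]) -> float:
    """Extension distance between sample x and one class (assumed formula)."""
    total = 0.0
    for xj, lo, up, zj in zip(x, lower, upper, center):
        half = (up - lo) / 2.0  # half-width of the feature range; must be non-zero
        total += (abs(xj - zj) - half) / abs(half) + 1.0
    return total


def classify(x: Sequence[float], classes: Dict[str, ClassRange]) -> str:
    """Assign x to the class with the smallest extension distance."""
    return min(classes, key=lambda name: extension_distance(x, *classes[name]))


if __name__ == "__main__":
    # Hypothetical two-feature ranges for two safety states.
    classes = {
        "safe": ([0.0, 0.0], [10.0, 10.0], [5.0, 5.0]),
        "unsafe": ([10.0, 10.0], [20.0, 20.0], [15.0, 15.0]),
    }
    print(classify([4.0, 6.0], classes))    # prints safe
    print(classify([16.0, 14.0], classes))  # prints unsafe
```

With this form, each feature contributes 0 when the sample sits at the class center, 1 when it sits on the range boundary (for a center at the midpoint), and more than 1 outside the range, which is how the distance reflects features defined over a range rather than as points.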