This paper proposes a new method of Online Analytical Processing (OLAP) on the EMBL Nucleotide Sequence Database. The scheme automatically loads flat-file data into a relational database, which is then converted into OLAP data marts. Both the quality and the speed of analysis are greatly improved when it is based on the data marts. We believe that this method is a powerful and flexible tool and can be seen as a successful application of data mining in molecular biology.
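The flat file to relational database to data mart pipeline can be illustrated with a minimal sketch. It is not the paper's implementation: the table names `entries` and `mart`, the choice of dimensions (taxonomic division and molecule type), and the sample records are all assumptions made for illustration. The parser relies only on the EMBL flat-file conventions of two-letter line codes (`ID`, `OS`) and `//` as the entry terminator.

```python
import sqlite3

# Illustrative sample in EMBL flat-file layout; the records are made up.
SAMPLE = """\
ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.
OS   Trifolium repens (white clover)
//
ID   HYPO0001; SV 1; linear; genomic DNA; STD; PLN; 1200 BP.
OS   Trifolium repens (white clover)
//
ID   HYPO0002; SV 1; linear; mRNA; STD; HUM; 300 BP.
OS   Homo sapiens (human)
//
"""


def parse_entries(text):
    """Yield one dict per flat-file entry (only ID and OS lines are read)."""
    entry = {}
    for line in text.splitlines():
        code, value = line[:2], line[5:].strip()
        if code == "//":  # end of entry
            if entry:
                yield entry
            entry = {}
        elif code == "ID":
            fields = [f.strip() for f in value.rstrip(".").split(";")]
            entry["accession"] = fields[0]
            entry["mol_type"] = fields[3]
            entry["division"] = fields[5]
            entry["length_bp"] = int(fields[6].split()[0])
        elif code == "OS":
            entry["organism"] = value


def load_and_summarize(text):
    """Load entries into SQLite, then build a small aggregate table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (accession TEXT PRIMARY KEY, organism TEXT,"
        " mol_type TEXT, division TEXT, length_bp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
        [
            (e["accession"], e.get("organism"), e["mol_type"],
             e["division"], e["length_bp"])
            for e in parse_entries(text)
        ],
    )
    # Hypothetical data mart: pre-aggregated counts per division and molecule type.
    conn.execute(
        "CREATE TABLE mart AS SELECT division, mol_type,"
        " COUNT(*) AS n_entries, SUM(length_bp) AS total_bp"
        " FROM entries GROUP BY division, mol_type"
    )
    return conn.execute(
        "SELECT division, mol_type, n_entries, total_bp FROM mart"
        " ORDER BY division, mol_type"
    ).fetchall()


rows = load_and_summarize(SAMPLE)
```

Analytical queries then read the small pre-aggregated `mart` table instead of rescanning the flat files, which is the general reason a data mart can speed up analysis.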
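To address the drawbacks of the traditional GSP algorithm, which must scan the database many times and incurs heavy I/O overhead, this paper proposes MR-GSP (GSP algorithm based on MapReduce), a sequential pattern mining algorithm built on the MapReduce programming framework. MR-GSP partitions the original sequence database into multiple sub-databases and distributes them to multiple Map nodes. The Map function scans the sub-database held in the Map node's memory and generates local sequential patterns. The Reduce function merges all local sequential patterns, scans the original sequence database to compute their support, and obtains the final sequential patterns. Compared with the traditional GSP algorithm, MR-GSP needs only two scans of the original database to obtain all sequential patterns. Experimental results show that, when mining sequential patterns from large datasets, MR-GSP can take full advantage of cloud computing technology and improve mining efficiency.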
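The two-phase idea in this abstract can be sketched in a single Python process. This is an illustration under stated assumptions, not the authors' MR-GSP code: the function names are hypothetical, each sequence element is a single item rather than an itemset, the local threshold is taken to be the global support ratio applied to the partition size, and the "Map nodes" are simulated with plain function calls instead of a MapReduce cluster.

```python
import math


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in database that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def local_gsp(partition, min_count):
    """Level-wise GSP over one in-memory partition (min_count must be >= 1)."""
    items = sorted({item for seq in partition for item in seq})
    frequent = [(i,) for i in items if support((i,), partition) >= min_count]
    result = list(frequent)
    while frequent:
        # GSP join: extend p with the last item of q when p[1:] == q[:-1].
        candidates = {
            p + (q[-1],) for p in frequent for q in frequent if p[1:] == q[:-1]
        }
        frequent = [
            c for c in sorted(candidates) if support(c, partition) >= min_count
        ]
        result.extend(frequent)
    return result


def map_phase(partition, min_ratio):
    """Map: mine local sequential patterns from one sub-database."""
    min_count = max(1, math.ceil(min_ratio * len(partition)))
    return local_gsp(partition, min_count)


def reduce_phase(local_results, database, min_ratio):
    """Reduce: merge local patterns, then count support on the full database."""
    candidates = set()
    for patterns in local_results:
        candidates.update(patterns)
    min_count = max(1, math.ceil(min_ratio * len(database)))
    counts = {c: support(c, database) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_count}


def mr_gsp(database, n_partitions, min_ratio):
    """Simulated driver: partition, map each partition, reduce once."""
    partitions = [database[i::n_partitions] for i in range(n_partitions)]
    local_results = [map_phase(p, min_ratio) for p in partitions if p]
    return reduce_phase(local_results, database, min_ratio)


database = [list("abc"), list("ac"), list("abd"), list("bc")]
patterns = mr_gsp(database, n_partitions=2, min_ratio=0.5)
```

Under these assumptions no globally frequent pattern is lost: a pattern that meets the support ratio over the whole database must meet it in at least one partition, so it appears among the merged local patterns, and the single pass in the Reduce step discards the false positives.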