Abstract
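Data are an important strategic resource, and big data mining has become a focus of attention for academia, industry, and even national governments. This paper introduces the basic concepts and current state of development of big data, reviews big data research related to chemistry, discusses the main problems of big data at the two levels of fundamental theory and key technologies, as well as the applications of big data mining in the various fields of chemistry, and offers an outlook on the future of big data and its prospects for application in chemistry.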
Big data is fast becoming an important resource and a hot topic in academic research, business, and government. In this paper, we introduce the concept of big data and review advances in big data research, including technology for big data collection, cloud computing technologies such as the Google File System, BigTable, MapReduce, and Hadoop, and data mining and visualization methods for big data. Big data is commonly defined by the so-called 4 V's: volume, variety, velocity, and value. High volume combined with large variety makes big data much more difficult to analyze. Because velocity matters, fast, high-performance analysis methods are needed. The high value of big data is precisely the reason for the importance of, and the research activity in, this area. We also summarize various applications of big data in chemistry. Professional information platforms such as the Collaboratory for Multi-scale Chemical Sciences (CMCS) and the Chemical Informatics and Cyberinfrastructure Collaboratory (CICC) have been developed to manage and study chemical big data, while search engines such as the ChemDB Portal have been established to extract chemical information from the internet. Software such as the Integrated Project View and ArQiologist can be used to assist in the design of new medicines in medicinal chemistry. A data management system called BioGames has been proposed for analyzing microfluidics big data. Moreover, graphics processing units are widely used to improve the computational capabilities of molecular dynamics simulations, while compressed score plots have been proposed to address visualization issues in chemometrics. In the era of big data, analytical instruments, chemical data systems, and even research methods may need to change; new strategies and techniques are therefore still needed for the generation and processing of big data.
Source
《科学通报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2015, Issue 8, pp. 694-703 (10 pages)
Chinese Science Bulletin
Keywords
big data, data mining, visualization, cloud computing, chemistry
About the author
E-mail: xshao@nankai.edu.cn