Objective To explore the medication rules of traditional Chinese medicine (TCM) and the mechanism of action of hub herb pairs for treating insomnia. Methods A total of 104 prescriptions were statistically analyzed. The association rule algorithm was applied to mine the hub herb pairs. Network pharmacology was used to analyze the mechanism of the hub herb pairs, and molecular docking was applied to simulate the interaction between receptors and herb molecules, thereby predicting their binding affinities. Results The most frequently used herbs in TCM prescriptions for treating insomnia included Semen Ziziphi Spinosae, Radix Glycyrrhizae, Radix et Rhizoma Ginseng, and Poria cum Radix Pini. Supplementing herbs were the most commonly used category, followed by heat-clearing, mind-calming, and exterior-releasing herbs; the predominant properties were warm and cold, the predominant flavors were sweet, pungent, and bitter, and the main meridian tropisms were the liver, lung, spleen, kidney, heart, and stomach. The hub herb pairs identified by the association rules were Radix Astragali-Radix et Rhizoma Ginseng, Rhizoma Chuanxiong-Radix Glycyrrhizae, Semen Platycladi-Semen Ziziphi Spinosae, Pericarpium Citri Reticulatae-Radix Glycyrrhizae, Radix Polygalae-Semen Ziziphi Spinosae, and Radix Astragali-Semen Ziziphi Spinosae. Network pharmacology revealed that the cAMP signaling pathway might play a key role in treating insomnia, acting synergistically with the HIF-1 signaling pathway, the prolactin signaling pathway, chemical carcinogenesis-receptor activation, and the PI3K-Akt signaling pathway. Molecular docking indicated good binding between the active ingredients of the hub herb pairs and the hub targets. Conclusions This study identified six hub herb pairs for treating insomnia in TCM. These hub herb pairs may treat insomnia through the cAMP signaling pathway, acting synergistically with the HIF-1 signaling pathway, the prolactin signaling pathway, chemical carcinogenesis-receptor activation, and the PI3K-Akt signaling pathway.
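The abstract does not say which association rule algorithm or thresholds were used. As a minimal sketch, assuming each prescription is a list of herb names and using hypothetical thresholds (`min_support`, `min_confidence`), candidate herb pairs can be scored by support and confidence as follows; the toy prescriptions and the function name `mine_herb_pairs` are illustrative, not the study's data or code.

```python
from collections import Counter
from itertools import combinations

def mine_herb_pairs(prescriptions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) rules between co-occurring herb pairs."""
    n = len(prescriptions)
    herb_counts = Counter()   # number of prescriptions containing each herb
    pair_counts = Counter()   # number of prescriptions containing each unordered pair
    for rx in prescriptions:
        herbs = sorted(set(rx))
        herb_counts.update(herbs)
        pair_counts.update(combinations(herbs, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / herb_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 3), round(confidence, 3)))
    return sorted(rules)

# Toy prescriptions (hypothetical, abbreviated herb names).
prescriptions = [
    ["Ziziphi", "Polygalae", "Poria"],
    ["Ziziphi", "Polygalae"],
    ["Ziziphi", "Glycyrrhizae"],
    ["Astragali", "Ginseng"],
]
for rule in mine_herb_pairs(prescriptions, min_support=0.5, min_confidence=0.6):
    print(rule)
```

Support counts how often a pair co-occurs across all prescriptions, while confidence asks how often the right-hand herb appears given the left-hand one, which is why one pair can yield rules with different confidences in each direction.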
An intrusion detection (ID) model based on fuzzy data mining is proposed. A major difficulty in anomaly ID is that patterns of normal behavior change over time. In addition, an actual intrusion that deviates only slightly from normal behavior may match normal patterns, so the detection system fails to detect it. To solve this problem, a fuzzy data mining technique is used to extract patterns that represent the normal behavior of a network. A set of fuzzy association rules mined from the network data serves as the model of “normal behavior”. To detect anomalous behavior, fuzzy association rules are generated from new audit data, and their similarity to the rule sets mined from “normal” data is computed. If the similarity value is lower than a threshold, an alarm is raised. Furthermore, genetic algorithms are used to tune the fuzzy membership functions and to select an appropriate set of features.
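The abstract leaves the membership functions, rule representation, and similarity measure unspecified. The sketch below is a simplified stand-in under stated assumptions: hypothetical triangular membership functions on features scaled to 0..100, fuzzy 2-itemset supports (the building blocks of fuzzy association rules) computed with the min t-norm, and a fuzzy Jaccard similarity compared with a hypothetical threshold. The genetic-algorithm tuning step is not shown.

```python
from itertools import combinations

# Hypothetical triangular membership functions shared by all features,
# assuming every feature has been scaled to the range 0..100.
FUZZY_SETS = {"low": (-1, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 101)}

def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_pair_supports(records, min_support=0.1):
    """Fuzzy support of 2-itemsets such as (conn=low, bytes=medium), using the min t-norm."""
    totals = {}
    for rec in records:
        items = {(f, label): triangular(v, *abc)
                 for f, v in rec.items() for label, abc in FUZZY_SETS.items()}
        for (i1, m1), (i2, m2) in combinations(sorted(items.items()), 2):
            if i1[0] != i2[0]:  # only pair labels of different features
                totals[(i1, i2)] = totals.get((i1, i2), 0.0) + min(m1, m2)
    n = len(records)
    return {k: s / n for k, s in totals.items() if s / n >= min_support}

def similarity(sa, sb):
    """Fuzzy Jaccard similarity between two itemset -> support maps."""
    keys = set(sa) | set(sb)
    if not keys:
        return 1.0
    num = sum(min(sa.get(k, 0.0), sb.get(k, 0.0)) for k in keys)
    den = sum(max(sa.get(k, 0.0), sb.get(k, 0.0)) for k in keys)
    return num / den

def raise_alarm(normal_profile, audit_records, threshold=0.5):
    """Alarm when new audit data is too dissimilar from the normal profile."""
    return similarity(normal_profile, fuzzy_pair_supports(audit_records)) < threshold

normal = [{"conn": 10, "bytes": 20}, {"conn": 15, "bytes": 25}, {"conn": 5, "bytes": 10}]
profile = fuzzy_pair_supports(normal)
print(raise_alarm(profile, normal))                       # same kind of traffic
print(raise_alarm(profile, [{"conn": 95, "bytes": 90}]))  # burst of connections
```

Fuzzy memberships let a value such as `conn=15` count partly as "low" and partly as "medium", which softens the sharp boundaries that would otherwise make small deviations look either fully normal or fully anomalous.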
Existing data mining methods focus mostly on relational databases and structured data rather than on complex structured data such as extensible markup language (XML) documents. By mapping the XML document type definition (DTD) to relational semantics that record the relations among XML data, and by using an XML data mining language, the proposed XML data mining system provides a strategy for mining information from XML.
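The abstract does not describe the DTD-to-relational mapping itself. As a generic substitute, the sketch below uses simple edge-table shredding (every element becomes an `(id, parent_id, tag, content)` row) with Python's standard `xml.etree.ElementTree` and `sqlite3` modules, after which ordinary relational queries and transaction extraction work as usual. The table layout, the function name `shred`, and the sample document are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Map each XML element to an (id, parent_id, tag, content) row of an edge table."""
    rows = []
    def visit(elem, parent_id):
        node_id = len(rows) + 1
        content = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, elem.tag, content))
        for child in elem:
            visit(child, node_id)
    visit(ET.fromstring(xml_text), None)
    return rows

doc = """<orders>
  <order><item>milk</item><item>bread</item></order>
  <order><item>milk</item></order>
</orders>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, content TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", shred(doc))

# Ordinary SQL now works on the XML content, e.g. item frequencies ...
item_counts = conn.execute(
    "SELECT content, COUNT(*) FROM node WHERE tag = 'item' GROUP BY content ORDER BY content"
).fetchall()
# ... or one transaction (set of items) per <order> element, ready for rule mining.
transactions = {}
for parent_id, item in conn.execute(
        "SELECT parent_id, content FROM node WHERE tag = 'item' ORDER BY id"):
    transactions.setdefault(parent_id, set()).add(item)
print(item_counts, list(transactions.values()))
```

An edge table ignores the schema, so a DTD-aware mapping like the one the abstract describes would typically produce one table per element type instead; the shredding above only shows why a relational form makes standard mining queries possible.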
Many high-quality studies have emerged from public databases such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC). However, these data are often characterized by high dimensionality and heterogeneity, timeliness, scarcity, irregularity, and other features, so their value has not been fully exploited. Data mining has become a frontier field in medical research because it performs well in building disease-prediction models that evaluate patient risk and assist clinical decision-making. Data mining therefore has unique advantages in clinical big-data research, especially with large-scale public medical databases. This article introduced the main public medical databases and described the steps, tasks, and models of data mining in simple language. Additionally, we described data-mining methods together with their practical applications. The goal of this work was to help clinical researchers gain a clear and intuitive understanding of how data-mining technology is applied to clinical big data, in order to promote research results that benefit doctors and patients.
In computational physics, proton transfer phenomena can be viewed as pattern classification problems in which a set of input features is used to classify the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks for classifying proton transfer events, using a feed-forward back-propagation neural network as the classifier that distinguishes between the two cases. We use a newly developed data mining and pattern recognition tool to automate and control an existing Empirical Valence Bond code and to chart its output data. The study analyzes the need for pattern recognition in aqueous proton transfer processes and shows how the error back-propagation learning approach (multilayer perceptron algorithms) can be employed satisfactorily in this case. We present a tool for pattern recognition and validate the code on a real physical case study. The results of applying the artificial neural network methodology to crowd patterns based on selected physical properties (e.g., temperature, density) show that the network can learn proton transfer patterns corresponding to properties of the aqueous environment, which are in turn shown to be fully compatible with previous proton transfer studies.
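To make the classifier concrete, here is a minimal pure-Python feed-forward network with one sigmoid hidden layer, trained by full-batch error back-propagation on a binary 'occurred'/'not occurred' label. The two input features (scaled temperature and density), the network size, the learning rate, the class name `TinyMLP`, and the toy data are hypothetical; the paper's actual features, architecture, and Empirical Valence Bond output format are not given in the abstract.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Feed-forward network: one sigmoid hidden layer, one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, data):
        """Mean binary cross-entropy."""
        total = 0.0
        for x, t in data:
            y = self.forward(x)[1]
            total -= t * math.log(y + 1e-12) + (1 - t) * math.log(1 - y + 1e-12)
        return total / len(data)

    def train(self, data, lr=1.0, epochs=3000):
        n = len(data)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, t in data:
                h, y = self.forward(x)
                d_out = y - t  # gradient w.r.t. output pre-activation (sigmoid + cross-entropy)
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_h = d_out * self.w2[j] * hj * (1 - hj)  # error back-propagated to hidden unit j
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Full-batch gradient-descent update after the error has been accumulated.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j] / n
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i] / n
            self.b2 -= lr * g_b2 / n

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

# Hypothetical scaled (temperature, density) features; 1 = transfer occurred.
data = [((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.7, 0.1), 1),
        ((0.2, 0.8), 0), ((0.3, 0.9), 0), ((0.1, 0.7), 0)]
net = TinyMLP(n_in=2, n_hidden=3)
before = net.loss(data)
net.train(data)
print(before, net.loss(data), [net.predict(x) for x, _ in data])
```

A real study would hold out a validation set rather than scoring on the training events; the toy data here only demonstrates that back-propagation reduces the loss and separates the two classes.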
Traditional distribution network planning relies on planners' professional knowledge, especially when analyzing the correlations between problems in the network and the crucial influencing factors. The inherent laws reflected in the historical data of the distribution network are ignored, which undermines the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge about existing problems in the network was obtained. A data-mining model based on association (correlation) rules was established. The inputs of the model were electrical characteristic indices screened using the grey relational method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and to obtain strong association rules. Lift ("degree of promotion") and chi-square tests were used to verify the rationality of the strong association rules output by the model. The study determined the correlation between heavy-load or overload problems of distribution feeders in different regions and the related characteristic indices, and obtained the confidence of each rule. These results can provide an effective basis for formulating a distribution network planning scheme.
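Lift (the "degree of promotion") and a chi-square test can both be read off a 2x2 contingency table of a rule A -> B. The sketch below assumes transactions are sets of discretized indicator flags; the flag names `high_load_rate` and `overload`, the function name `rule_metrics`, and the counts are hypothetical, and 3.841 is the standard chi-square critical value for one degree of freedom at the 5% level. Grey relational screening and Apriori itself are not shown.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence, lift and chi-square statistic of the rule lhs -> rhs."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if lhs <= t)
    n_b = sum(1 for t in transactions if rhs <= t)
    n_ab = sum(1 for t in transactions if lhs <= t and rhs <= t)
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)  # > 1 means lhs raises the chance of rhs
    # Observed vs expected counts in the 2x2 table (A&B, A&~B, ~A&B, ~A&~B).
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    expected = [n_a * n_b / n, n_a * (n - n_b) / n,
                (n - n_a) * n_b / n, (n - n_a) * (n - n_b) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return support, confidence, lift, chi2

CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Hypothetical feeder records: each set lists the flags observed on one feeder.
# (List repetition shares the same set objects, which is fine because they are only read.)
transactions = ([{"high_load_rate", "overload"}] * 8 + [{"high_load_rate"}] * 2
                + [{"overload"}] * 2 + [set()] * 8)
support, confidence, lift, chi2 = rule_metrics(transactions, {"high_load_rate"}, {"overload"})
strong = lift > 1 and chi2 > CHI2_CRITICAL_1DF_5PCT
print(support, confidence, lift, round(chi2, 3), strong)
```

A high-confidence rule can still be uninformative when the consequent is common anyway; lift and the chi-square test guard against that by comparing the observed co-occurrence with what independence would predict.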
This paper presents a generalized method for incrementally updating the approximations of a concept, which can serve as an effective tool for dynamic attribute generalization. By combining this method with the LERS inductive learning algorithm, it also introduces a generalized quasi-incremental algorithm for learning classification rules from databases.
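For readers unfamiliar with rough-set approximations, the sketch below computes the lower approximation (equivalence classes entirely inside the concept) and the upper approximation (classes that touch the concept) and updates them incrementally when a new object arrives, recomputing only the affected equivalence class. It illustrates incremental updating under object insertion only; the paper's attribute-generalization case and the LERS algorithm are not reproduced, and the class name `RoughConcept` and the example data are hypothetical.

```python
from collections import defaultdict

class RoughConcept:
    """Maintains lower/upper approximations of a concept as objects are added."""

    def __init__(self, attributes):
        self.attributes = attributes
        self.classes = defaultdict(set)   # attribute-value key -> objects in that class
        self.positive = defaultdict(int)  # key -> number of those objects in the concept
        self.lower, self.upper = set(), set()

    def add(self, obj_id, record, in_concept):
        key = tuple(record[a] for a in self.attributes)
        self.classes[key].add(obj_id)
        if in_concept:
            self.positive[key] += 1
        members = self.classes[key]
        # Only the equivalence class of the new object can change status.
        self.lower -= members
        self.upper -= members
        if self.positive[key] == len(members):  # class lies entirely inside the concept
            self.lower |= members
        if self.positive[key] > 0:              # class overlaps the concept
            self.upper |= members

    def boundary(self):
        return self.upper - self.lower

rc = RoughConcept(["headache", "temperature"])
objects = [
    (1, {"headache": "yes", "temperature": "high"}, True),
    (2, {"headache": "yes", "temperature": "high"}, True),
    (3, {"headache": "no", "temperature": "normal"}, False),
    (4, {"headache": "yes", "temperature": "normal"}, True),
]
for obj_id, record, flu in objects:
    rc.add(obj_id, record, flu)
print(sorted(rc.lower), sorted(rc.upper))  # object 4 is certainly "flu" so far
rc.add(5, {"headache": "yes", "temperature": "normal"}, False)
print(sorted(rc.lower), sorted(rc.upper))  # 5 contradicts 4, so both move to the boundary
```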
Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge that meets users' needs. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining have been ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Are there rules we should obey in a data mining process? To discover patterns and knowledge that are genuinely interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process. Zhao and Yao proposed an interactive, user-driven classification method using the granule network. In our work, we find that data mining is a knowledge transformation process that transforms knowledge from data format into symbol format. Thus, no new knowledge is generated in a data mining process. Knowledge is merely transformed from data format, which humans cannot understand, into symbol format, which humans can understand and use easily. This is similar to translating a book from Chinese into English: the knowledge in the book should remain unchanged, and only its format changes. That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, mistakes have been made in the translation. Likewise, in a data mining process we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is one representation format of knowledge). Unfortunately, we cannot read, understand, or use it directly, because we cannot understand raw data. Based on this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved the performance of classical knowledge acquisition methods. We also find that domain-driven and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is analogous to views of a database: users with different views see different parts of the data in a database. Thus, users with different tasks or objectives may want to, and can, discover different (partial) knowledge from the same database. However, all of this partial knowledge must already exist in the database. Therefore, a domain-oriented data-driven data mining method would help us extract knowledge that genuinely exists in a database and is genuinely interesting and actionable in the real world.
To address the information asymmetry problem in online auctions, this study proposes and validates a model for forecasting winning bid prices. In particular, it explores the usability of data mining approaches, such as neural networks and Bayesian networks, for building the forecasting model. The research shows empirically that, in forecasting winning bid prices in online auctions, data mining techniques perform better than traditional statistical analyses such as logistic regression and multivariate regression.
An agent-based data mining framework for high-dimensional environments is built, instead of following the style of classical structured programming or object-oriented programming. The framework supports the whole data mining process in a high-dimensional environment. Belief-desire-joint-intention agents are designed to fit the characteristics of the high-dimensional environment, and the syntax, semantics, and reasoning rules of the agents are given. In the data mining system for the high-dimensional environment, agents need to exchange messages. A cooperative behavior mechanism is adopted to complete communication through a three-level pattern among agents, each of which has its own fixed role.
In recent years, telecommunications companies have used the concept of NPS (Net Promoter Score) for customer relationship management, but there is neither definitive theoretical research nor instructive case research on it. This paper summarizes an approach, illustrated with a case analysis, for improving customer loyalty through NPS data mining, which has broad practical significance for telecom companies. First, using reasoning analysis based on IG (Information Gain) and an XGBoost decision-tree model, the paper identifies several drivers of customer loyalty related to customer consumption, such as call duration, data usage, and ARPU. Telecom companies can therefore predict the role of individual customers and monitor big data daily, which saves substantial NPS survey costs. Second, the paper summarizes how customer group features affect the relationship between NPS and financial performance. Taking ARPU as the performance goal, we divide the sample customers into 6 groups using k-means clustering, summarize their characteristics, and give targeted suggestions for each group.
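A minimal sketch of information-gain feature ranking, assuming survey scores are bucketed with the standard NPS thresholds (9-10 promoter, 7-8 passive, 0-6 detractor) and that usage features such as call duration have already been discretized. The feature names, values, and helper names are hypothetical, and the XGBoost model and k-means segmentation are not shown.

```python
import math
from collections import Counter

def nps_category(score):
    """Standard NPS buckets for a 0-10 likelihood-to-recommend score."""
    return "promoter" if score >= 9 else "passive" if score >= 7 else "detractor"

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical survey: NPS scores plus discretized usage features per customer.
labels = [nps_category(s) for s in (10, 9, 3, 5)]
features = {
    "call_duration": ["long", "long", "short", "short"],
    "data_usage": ["high", "low", "high", "low"],
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(labels, ranking)
```

Here `call_duration` separates promoters from detractors perfectly (gain of 1 bit), while `data_usage` carries no information about the label (gain of 0), so it ranks last.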
The outbreak of coronavirus disease 2019 (COVID-19) has drawn public attention all over the world. As a newly emerging area, single-cell sequencing has also shown its power in the battle against the epidemic. In this review, up-to-date knowledge of COVID-19 and its receptor is summarized, followed by a collection of studies that mine single-cell transcriptome profiling data for information on the vulnerable cell types in humans and the potential mechanisms of the disease.
The Travelling Salesman Problem (TSP) is a classical optimization problem that belongs to the class of NP-hard problems. The purpose of this work is to apply data mining methodologies to explore the patterns in data generated by an Ant Colony Algorithm (ACA) performing a search, and to develop a rule-set searcher that approximates the ACA's searcher. An attribute-oriented induction methodology was used to explore the relationship between an operation sequence and its attributes, and a set of rules was developed. The experimental results show that the proposed approach performs well with respect to both solution quality and computation speed.
The safety of coal mine production can be further improved by forecasting the quantity of gas emission from the real-time and historical data saved by the gas monitoring system. Taking advantage of data warehouse and data mining technologies for processing large quantities of redundant data, a method for forecasting mine gas emission quantity based on FDM, and its application, were studied. The construction of a fuzzy similarity relation and clustering analysis were proposed, through which potential relationships within the gas emission data can be found. A pattern-finding model and a forecasting model were presented, together with a detailed approach for realizing the forecast, which has been applied to forecast gas emission quantity efficiently.
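The fuzzy clustering step can be sketched as follows, assuming samples are already scaled to [0, 1] and using a hypothetical similarity r_ij = 1 - mean absolute difference. The max-min transitive closure turns the fuzzy similarity relation into a fuzzy equivalence relation, and a λ-cut then yields crisp clusters. The abstract does not specify the similarity measure, the threshold λ, or the forecasting model, so these are illustrative choices and all names are hypothetical.

```python
def similarity_matrix(samples):
    """r_ij = 1 - mean absolute difference, assuming features scaled to [0, 1]."""
    n = len(samples)
    return [[1.0 - sum(abs(a - b) for a, b in zip(samples[i], samples[j])) / len(samples[i])
             for j in range(n)] for i in range(n)]

def max_min_compose(r):
    n = len(r)
    return [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)] for i in range(n)]

def transitive_closure(r):
    """Repeat R <- R o R (max-min) until stable; the result is a fuzzy equivalence relation."""
    while True:
        r2 = max_min_compose(r)
        if r2 == r:
            return r
        r = r2

def lambda_cut_clusters(r, lam):
    """Objects i, j share a cluster iff r[i][j] >= lam (valid because r is transitive)."""
    labels = [-1] * len(r)
    next_label = 0
    for i in range(len(r)):
        if labels[i] == -1:
            for j in range(len(r)):
                if r[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels

# Hypothetical normalized gas-monitoring samples (e.g. concentration, airflow).
samples = [[0.10, 0.20], [0.12, 0.22], [0.80, 0.90], [0.82, 0.88]]
closure = transitive_closure(similarity_matrix(samples))
print(lambda_cut_clusters(closure, 0.9), lambda_cut_clusters(closure, 0.2))
```

Lowering λ merges clusters, so sweeping λ from 1 down to 0 produces a hierarchy of increasingly coarse groupings of the monitoring data.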
Magnesium (Mg) is a promising alternative to lithium (Li) as an anode material in solid-state batteries owing to its abundance and high theoretical volumetric capacity. However, sluggish Mg-ion conduction in the lattice of solid-state electrolytes (SSEs) is one of the key challenges hampering the development of Mg-ion solid-state batteries. Although various Mg-ion SSEs have been reported in recent years, key insights are hard to derive from any single literature report. Moreover, the structure-performance relationships of Mg-ion SSEs need to be further unraveled to provide more precise design guidelines for SSEs. In this viewpoint article, we use data mining to analyze the structural characteristics of the Mg-based SSEs with high ionic conductivity reported over the last four decades, and we provide big-data-derived insights into the challenges and opportunities in developing next-generation Mg-ion SSEs.
Facing the development of future 5G, emerging technologies such as the Internet of Things, big data, cloud computing, and artificial intelligence are driving explosive growth in data traffic. With radical changes in communication theory and implementation technologies, wireless communications and wireless networks have entered a new era. Among these developments, wireless big data (WBD) has tremendous value, and artificial intelligence (AI) opens up unprecedented possibilities. However, the big data and AI communities regard the lack of a sound theoretical foundation and mathematical methods as a real challenge that needs to be solved. Starting from the basic problem of wireless communication, namely the interrelationship of demand, environment, and capability, this paper investigates the concept and data model of WBD, wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples, in order to facilitate and open up more opportunities for WBD research and development. Such research helps create new theoretical foundations and emerging technologies for future wireless communications.
The distance-based outlier detection method detects implied outliers by calculating the distances between points in the dataset, but its computational complexity is particularly high for multidimensional datasets. In addition, traditional outlier detection methods do not consider how frequently subsets occur, so the detected outliers do not fit the definition of outliers (i.e., items that rarely appear). Pattern mining-based outlier detection approaches have solved this problem, but they do not take the importance of each pattern into account during outlier detection, so the detected outliers cannot truly reflect some actual situations. To address these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to detect outliers effectively on a weighted data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to quickly mine the minimal weighted rare patterns, and two deviation factors are then defined in the outlier detection phase to measure the degree of abnormality of each transaction on the weighted data stream. Experimental results show that the proposed MWRPM-Outlier approach performs excellently in outlier detection and that MWRPM outperforms competing methods in weighted rare pattern mining.
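For contrast with the proposed approach, here is the naive distance-based baseline the abstract criticizes: a point is flagged when fewer than `min_neighbors` other points lie within `radius`. The function name, parameter names, and data are hypothetical; the double loop makes the O(n²) distance cost explicit. MWRPM-Outlier itself is not sketched because its deviation factors are not defined in the abstract.

```python
import math

def distance_outliers(points, radius, min_neighbors):
    """Naive O(n^2) distance-based outliers: fewer than min_neighbors points within radius."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points)
                        if j != i and math.dist(p, q) <= radius)
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(distance_outliers(points, radius=1.5, min_neighbors=2))
```

Every point is compared with every other point, which is exactly the cost that grows painful for large, high-dimensional data; the method also ignores how frequently patterns occur, the second weakness the abstract names.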
With the increasing deployment of wireless sensor devices and networks, security has become a critical challenge for sensor networks. In this paper, a scheme using data mining is proposed for routing anomaly detection in wireless sensor networks. The scheme uses the Apriori algorithm to extract traffic patterns from both the routing table and network traffic packets, and the K-means clustering algorithm then adaptively generates a detection model. By combining these two algorithms, routing attacks can be detected effectively and automatically. The main advantage of the proposed approach is that it can detect new attacks that have not been seen before. Moreover, the proposed detection scheme requires no a priori knowledge and can therefore be applied to a wide range of sensor networks for a variety of routing attacks.
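The K-means stage can be sketched as below, assuming each observation is a numeric feature vector (here a hypothetical route-update rate and packet-drop ratio) and that a new observation is flagged when it lies farther from its nearest centroid than `slack` times that cluster's training radius. The initialization (first k points), the radius rule, and all names are assumptions, and the Apriori pattern-extraction step that would feed the features is not shown.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means; deterministic init with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda ci: math.dist(p, centroids[ci]))
            clusters[nearest].append(p)
        new_centroids = [[sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[ci]
                         for ci, cl in enumerate(clusters)]
        if new_centroids == centroids:  # assignments stable: converged
            break
        centroids = new_centroids
    return centroids, clusters

def build_detector(training, k, slack=1.5):
    centroids, clusters = kmeans(training, k)
    # Radius of each cluster = farthest training point from its centroid.
    radii = [max((math.dist(p, c) for p in cl), default=0.0)
             for c, cl in zip(centroids, clusters)]
    def is_anomalous(x):
        i = min(range(k), key=lambda ci: math.dist(x, centroids[ci]))
        return math.dist(x, centroids[i]) > slack * radii[i]
    return is_anomalous

# Hypothetical normal traffic: (route updates per minute, packet-drop ratio).
training = [(1.0, 0.01), (5.0, 0.05), (1.2, 0.02), (5.2, 0.04), (0.9, 0.015), (4.8, 0.06)]
detector = build_detector(training, k=2)
print(detector((1.1, 0.015)), detector((20.0, 0.5)))
```

In practice the features should be standardized first; here the route-update rate dominates the Euclidean distance because the drop ratio is on a much smaller scale.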
基金National Natural Science Foundation of China(82360905)Gansu Provincial University Teachers'Innovation Fund Projects(2023A-092 and 2024B-109).
文摘Objective To explore the medication rules of traditional Chinese medicine(TCM)and mechanism of action of hub herb pairs for treating insomnia.Methods Totally 104 prescriptions were statistically analyzed.The association rule algorithm was applied to mine the hub herb pairs.Network pharmacology was utilized to analyze the mechanism of the hub herb pairs,while molecular docking was applied to simulate the interaction between receptors and herb molecules,thereby predicting their binding affinities.Results The most frequently used herbs in TCM prescriptions for treating insomnia included Semen Ziziphi Spinosae,Radix Glycyrrhizae,Radix et Rhizoma Ginseng,and Poria cum Radix Pini.Among them,the most commonly used were the supplementing herbs,followed by heat-clearing,mind-calming,and exterior-releasing ones,with their properties of warm and cold,flavors of sweet,Pungent,and bitter,and meridian tropisms of liver,lungs,spleen,kidneys,heart,and stomach.The hub herb pairs based on the association rules included Radix Astragali-Radix et Rhizoma Ginseng,Rhizoma Chuanxiong-Radix Glycyrrhizae,Seman Platycladi-Semen Ziziphi Spinosae,Pericarpium Citri Reticulatae-Radix Glycyrrhizae,Radix Polygalae-Semen Ziziphi Spinosae,and Radix Astragali-Semen Ziziphi Spinosae.Network pharmacology revealed that the cAMP signaling pathway might play a key role in treating insomnia synergistically with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway.Molecular docking indicated that there was good binding between the active ingredients of the hub herb pairs and the hub targets.Conclusions This study identified six hub herb pairs for treating insomnia in TCM.These hub herb pairs may synergistically treat insomnia with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway through the cAMP signaling pathway.
文摘An intrusion detection (ID) model is proposed based on the fuzzy data mining method. A major difficulty of anomaly ID is that patterns of the normal behavior change with time. In addition, an actual intrusion with a small deviation may match normal patterns. So the intrusion behavior cannot be detected by the detection system.To solve the problem, fuzzy data mining technique is utilized to extract patterns representing the normal behavior of a network. A set of fuzzy association rules mined from the network data are shown as a model of “normal behaviors”. To detect anomalous behaviors, fuzzy association rules are generated from new audit data and the similarity with sets mined from “normal” data is computed. If the similarity values are lower than a threshold value,an alarm is given. Furthermore, genetic algorithms are used to adjust the fuzzy membership functions and to select an appropriate set of features.
文摘The existing data mining methods are mostly focused on relational databases and structured data, but not on complex structured data (like in extensible markup language(XML)). By converting XML document type description to the relational semantic recording XML data relations, and using an XML data mining language, the XML data mining system presents a strategy to mine information on XML.
基金the National Social Science Foundation of China(No.16BGL183).
文摘Many high quality studies have emerged from public databases,such as Surveillance,Epidemiology,and End Results(SEER),National Health and Nutrition Examination Survey(NHANES),The Cancer Genome Atlas(TCGA),and Medical Information Mart for Intensive Care(MIMIC);however,these data are often characterized by a high degree of dimensional heterogeneity,timeliness,scarcity,irregularity,and other characteristics,resulting in the value of these data not being fully utilized.Data-mining technology has been a frontier field in medical research,as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models.Therefore,data mining has unique advantages in clinical big-data research,especially in large-scale medical public databases.This article introduced the main medical public database and described the steps,tasks,and models of data mining in simple language.Additionally,we described data-mining methods along with their practical applications.The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big-data in order to promote the production of research results that are beneficial to doctors and patients.
基金Dr. Steve Jones, Scientific Advisor of the Canon Foundation for Scientific Research (7200 The Quorum, Oxford Business Park, Oxford OX4 2JZ, England). Canon Foundation for Scientific Research funded the UPC 2013 tuition fees of the corresponding author during her writing this article
文摘In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies.
基金supported by the Science and Technology Project of China Southern Power Grid(GZHKJXM20210043-080041KK52210002).
文摘Traditional distribution network planning relies on the professional knowledge of planners,especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors.The inherent laws reflected by the historical data of the distribution network are ignored,which affects the objectivity of the planning scheme.In this study,to improve the efficiency and accuracy of distribution network planning,the characteristics of distribution network data were extracted using a data-mining technique,and correlation knowledge of existing problems in the network was obtained.A data-mining model based on correlation rules was established.The inputs of the model were the electrical characteristic indices screened using the gray correlation method.The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules.Degree of promotion and chi-square tests were used to verify the rationality of the strong correlation rules of the model output.In this study,the correlation relationship between heavy load or overload problems of distribution network feeders in different regions and related characteristic indices was determined,and the confidence of the correlation rules was obtained.These results can provide an effective basis for the formulation of a distribution network planning scheme.
文摘This paper presents a generalized method for updating approximations of a concept incrementally, which can be used as an effective tool to deal with dynamic attribute generalization. By combining this method and the LERS inductive learning algorithm, it also introduces a generalized quasi incremental algorithm for learning classification rules from data bases.
文摘Data mining (also known as Knowledge Discovery in Databases - KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aims and objectives of data mining are to discover knowledge of interest to user needs.Data mining is really a useful tool in many domains such as marketing, decision making, etc. However, some basic issues of data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? In order to discover patterns and knowledge really interesting and actionable to the real world Zhang et al proposed a domain-driven human-machine-cooperated data mining process.Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a kind of knowledge transforming process to transform knowledge from data format into symbol format. Thus, no new knowledge could be generated (born) in a data mining process. In a data mining process, knowledge is just transformed from data format, which is not understandable for human, into symbol format,which is understandable for human and easy to be used.It is similar to the process of translating a book from Chinese into English.In this translating process,the knowledge itself in the book should remain unchanged. What will be changed is the format of the knowledge only. That is, the knowledge in the English book should be kept the same as the knowledge in the Chinese one.Otherwise, there must be some mistakes in the translating proces, that is, we are transforming knowledge from one format into another format while not producing new knowledge in a data mining process. The knowledge is originally stored in data (data is a representation format of knowledge). Unfortunately, we can not read, understand, or use it, since we can not understand data. 
With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved the performance of classical knowledge acquisition methods. We also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is analogous to database views: users with different views can look at different parts of the data in a database. Thus, users with different tasks or objectives may wish to discover, and can discover, different (partial) knowledge from the same database. However, all of this partial knowledge must already exist in the database. Therefore, a domain-oriented data-driven data mining method would help us extract knowledge that really exists in a database and is truly interesting and actionable in the real world.
Abstract: To address the information asymmetry problem in online auctions, this study proposes and validates a forecasting model for winning bid prices. In particular, it explores the usability of data mining approaches, such as neural networks and Bayesian networks, for building such a model. The research shows empirically that data mining techniques forecast winning bid prices in online auctions better than traditional statistical analyses such as logistic regression and multivariate regression.
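A naive Bayes classifier is the simplest Bayesian network: every feature depends only on the class. The Python sketch below shows how such a model is trained and queried. It assumes hypothetical categorical auction features (`seller`, `bids`) and a winning price discretized into a `"high"`/`"low"` band. It is an illustration only, not the forecasting model built in the study.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical naive Bayes model: class counts plus per-feature value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> Counter of feature values
    values = defaultdict(set)            # feature -> set of observed values
    for row, y in zip(rows, labels):
        for f, v in row.items():
            value_counts[(f, y)][v] += 1
            values[f].add(v)
    return class_counts, value_counts, values

def predict_nb(model, row):
    """Return the class with the highest log posterior (Laplace-smoothed)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / total)  # log prior
        for f, v in row.items():
            # Add-one smoothing so an unseen value does not zero out the posterior
            score += math.log((value_counts[(f, y)][v] + 1) / (cy + len(values[f])))
        if score > best_score:
            best, best_score = y, score
    return best
```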
Abstract: An agent-based data mining framework for high-dimensional environments is built in place of classical structured programming or object-oriented programming. The framework supports the whole data mining process in high-dimensional environments. Belief-desire-joint-intention agents are designed to fit the characteristics of high-dimensional environments, and the syntax, semantics, and reasoning rules of the agents are given. In the data mining system for high-dimensional environments, agents need to exchange messages. A cooperative behavior mechanism is adopted to carry out communication through a three-level pattern among agents, each of which has its own fixed role.
Funding: Supported by the Humanities and Social Sciences Foundation of the Ministry of Education of China (Project No. 16YJA630063)
Abstract: In recent years, telecommunications companies have used NPS (Net Promoter Score) for customer relationship management, but neither definitive theoretical research nor instructive case studies exist. This paper summarizes a case-based approach to improving customer loyalty through NPS data mining, which has broad practical significance for telecom companies. First, using innovative reasoning analysis based on IG (Information Gain) and an XGBoost decision tree model, the paper identifies several drivers of customer loyalty related to customer consumption, such as call duration, data usage, and ARPU. Telecom companies can thus predict the role of an individual customer and monitor it daily on big data, which saves considerable NPS survey cost. Second, the paper summarizes how customer group features affect the relationship between NPS and financial performance. Taking ARPU as the performance goal, we divide the sample customers into 6 groups using k-means clustering, summarize their characteristics, and give targeted suggestions for each group.
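Information gain, one of the tools named above, scores how much knowing a discretized consumption feature reduces uncertainty about a customer's NPS category: IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). The Python sketch below computes it for one discrete feature. The bucket labels and the promoter/detractor toy data are hypothetical, and the XGBoost model and k-means grouping are not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates promoters from detractors perfectly has a gain equal to the label entropy. A feature that is independent of the label has a gain of zero.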
Funding: the National Key R&D Program of China under Grant No. 2018YFC0910405; the National Natural Science Foundation of China under Grants No. 61922020, No. 61771331, and No. 91935302.
Abstract: The outbreak of coronavirus disease 2019 (COVID-19) has drawn public attention all over the world. As a newly emerging area, single-cell sequencing is also proving its power in the battle against the epidemic. This review summarizes up-to-date knowledge of COVID-19 and its receptor. It then collects studies that mine single-cell transcriptome profiling data for information on vulnerable cell types in humans and potential mechanisms of the disease.
Abstract: The Travelling Salesman Problem (TSP) is a classical optimization problem and is NP-hard. This work applies data mining methodologies to explore the patterns in data generated by an Ant Colony Algorithm (ACA) as it searches. It also develops a rule-set searcher that approximates the ACA's searcher. An attribute-oriented induction methodology was used to explore the relationship between an operation's sequence and its attributes, and a set of rules was developed. Experimental results at the end of the paper show that the proposed approach performs well in both solution quality and computation speed.
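Attribute-oriented induction works in three steps. It generalizes each attribute along a concept hierarchy until that attribute has few enough distinct values. It removes attributes that cannot be generalized. It then merges identical generalized tuples and keeps a count for each. The Python sketch below follows that outline with a single distinct-value threshold. The attribute names, the hierarchy, and the toy records standing in for logged ACA operations are hypothetical, and the paper's rule-set searcher is not reproduced.

```python
from collections import Counter

def attribute_oriented_induction(rows, attrs, hierarchies, threshold):
    """rows: list of dicts; hierarchies: attr -> {value: parent concept} (acyclic).
    Returns (kept attributes, Counter of generalized tuples)."""
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched
    kept = []
    for a in attrs:
        h = hierarchies.get(a, {})
        # Climb the concept hierarchy while the attribute still has too many values
        while len({r[a] for r in rows}) > threshold and any(r[a] in h for r in rows):
            for r in rows:
                r[a] = h.get(r[a], r[a])
        # Attribute removal: drop attributes that stay above the threshold
        if len({r[a] for r in rows}) <= threshold:
            kept.append(a)
    # Merge identical generalized tuples, counting how many records each covers
    return kept, Counter(tuple(r[a] for a in kept) for r in rows)
```

In the test data, the `step` identifier has no hierarchy and too many distinct values, so it is removed. The numeric pheromone levels are generalized to `"high"`/`"low"`.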
Abstract: Coal mine production safety can be further improved by forecasting gas emission quantity from the real-time and historical data stored by the gas monitoring system. Data warehouse and data mining technology are well suited to processing large quantities of redundant data. Building on these advantages, this paper studies a method for forecasting mine gas emission quantity based on FDM, together with its application. A fuzzy similarity relation is constructed and used for clustering analysis, which can reveal potential relationships within the gas emission data. A pattern discovery model and a forecasting model are presented, together with a detailed approach for realizing the forecast. This approach has been applied to forecast gas emission quantity efficiently.
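A common way to cluster with a fuzzy similarity relation has three steps. First, build a reflexive, symmetric similarity matrix. Next, make it max-min transitive by repeated self-composition (the transitive closure). Finally, cut it at a level λ to obtain crisp equivalence classes. The Python sketch below shows that pipeline. The similarity measure (one minus the mean absolute difference of normalized features), the toy readings, and the function names are assumptions for illustration, not the paper's FDM method.

```python
def similarity_matrix(xs):
    """Fuzzy similarity: 1 - mean absolute difference of features normalized to [0, 1]."""
    n = len(xs)
    return [[1 - sum(abs(a - b) for a, b in zip(xs[i], xs[j])) / len(xs[i])
             for j in range(n)] for i in range(n)]

def max_min_compose(r, s):
    """(R o S)[i][j] = max_k min(R[i][k], S[k][j])."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square R under max-min composition until it stops changing."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2

def lambda_cut_clusters(r, lam):
    """On a fuzzy equivalence matrix, the lambda-cut is a crisp equivalence relation."""
    clusters, seen = [], set()
    for i in range(len(r)):
        if i in seen:
            continue
        members = [j for j in range(len(r)) if r[i][j] >= lam]
        seen.update(members)
        clusters.append(members)
    return clusters
```

Lowering λ merges clusters. A high cut separates the two groups of readings, and a low cut puts everything in one cluster.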
Funding: Supported by the Ensemble Grant for Early Career Researchers 2022-2023 and the 2023 Ensemble Continuation Grant of Tohoku University, the Hirose Foundation, and the AIMR Fusion Research Grant; supported by JSPS KAKENHI Nos. JP23K13599, JP23K13703, JP22H01803, JP18H05513, and JP23K13542. F.Y. and Q.W. acknowledge the China Scholarship Council (CSC) for supporting their studies in Japan.
Abstract: Magnesium (Mg) is a promising alternative to lithium (Li) as an anode material in solid-state batteries, owing to its abundance and high theoretical volumetric capacity. However, sluggish Mg-ion conduction in the lattice of solid-state electrolytes (SSEs) is one of the key challenges hampering the development of Mg-ion solid-state batteries. Although various Mg-ion SSEs have been reported in recent years, key insights are hard to derive from any single literature report. Moreover, the structure-performance relationships of Mg-ion SSEs need to be further unraveled to provide more precise design guidelines for SSEs. In this viewpoint article, we use data mining to analyze the structural characteristics of high-ionic-conductivity Mg-based SSEs reported over the last four decades. From this analysis, we provide big-data-derived insights into the challenges and opportunities in developing next-generation Mg-ion SSEs.
Abstract: With the development of future 5G, emerging technologies such as the Internet of Things, big data, cloud computing, and artificial intelligence are driving explosive growth in data traffic. With radical changes in communication theory and implementation technologies, wireless communications and wireless networks have entered a new era. Among these developments, wireless big data (WBD) has tremendous value, and artificial intelligence (AI) offers unprecedented possibilities. However, in both the big data development and AI application communities, the lack of a sound theoretical foundation and mathematical methods is regarded as a real challenge to be solved. This paper starts from the basic problem of wireless communication, namely the interrelationship of demand, environment, and ability. It investigates the concept and data model of WBD, wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples. The goal is to facilitate WBD research and development and open up more opportunities for it. Such research is beneficial for creating new theoretical foundations and emerging technologies for future wireless communications.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2018XD004)
Abstract: The distance-based outlier detection method finds implied outliers by calculating the distances between points in the dataset, but its computational complexity is particularly high for multidimensional datasets. In addition, traditional outlier detection methods do not consider how frequently subsets occur. As a result, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). Pattern mining-based outlier detection approaches have solved this problem. However, they do not take the importance of each pattern into account during detection, so the detected outliers cannot truly reflect the actual situation. To address these problems, this paper proposes MWRPM-Outlier, a two-phase outlier detection approach based on minimal weighted rare pattern mining, to detect outliers effectively on weighted data streams. In the pattern mining phase, a method called MWRPM quickly mines minimal weighted rare patterns. In the outlier detection phase, two deviation factors are defined to measure how abnormal each transaction on the weighted data stream is. Experimental results show that MWRPM-Outlier performs well in outlier detection and that MWRPM outperforms the compared approaches in weighted rare pattern mining.
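The pattern mining phase can be illustrated with a brute-force Python sketch built on three assumptions. First, weighted support is taken to be the mean item weight times support, a common definition. Second, an occurring itemset counts as weighted-rare when that value falls below `min_ws`. Third, a weighted-rare itemset counts as minimal when none of its immediate subsets is weighted-rare. The outlier score sums the relative shortfall of the rare patterns a transaction contains. It is a hypothetical stand-in for the paper's two deviation factors. The sketch also works on a single window rather than a stream; it is not the MWRPM algorithm.

```python
from itertools import combinations

def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times fractional support."""
    sup = sum(1 for t in transactions if itemset <= t) / len(transactions)
    return sum(weights[i] for i in itemset) / len(itemset) * sup

def minimal_weighted_rare(transactions, weights, min_ws, max_len=3):
    """Brute-force search for occurring itemsets with weighted support below
    min_ws whose immediate subsets are all at or above min_ws."""
    items = sorted(set().union(*transactions))
    rare = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            ws = weighted_support(x, transactions, weights)
            if ws >= min_ws or ws == 0:
                continue  # not rare, or never occurs in this window
            subsets = [x - {i} for i in x] if k > 1 else []
            if all(weighted_support(s, transactions, weights) >= min_ws for s in subsets):
                rare[x] = ws
    return rare

def outlier_scores(transactions, rare, min_ws):
    """Hypothetical score: summed relative shortfall of contained rare patterns."""
    return [sum((min_ws - ws) / min_ws for p, ws in rare.items() if p <= t)
            for t in transactions]
```

Average-weighted support is not downward-closed, which is why the sketch enumerates itemsets exhaustively instead of pruning Apriori-style.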
Funding: Supported by the National Natural Science Foundation of China (60403027), the Science and Research Plan Projects of the Hubei Provincial Department of Education (2003A011), and the Natural Science Foundation of Hubei Province of China (2005ABA243).
Abstract: With the increasing deployment of wireless sensor devices and networks, security has become a critical challenge for sensor networks. This paper proposes a data mining-based scheme for routing anomaly detection in wireless sensor networks. The scheme uses the Apriori algorithm to extract traffic patterns from both routing tables and network traffic packets, and the K-means clustering algorithm then adaptively generates a detection model. By combining these two algorithms, the scheme detects routing attacks effectively and automatically. The main advantage of the proposed approach is that it can detect new attacks that have not been seen before. Moreover, the detection scheme requires no prior knowledge and can therefore be applied to a wide range of sensor networks and a variety of routing attacks.
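The K-means step can be sketched as follows. The sketch clusters feature vectors from normal traffic, then flags a new observation whose distance to the nearest centroid exceeds the largest training distance times a margin. The two traffic features, the margin of 1.5, and the thresholding rule are assumptions for illustration. The Apriori pattern extraction step that feeds the clustering in the paper is not shown.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; points are equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty)
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

def build_detector(normal_points, k, margin=1.5):
    """Detection model: centroids plus a distance threshold learned from normal data."""
    centroids = kmeans(normal_points, k)
    radius = max(min(math.dist(p, c) for c in centroids) for p in normal_points)
    return centroids, radius * margin

def is_anomalous(detector, x):
    """Flag x when it lies farther than the threshold from every centroid."""
    centroids, threshold = detector
    return min(math.dist(x, c) for c in centroids) > threshold
```

The threshold is derived from the training data rather than from known attack signatures. This matches the abstract's claim that no prior knowledge of attacks is needed.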