Objective To explore the medication rules of traditional Chinese medicine(TCM)and mechanism of action of hub herb pairs for treating insomnia.Methods Totally 104 prescriptions were statistically analyzed.The associati...Objective To explore the medication rules of traditional Chinese medicine(TCM)and mechanism of action of hub herb pairs for treating insomnia.Methods Totally 104 prescriptions were statistically analyzed.The association rule algorithm was applied to mine the hub herb pairs.Network pharmacology was utilized to analyze the mechanism of the hub herb pairs,while molecular docking was applied to simulate the interaction between receptors and herb molecules,thereby predicting their binding affinities.Results The most frequently used herbs in TCM prescriptions for treating insomnia included Semen Ziziphi Spinosae,Radix Glycyrrhizae,Radix et Rhizoma Ginseng,and Poria cum Radix Pini.Among them,the most commonly used were the supplementing herbs,followed by heat-clearing,mind-calming,and exterior-releasing ones,with their properties of warm and cold,flavors of sweet,Pungent,and bitter,and meridian tropisms of liver,lungs,spleen,kidneys,heart,and stomach.The hub herb pairs based on the association rules included Radix Astragali-Radix et Rhizoma Ginseng,Rhizoma Chuanxiong-Radix Glycyrrhizae,Seman Platycladi-Semen Ziziphi Spinosae,Pericarpium Citri Reticulatae-Radix Glycyrrhizae,Radix Polygalae-Semen Ziziphi Spinosae,and Radix Astragali-Semen Ziziphi Spinosae.Network pharmacology revealed that the cAMP signaling pathway might play a key role in treating insomnia synergistically with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway.Molecular docking indicated that there was good binding between the active ingredients of the hub herb pairs and the hub targets.Conclusions This study identified six hub herb pairs for treating insomnia in TCM.These hub herb pairs may synergistically treat insomnia with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway through the cAMP signaling pathway.展开更多
The safe production of coalmine can be further improved by forecasting the quantity of gas emission based on the real-time data and historical data which the gas monitoring system has saved. By making use of the advan...The safe production of coalmine can be further improved by forecasting the quantity of gas emission based on the real-time data and historical data which the gas monitoring system has saved. By making use of the advantages of data warehouse and data mining technology for processing large quantity of redundancy data, the method and its application of forecasting mine gas emission quantity based on FDM were studied. The constructing fuzzy resembling relation and clustering analysis were proposed, which the potential relationship inside the gas emission data may be found. The mode finds model and forecast model were presented, and the detailed approach to realize this forecast was also proposed, which have been applied to forecast the gas emission quantity efficiently.展开更多
An intrusion detection (ID) model is proposed based on the fuzzy data mining method. A major difficulty of anomaly ID is that patterns of the normal behavior change with time. In addition, an actual intrusion with a...An intrusion detection (ID) model is proposed based on the fuzzy data mining method. A major difficulty of anomaly ID is that patterns of the normal behavior change with time. In addition, an actual intrusion with a small deviation may match normal patterns. So the intrusion behavior cannot be detected by the detection system.To solve the problem, fuzzy data mining technique is utilized to extract patterns representing the normal behavior of a network. A set of fuzzy association rules mined from the network data are shown as a model of “normal behaviors”. To detect anomalous behaviors, fuzzy association rules are generated from new audit data and the similarity with sets mined from “normal” data is computed. If the similarity values are lower than a threshold value,an alarm is given. Furthermore, genetic algorithms are used to adjust the fuzzy membership functions and to select an appropriate set of features.展开更多
Facing the development of future 5 G, the emerging technologies such as Internet of things, big data, cloud computing, and artificial intelligence is enhancing an explosive growth in data traffic. Radical changes in c...Facing the development of future 5 G, the emerging technologies such as Internet of things, big data, cloud computing, and artificial intelligence is enhancing an explosive growth in data traffic. Radical changes in communication theory and implement technologies, the wireless communications and wireless networks have entered a new era. Among them, wireless big data(WBD) has tremendous value, and artificial intelligence(AI) gives unthinkable possibilities. However, in the big data development and artificial intelligence application groups, the lack of a sound theoretical foundation and mathematical methods is regarded as a real challenge that needs to be solved. From the basic problem of wireless communication, the interrelationship of demand, environment and ability, this paper intends to investigate the concept and data model of WBD, the wireless data mining, the wireless knowledge and wireless knowledge learning(WKL), and typical practices examples, to facilitate and open up more opportunities of WBD research and developments. Such research is beneficial for creating new theoretical foundation and emerging technologies of future wireless communications.展开更多
Many high quality studies have emerged from public databases,such as Surveillance,Epidemiology,and End Results(SEER),National Health and Nutrition Examination Survey(NHANES),The Cancer Genome Atlas(TCGA),and Medical I...Many high quality studies have emerged from public databases,such as Surveillance,Epidemiology,and End Results(SEER),National Health and Nutrition Examination Survey(NHANES),The Cancer Genome Atlas(TCGA),and Medical Information Mart for Intensive Care(MIMIC);however,these data are often characterized by a high degree of dimensional heterogeneity,timeliness,scarcity,irregularity,and other characteristics,resulting in the value of these data not being fully utilized.Data-mining technology has been a frontier field in medical research,as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models.Therefore,data mining has unique advantages in clinical big-data research,especially in large-scale medical public databases.This article introduced the main medical public database and described the steps,tasks,and models of data mining in simple language.Additionally,we described data-mining methods along with their practical applications.The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big-data in order to promote the production of research results that are beneficial to doctors and patients.展开更多
In recent years improper allocation of safety input has prevailed in coal mines in China, which resulted in the frequent accidents in coal mining operation. A comprehensive assessment of the input efficiency of coal m...In recent years improper allocation of safety input has prevailed in coal mines in China, which resulted in the frequent accidents in coal mining operation. A comprehensive assessment of the input efficiency of coal mine safety should lead to improved efficiency in the use of funds and management resources. This helps government and enterprise managers better understand how safety inputs are used and to optimize allocation of resources. Study on coal mine's efficiency assessment of safety input was con- ducted in this paper. A C^2R model with non-Archimedean infinitesimal vector based on output is established after consideration of the input characteristics and the model properties. An assessment of an operating mine was done using a specific set of input and output criteria. It is found that the safety input was efficient in 2002 and 2005 and was weakly efficient in 2003. However, the efficiency was relatively low in both 2001 and 2004. The safety input resources can be optimized and adjusted by means of projection theory. Such analysis shows that, on average in 2001 and 2004, 45% of the expended funds could have been saved. Likewise, 10% of the safety management and technical staff could have been eliminated and working hours devoted to safety could have been reduced by 12%. These conditions could have Riven the same results.展开更多
In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occu...In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies.展开更多
The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional...The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.展开更多
For various reasons,many of the security programming rules applicable to specific software have not been recorded in official documents,and hence can hardly be employed by static analysis tools for detection.In this p...For various reasons,many of the security programming rules applicable to specific software have not been recorded in official documents,and hence can hardly be employed by static analysis tools for detection.In this paper,we propose a new approach,named SVR-Miner(Security Validation Rules Miner),which uses frequent sequence mining technique [1-4] to automatically infer implicit security validation rules from large software code written in C programming language.Different from the past works in this area,SVR-Miner introduces three techniques which are sensitive thread,program slicing [5-7],and equivalent statements computing to improve the accuracy of rules.Experiments with the Linux Kernel demonstrate the effectiveness of our approach.With the ten given sensitive threads,SVR-Miner automatically generated 17 security validation rules and detected 8 violations,5 of which were published by Linux Kernel Organization before we detected them.We have reported the other three to the Linux Kernel Organization recently.展开更多
Traditional distribution network planning relies on the professional knowledge of planners,especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors.Th...Traditional distribution network planning relies on the professional knowledge of planners,especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors.The inherent laws reflected by the historical data of the distribution network are ignored,which affects the objectivity of the planning scheme.In this study,to improve the efficiency and accuracy of distribution network planning,the characteristics of distribution network data were extracted using a data-mining technique,and correlation knowledge of existing problems in the network was obtained.A data-mining model based on correlation rules was established.The inputs of the model were the electrical characteristic indices screened using the gray correlation method.The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules.Degree of promotion and chi-square tests were used to verify the rationality of the strong correlation rules of the model output.In this study,the correlation relationship between heavy load or overload problems of distribution network feeders in different regions and related characteristic indices was determined,and the confidence of the correlation rules was obtained.These results can provide an effective basis for the formulation of a distribution network planning scheme.展开更多
This paper presents a generalized method for updating approximations of a concept incrementally, which can be used as an effective tool to deal with dynamic attribute generalization. By combining this method and the L...This paper presents a generalized method for updating approximations of a concept incrementally, which can be used as an effective tool to deal with dynamic attribute generalization. By combining this method and the LERS inductive learning algorithm, it also introduces a generalized quasi incremental algorithm for learning classification rules from data bases.展开更多
Data mining (also known as Knowledge Discovery in Databases - KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aims and objectives of data...Data mining (also known as Knowledge Discovery in Databases - KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aims and objectives of data mining are to discover knowledge of interest to user needs.Data mining is really a useful tool in many domains such as marketing, decision making, etc. However, some basic issues of data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? In order to discover patterns and knowledge really interesting and actionable to the real world Zhang et al proposed a domain-driven human-machine-cooperated data mining process.Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a kind of knowledge transforming process to transform knowledge from data format into symbol format. Thus, no new knowledge could be generated (born) in a data mining process. In a data mining process, knowledge is just transformed from data format, which is not understandable for human, into symbol format,which is understandable for human and easy to be used.It is similar to the process of translating a book from Chinese into English.In this translating process,the knowledge itself in the book should remain unchanged. What will be changed is the format of the knowledge only. That is, the knowledge in the English book should be kept the same as the knowledge in the Chinese one.Otherwise, there must be some mistakes in the translating proces, that is, we are transforming knowledge from one format into another format while not producing new knowledge in a data mining process. The knowledge is originally stored in data (data is a representation format of knowledge). Unfortunately, we can not read, understand, or use it, since we can not understand data. With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets. It also improved the performance of classical knowledge acquisition methods. In fact, we also find that the domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining. They could be integrated into domain-oriented data-driven data mining. It is just like the views of data base. Users with different views could look at different partial data of a data base. Thus, users with different tasks or objectives wish, or could discover different knowledge (partial knowledge) from the same data base. However, all these partial knowledge should be originally existed in the data base. So, a domain-oriented data-driven data mining method would help us to extract the knowledge which is really existed in a data base, and really interesting and actionable to the real world.展开更多
To solve information asymmetry problem on online auction, this study suggests and validates a forecasting model of winning bid prices. Especially, it explores the usability of data mining approaches, such as neural ne...To solve information asymmetry problem on online auction, this study suggests and validates a forecasting model of winning bid prices. Especially, it explores the usability of data mining approaches, such as neural network and Bayesian network in building a forecasting model. This research empirically shows that, in forecasting winning bid prices on online auction, data mining techniques have shown better performance than traditional statistical analysis, such as logistic regression and multivariate regression.展开更多
With the increasing deployment of wireless sensordevices and networks,security becomes a criticalchallenge for sensor networks.In this paper,a schemeusing data mining is proposed for routing anomalydetection in wirele...With the increasing deployment of wireless sensordevices and networks,security becomes a criticalchallenge for sensor networks.In this paper,a schemeusing data mining is proposed for routing anomalydetection in wireless sensor networks.The schemeuses the Apriori algorithm to extract traffic patternsfrom both routing table and network traffic packetsand subsequently the K-means cluster algorithmadaptively generates a detection model.Through thecombination of these two algorithms,routing attackscan be detected effectively and automatically.Themain advantage of the proposed approach is that it isable to detect new attacks that have not previouslybeen seen.Moreover,the proposed detection schemeis based on no priori knowledge and then can beapplied to a wide range of different sensor networksfor a variety of routing attacks.展开更多
The existing data mining methods are mostly focused on relational databases and structured data, but not on complex structured data (like in extensible markup language(XML)). By converting XML document type descriptio...The existing data mining methods are mostly focused on relational databases and structured data, but not on complex structured data (like in extensible markup language(XML)). By converting XML document type description to the relational semantic recording XML data relations, and using an XML data mining language, the XML data mining system presents a strategy to mine information on XML.展开更多
An agent-based data mining framework for the high-dimensional environment is built instead of the style of classical structural programming or the object-oriented programming. The framework supports the whole process ...An agent-based data mining framework for the high-dimensional environment is built instead of the style of classical structural programming or the object-oriented programming. The framework supports the whole process of data mining of the high-dimensional environment. Belief-desire-joint intention agents are designed to fit the characteristic of the high-dimensional environment. At the same time, the syntax, semantics and reasoning rules of the agents are given. In the data mining system of the high-dimensional environment, agents need exchange messages. The cooperation behavior mechanism is adopted to complete the communication through the three-level pattern among agents that have their own fixed roles.展开更多
In recent years, the telecommunications have used the concept of NPS(Net Promoter Score) for customer relationship management, but there is neither definite theory research nor instructive instance research. However, ...In recent years, the telecommunications have used the concept of NPS(Net Promoter Score) for customer relationship management, but there is neither definite theory research nor instructive instance research. However, this paper summarizes an approach with instance case analysis to improve customer loyalty via NPS data mining, which has extensive and practical significance for tele-companies. First, this paper finds some driven forces of customer loyalty, which are relative to customer consumption such as the call duration, the usage of data, ARPU, etc., by using some innovative reasoning-analysis based on IG(Information Gain) and xg-boost decision-making tree model, so the tele-companies can predict the role of individual customer and form daily monitoring on big data, which will save a lot of NPS survey cost. Second, this paper summarizes how customer group feature impacts the relationship between NPS and financial performance. Taking ARPU value as the performance goals, we divide the sample customers into 6 groups and summarize their characteristics based on k-means clustering, and give targeted suggestion of each group.展开更多
The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,th...The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,the up-to-date knowledge of COVID-19 and its receptor is summarized,followed by a collection of the mining of single cell transcriptome profiling data for the information in aspects of the vulnerable cell types in humans and the potential mechanisms of the disease.展开更多
基金National Natural Science Foundation of China(82360905)Gansu Provincial University Teachers'Innovation Fund Projects(2023A-092 and 2024B-109).
文摘Objective To explore the medication rules of traditional Chinese medicine(TCM)and mechanism of action of hub herb pairs for treating insomnia.Methods Totally 104 prescriptions were statistically analyzed.The association rule algorithm was applied to mine the hub herb pairs.Network pharmacology was utilized to analyze the mechanism of the hub herb pairs,while molecular docking was applied to simulate the interaction between receptors and herb molecules,thereby predicting their binding affinities.Results The most frequently used herbs in TCM prescriptions for treating insomnia included Semen Ziziphi Spinosae,Radix Glycyrrhizae,Radix et Rhizoma Ginseng,and Poria cum Radix Pini.Among them,the most commonly used were the supplementing herbs,followed by heat-clearing,mind-calming,and exterior-releasing ones,with their properties of warm and cold,flavors of sweet,Pungent,and bitter,and meridian tropisms of liver,lungs,spleen,kidneys,heart,and stomach.The hub herb pairs based on the association rules included Radix Astragali-Radix et Rhizoma Ginseng,Rhizoma Chuanxiong-Radix Glycyrrhizae,Seman Platycladi-Semen Ziziphi Spinosae,Pericarpium Citri Reticulatae-Radix Glycyrrhizae,Radix Polygalae-Semen Ziziphi Spinosae,and Radix Astragali-Semen Ziziphi Spinosae.Network pharmacology revealed that the cAMP signaling pathway might play a key role in treating insomnia synergistically with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway.Molecular docking indicated that there was good binding between the active ingredients of the hub herb pairs and the hub targets.Conclusions This study identified six hub herb pairs for treating insomnia in TCM.These hub herb pairs may synergistically treat insomnia with HIF-1 signaling pathway,prolactin signaling pathway,chemical carcinogenesis receptor activation,and PI3K-Akt signaling pathway through the cAMP signaling pathway.
文摘The safe production of coalmine can be further improved by forecasting the quantity of gas emission based on the real-time data and historical data which the gas monitoring system has saved. By making use of the advantages of data warehouse and data mining technology for processing large quantity of redundancy data, the method and its application of forecasting mine gas emission quantity based on FDM were studied. The constructing fuzzy resembling relation and clustering analysis were proposed, which the potential relationship inside the gas emission data may be found. The mode finds model and forecast model were presented, and the detailed approach to realize this forecast was also proposed, which have been applied to forecast the gas emission quantity efficiently.
文摘An intrusion detection (ID) model is proposed based on the fuzzy data mining method. A major difficulty of anomaly ID is that patterns of the normal behavior change with time. In addition, an actual intrusion with a small deviation may match normal patterns. So the intrusion behavior cannot be detected by the detection system.To solve the problem, fuzzy data mining technique is utilized to extract patterns representing the normal behavior of a network. A set of fuzzy association rules mined from the network data are shown as a model of “normal behaviors”. To detect anomalous behaviors, fuzzy association rules are generated from new audit data and the similarity with sets mined from “normal” data is computed. If the similarity values are lower than a threshold value,an alarm is given. Furthermore, genetic algorithms are used to adjust the fuzzy membership functions and to select an appropriate set of features.
文摘Facing the development of future 5 G, the emerging technologies such as Internet of things, big data, cloud computing, and artificial intelligence is enhancing an explosive growth in data traffic. Radical changes in communication theory and implement technologies, the wireless communications and wireless networks have entered a new era. Among them, wireless big data(WBD) has tremendous value, and artificial intelligence(AI) gives unthinkable possibilities. However, in the big data development and artificial intelligence application groups, the lack of a sound theoretical foundation and mathematical methods is regarded as a real challenge that needs to be solved. From the basic problem of wireless communication, the interrelationship of demand, environment and ability, this paper intends to investigate the concept and data model of WBD, the wireless data mining, the wireless knowledge and wireless knowledge learning(WKL), and typical practices examples, to facilitate and open up more opportunities of WBD research and developments. Such research is beneficial for creating new theoretical foundation and emerging technologies of future wireless communications.
基金the National Social Science Foundation of China(No.16BGL183).
文摘Many high quality studies have emerged from public databases,such as Surveillance,Epidemiology,and End Results(SEER),National Health and Nutrition Examination Survey(NHANES),The Cancer Genome Atlas(TCGA),and Medical Information Mart for Intensive Care(MIMIC);however,these data are often characterized by a high degree of dimensional heterogeneity,timeliness,scarcity,irregularity,and other characteristics,resulting in the value of these data not being fully utilized.Data-mining technology has been a frontier field in medical research,as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models.Therefore,data mining has unique advantages in clinical big-data research,especially in large-scale medical public databases.This article introduced the main medical public database and described the steps,tasks,and models of data mining in simple language.Additionally,we described data-mining methods along with their practical applications.The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big-data in order to promote the production of research results that are beneficial to doctors and patients.
基金Project 70771105 supported by the National Natural Science Foundation of China
文摘In recent years improper allocation of safety input has prevailed in coal mines in China, which resulted in the frequent accidents in coal mining operation. A comprehensive assessment of the input efficiency of coal mine safety should lead to improved efficiency in the use of funds and management resources. This helps government and enterprise managers better understand how safety inputs are used and to optimize allocation of resources. Study on coal mine's efficiency assessment of safety input was con- ducted in this paper. A C^2R model with non-Archimedean infinitesimal vector based on output is established after consideration of the input characteristics and the model properties. An assessment of an operating mine was done using a specific set of input and output criteria. It is found that the safety input was efficient in 2002 and 2005 and was weakly efficient in 2003. However, the efficiency was relatively low in both 2001 and 2004. The safety input resources can be optimized and adjusted by means of projection theory. Such analysis shows that, on average in 2001 and 2004, 45% of the expended funds could have been saved. Likewise, 10% of the safety management and technical staff could have been eliminated and working hours devoted to safety could have been reduced by 12%. These conditions could have Riven the same results.
基金Dr. Steve Jones, Scientific Advisor of the Canon Foundation for Scientific Research (7200 The Quorum, Oxford Business Park, Oxford OX4 2JZ, England). Canon Foundation for Scientific Research funded the UPC 2013 tuition fees of the corresponding author during her writing this article
文摘In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies.
基金supported by Fundamental Research Funds for the Central Universities (No. 2018XD004)
文摘The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.
基金National Natural Science Foundation of China under Grant No.60873213,91018008 and 61070192Beijing Science Foundation under Grant No. 4082018Shanghai Key Laboratory of Intelligent Information Processing of China under Grant No. IIPL-09-006
文摘For various reasons,many of the security programming rules applicable to specific software have not been recorded in official documents,and hence can hardly be employed by static analysis tools for detection.In this paper,we propose a new approach,named SVR-Miner(Security Validation Rules Miner),which uses frequent sequence mining technique [1-4] to automatically infer implicit security validation rules from large software code written in C programming language.Different from the past works in this area,SVR-Miner introduces three techniques which are sensitive thread,program slicing [5-7],and equivalent statements computing to improve the accuracy of rules.Experiments with the Linux Kernel demonstrate the effectiveness of our approach.With the ten given sensitive threads,SVR-Miner automatically generated 17 security validation rules and detected 8 violations,5 of which were published by Linux Kernel Organization before we detected them.We have reported the other three to the Linux Kernel Organization recently.
基金supported by the Science and Technology Project of China Southern Power Grid(GZHKJXM20210043-080041KK52210002).
文摘Traditional distribution network planning relies on the professional knowledge of planners,especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors.The inherent laws reflected by the historical data of the distribution network are ignored,which affects the objectivity of the planning scheme.In this study,to improve the efficiency and accuracy of distribution network planning,the characteristics of distribution network data were extracted using a data-mining technique,and correlation knowledge of existing problems in the network was obtained.A data-mining model based on correlation rules was established.The inputs of the model were the electrical characteristic indices screened using the gray correlation method.The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules.Degree of promotion and chi-square tests were used to verify the rationality of the strong correlation rules of the model output.In this study,the correlation relationship between heavy load or overload problems of distribution network feeders in different regions and related characteristic indices was determined,and the confidence of the correlation rules was obtained.These results can provide an effective basis for the formulation of a distribution network planning scheme.
文摘This paper presents a generalized method for updating approximations of a concept incrementally, which can be used as an effective tool to deal with dynamic attribute generalization. By combining this method and the LERS inductive learning algorithm, it also introduces a generalized quasi incremental algorithm for learning classification rules from data bases.
文摘Data mining (also known as Knowledge Discovery in Databases - KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aims and objectives of data mining are to discover knowledge of interest to user needs.Data mining is really a useful tool in many domains such as marketing, decision making, etc. However, some basic issues of data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? In order to discover patterns and knowledge really interesting and actionable to the real world Zhang et al proposed a domain-driven human-machine-cooperated data mining process.Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a kind of knowledge transforming process to transform knowledge from data format into symbol format. Thus, no new knowledge could be generated (born) in a data mining process. In a data mining process, knowledge is just transformed from data format, which is not understandable for human, into symbol format,which is understandable for human and easy to be used.It is similar to the process of translating a book from Chinese into English.In this translating process,the knowledge itself in the book should remain unchanged. What will be changed is the format of the knowledge only. That is, the knowledge in the English book should be kept the same as the knowledge in the Chinese one.Otherwise, there must be some mistakes in the translating proces, that is, we are transforming knowledge from one format into another format while not producing new knowledge in a data mining process. The knowledge is originally stored in data (data is a representation format of knowledge). Unfortunately, we can not read, understand, or use it, since we can not understand data. With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets. It also improved the performance of classical knowledge acquisition methods. In fact, we also find that the domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining. They could be integrated into domain-oriented data-driven data mining. It is just like the views of data base. Users with different views could look at different partial data of a data base. Thus, users with different tasks or objectives wish, or could discover different knowledge (partial knowledge) from the same data base. However, all these partial knowledge should be originally existed in the data base. So, a domain-oriented data-driven data mining method would help us to extract the knowledge which is really existed in a data base, and really interesting and actionable to the real world.
文摘To solve information asymmetry problem on online auction, this study suggests and validates a forecasting model of winning bid prices. Especially, it explores the usability of data mining approaches, such as neural network and Bayesian network in building a forecasting model. This research empirically shows that, in forecasting winning bid prices on online auction, data mining techniques have shown better performance than traditional statistical analysis, such as logistic regression and multivariate regression.
基金the supports of the National Natural Science Foundation of China (60403027) the projects of science and research plan of Hubei provincial department of education (2003A011)the Natural Science Foundation Of Hubei Province of China (2005ABA243).
文摘With the increasing deployment of wireless sensordevices and networks,security becomes a criticalchallenge for sensor networks.In this paper,a schemeusing data mining is proposed for routing anomalydetection in wireless sensor networks.The schemeuses the Apriori algorithm to extract traffic patternsfrom both routing table and network traffic packetsand subsequently the K-means cluster algorithmadaptively generates a detection model.Through thecombination of these two algorithms,routing attackscan be detected effectively and automatically.Themain advantage of the proposed approach is that it isable to detect new attacks that have not previouslybeen seen.Moreover,the proposed detection schemeis based on no priori knowledge and then can beapplied to a wide range of different sensor networksfor a variety of routing attacks.
文摘The existing data mining methods are mostly focused on relational databases and structured data, but not on complex structured data (like in extensible markup language(XML)). By converting XML document type description to the relational semantic recording XML data relations, and using an XML data mining language, the XML data mining system presents a strategy to mine information on XML.
文摘An agent-based data mining framework for the high-dimensional environment is built instead of the style of classical structural programming or the object-oriented programming. The framework supports the whole process of data mining of the high-dimensional environment. Belief-desire-joint intention agents are designed to fit the characteristic of the high-dimensional environment. At the same time, the syntax, semantics and reasoning rules of the agents are given. In the data mining system of the high-dimensional environment, agents need exchange messages. The cooperation behavior mechanism is adopted to complete the communication through the three-level pattern among agents that have their own fixed roles.
基金Supported by Humanities and Social Sciences Foundation of Ministry of Education in China (Project No. 16YJA630063)
文摘In recent years, the telecommunications have used the concept of NPS(Net Promoter Score) for customer relationship management, but there is neither definite theory research nor instructive instance research. However, this paper summarizes an approach with instance case analysis to improve customer loyalty via NPS data mining, which has extensive and practical significance for tele-companies. First, this paper finds some driven forces of customer loyalty, which are relative to customer consumption such as the call duration, the usage of data, ARPU, etc., by using some innovative reasoning-analysis based on IG(Information Gain) and xg-boost decision-making tree model, so the tele-companies can predict the role of individual customer and form daily monitoring on big data, which will save a lot of NPS survey cost. Second, this paper summarizes how customer group feature impacts the relationship between NPS and financial performance. Taking ARPU value as the performance goals, we divide the sample customers into 6 groups and summarize their characteristics based on k-means clustering, and give targeted suggestion of each group.
基金the National Key R&D Program of China under Grant No.2018YFC0910405the National Natural Science Foundation of China under Grants No.61922020,No.61771331,and No.91935302.
文摘The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,the up-to-date knowledge of COVID-19 and its receptor is summarized,followed by a collection of the mining of single cell transcriptome profiling data for the information in aspects of the vulnerable cell types in humans and the potential mechanisms of the disease.