With its generality and practicality, the combination of partial charging curves and machine learning (ML) for battery capacity estimation has attracted widespread attention. However, because existing studies are scattered, a clear classification, fair comparison, and performance rationalization of these methods are lacking. To address these issues, we develop 20 capacity estimation methods from three perspectives: charging sequence construction, input forms, and ML models. A total of 22,582 charging curves are generated from 44 cells with different battery chemistries and operating conditions to validate the performance. In a comprehensive and unbiased comparison, the long short-term memory (LSTM)-based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error (MAPE) of LSTM-based capacity estimation is 0.61%, with a maximum error of only 3.94%. Even with 3 mV of added voltage noise or sampling intervals extended to 60 s, the average MAPE remains below 2%. Furthermore, physical explanations related to battery degradation are provided for the charging sequences to increase confidence in their application. Recommendations for using other competitive methods are also presented. This work provides insights and guidance for estimating battery capacity from partial charging curves.
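The headline metrics above are the mean and maximum absolute percentage errors of the capacity estimates. As a minimal sketch of how these two figures are computed (the capacity values below are hypothetical and only illustrate the formula, not the study's data):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def max_ape(y_true, y_pred):
    """Largest single absolute percentage error, in percent."""
    return 100.0 * max(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Hypothetical measured vs. estimated capacities (Ah), for illustration only.
measured = [2.00, 1.90, 1.80]
estimated = [2.01, 1.88, 1.80]
print(f"MAPE = {mape(measured, estimated):.3f}%")
print(f"max APE = {max_ape(measured, estimated):.3f}%")
```

MAPE averages the relative errors, so a single badly estimated cell can hide behind a low mean; reporting the maximum error alongside it, as the abstract does, exposes that worst case.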
To reduce CO₂ emissions in response to global climate change, shale reservoirs could serve as ideal candidates for long-term carbon geo-sequestration, which involves multi-scale transport processes. However, most current CO₂ sequestration models do not adequately account for multiple transport mechanisms. Moreover, evaluating CO₂ storage processes usually requires laborious, time-consuming numerical simulations that are unsuitable for practical prediction and decision-making. In this paper, an integrated model involving gas diffusion, adsorption, dissolution, slip flow, and Darcy flow is proposed to accurately characterize CO₂ storage in depleted shale reservoirs and to support the establishment of a training database. On this basis, a hybrid physics-informed data-driven neural network (HPDNN) is developed as a deep learning surrogate for prediction and inversion. By incorporating multiple sources of scientific knowledge, the HPDNN can be configured with limited simulation resources, significantly accelerating the forward and inversion processes. Furthermore, the HPDNN can predict injection performance, perform reservoir parameter inversion, and evaluate CO₂ storage capacity under complicated scenarios. The validation and test results demonstrate that the HPDNN maintains high accuracy and strong robustness across a wide applicability range when dealing with field data containing multiple noise sources. This study has strong potential to replace traditional modeling tools for prediction and decision-making in CO₂ storage projects in depleted shale reservoirs.
The shale gas development process involves complex flow mechanisms, and the accuracy of production forecasting is influenced by both geological and engineering parameters. Therefore, sensitivity analysis is required to quantitatively evaluate the relative importance of model parameters for production forecasting. The parameters are ranked by their sensitivity coefficients to inform subsequent optimization design. A data-driven global sensitivity analysis (GSA) method using convolutional neural networks (CNNs) is proposed to identify the parameters that influence shale gas production. The CNN is trained on a large dataset, validated against numerical simulations, and used as a surrogate model for efficient sensitivity analysis. Our approach couples the CNN with the Sobol' global sensitivity analysis method and presents three scenarios for sensitivity analysis: analysis of the production stage as a whole, analysis over fixed time intervals, and analysis by decline rate. The findings highlight the predominant influence of reservoir thickness and well length on shale gas production. Furthermore, the temporal sensitivity analysis reveals how parameter importance shifts across distinct production stages.
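Coupling a surrogate with Sobol' analysis works because the surrogate is cheap to evaluate many thousands of times. The sketch below shows one standard first-order Sobol' estimator (the Saltelli et al. 2010 form) under stated assumptions: inputs are independent and uniform on [0, 1], and a simple analytic function stands in for the trained CNN. The function, sample size, and names are illustrative, not taken from the paper.

```python
import random


def first_order_sobol(model, dim, n=20_000, seed=0):
    """Estimate first-order Sobol' indices S_i of `model` for inputs that are
    independent and uniform on [0, 1]^dim (Saltelli et al. 2010 estimator)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    # Total output variance, pooled over both sample matrices.
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / len(pooled)

    indices = []
    for i in range(dim):
        # A_B^(i): each row of A with its i-th column replaced by B's.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = sum(fb * (fab - fa) for fb, fab, fa in zip(fB, fABi, fA)) / n
        indices.append(v_i / var)
    return indices


# Toy stand-in for a trained surrogate: two normalized inputs with unequal weight.
def toy_surrogate(x):
    return x[0] + 2.0 * x[1]


print(first_order_sobol(toy_surrogate, dim=2))
```

For this toy model the exact indices are 0.2 and 0.8 (variances 1/12 and 4/12 out of 5/12), so the printed estimates should land close to those values. Repeating the estimate over fixed time windows or decline-rate segments of the forecast would give the temporal analyses the abstract describes.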
This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify, from experimental measurements, the dominant dimensionless quantities governing the penetration of rod projectiles into semi-infinite metal targets. The derived mathematical expressions of the dimensionless quantities are simplified by examining the exponent matrix and the coupling relationships between feature variables. As a physics-based dimension-reduction methodology, this approach reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities. The relative importance of the various dimensionless feature variables for penetration efficiency under four impact conditions is then evaluated through feature selection. The results indicate that the critical dimensionless feature variables selected by this synergistic method, obtained without recourse to complex theoretical equations or detailed knowledge of penetration mechanics, agree with those reported in the literature. Finally, the identified dimensionless quantities can be efficiently applied in semi-empirical analysis of specific penetration cases, and the reliability of the resulting regression functions is validated.
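Dimensional invariance amounts to a linear constraint: a product of variables raised to powers is dimensionless exactly when the exponent (dimension) matrix maps the power vector to zero. The sketch below checks that constraint for candidate groups. The variable set and its (M, L, T) exponents are a hypothetical choice for illustration; the paper's network learns the exponents rather than testing hand-picked ones.

```python
# Hypothetical variable set for illustration: (mass M, length L, time T) exponents.
DIMENSIONS = {
    "P": (0, 1, 0),        # penetration depth
    "L": (0, 1, 0),        # rod length
    "V": (0, 1, -1),       # impact velocity
    "rho_p": (1, -3, 0),   # projectile density
    "Y_t": (1, -1, -2),    # target strength (a stress)
}


def net_dimension(exponents):
    """Net (M, L, T) exponents of the product prod(var ** power)."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for k, base_power in enumerate(DIMENSIONS[name]):
            total[k] += power * base_power
    return tuple(total)


def is_dimensionless(exponents):
    """A candidate group is dimensionless iff the exponent matrix maps it to 0."""
    return net_dimension(exponents) == (0, 0, 0)


print(is_dimensionless({"P": 1, "L": -1}))                # P / L
print(is_dimensionless({"rho_p": 1, "V": 2, "Y_t": -1}))  # rho_p V^2 / Y_t
print(is_dimensionless({"rho_p": 1, "V": 2}))             # has units of stress
```

In a network that embeds this principle, the learned exponent vectors can be constrained to the null space of this matrix, which is why examining the exponent matrix helps simplify the resulting expressions.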
Lithium-ion batteries are the preferred green energy storage technology and are equipped with intelligent battery management systems (BMSs) that manage them efficiently. This not only ensures battery safety but also significantly improves efficiency and reduces the damage rate. Throughout their life cycle, lithium-ion batteries undergo aging and performance degradation due to diverse external environments and irregular degradation of internal materials. This degradation is reflected in the state of health (SOH). This review therefore offers the first comprehensive analysis of battery SOH estimation strategies across the entire life cycle over the past five years, highlighting common research foci rooted in data-driven methods. It examines dataset integration and preprocessing, health feature extraction, and the construction of SOH estimation models. These approaches uncover hidden insights in data and address the inherent tension between computational complexity and estimation accuracy. To better support in-vehicle implementation, cloud computing, and echelon technologies for battery recycling, remanufacturing, and reuse, and to offer insights into these technologies, a segmented management approach will be introduced in future work. It will encompass source-domain data processing, multi-feature factor reconfiguration, hybrid-driven modeling, parameter correction mechanisms, and full-time health management. Based on the best SOH estimation outcomes, health strategies tailored to different stages can be devised, leading to a comprehensive SOH assessment framework. This will mitigate cross-domain distribution disparities and facilitate adaptation to a broader range of dynamic operating protocols. This article reviews the current research landscape from four perspectives and discusses the challenges ahead. Researchers and practitioners can gain a comprehensive understanding of battery SOH estimation methods, with insights for the development of advanced battery management systems and embedded applications.
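SOH has several definitions; a common capacity-based one expresses the currently available capacity as a percentage of nominal capacity. The sketch below assumes that definition and the frequently used 80% end-of-life threshold. Both are conventions chosen for illustration, not something the review prescribes; resistance-based definitions also exist.

```python
def soh_percent(current_capacity_ah, nominal_capacity_ah):
    """Capacity-based state of health, in percent of nominal capacity."""
    if nominal_capacity_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    return 100.0 * current_capacity_ah / nominal_capacity_ah


def reached_end_of_life(soh, threshold_percent=80.0):
    """True once SOH falls to or below the (assumed) end-of-life threshold."""
    return soh <= threshold_percent


# Hypothetical cell: 2.0 Ah nominal, 1.7 Ah measured after aging.
soh = soh_percent(1.7, 2.0)
print(round(soh, 1), reached_end_of_life(soh))
```

The data-driven estimators surveyed above aim to predict this quantity from health features, such as partial charging data, without running a full capacity test.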
Data-driven diagnosis of high-temperature PEM fuel cells using machine learning techniques benefits system durability. Nevertheless, ensuring the robustness of diagnosis remains a critical and challenging task in real applications. To enhance diagnostic robustness and achieve a more thorough evaluation of diagnostic performance, this work proposes and investigates a robust diagnostic procedure based on electrochemical impedance spectroscopy (EIS) and a new method for evaluating diagnostic robustness. To improve diagnostic robustness: (1) the degradation mechanisms of different faults in the high-temperature PEM fuel cell were first analyzed via the distribution of relaxation times of the EIS to determine an equivalent circuit model (ECM) with better interpretability, simplicity, and accuracy; (2) features were extracted from the identified ECM parameters, with particular attention to distinguishing long-term normal degradation from other faults; (3) a Siamese network was adopted to obtain more robust features in a new embedding. Diagnosis was conducted with six classic classification algorithms, namely support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), and Naive Bayes, on a dataset of 1935 collected EIS measurements. To evaluate the robustness of the trained models: (1) different levels of error were added to the features for performance evaluation; (2) a robustness coefficient (Roubust_C) was defined for a quantified, explicit evaluation of diagnostic robustness.
Diagnostic models using the proposed feature extraction method achieve not only high performance (around 100%) but also higher robustness. Although initial performance was similar across models, KNN showed superior robustness after feature selection and re-embedding with the triplet-loss method. This suggests that robustness evaluation of machine learning models is necessary and that the defined robustness coefficient is effective. This work aims to offer new insights into robust diagnosis of high-temperature PEM fuel cells and into more comprehensive performance evaluation of data-driven methods for diagnostic applications.
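The re-embedding step trains the Siamese network with a triplet loss. This loss pulls an anchor toward a sample of the same fault class (positive) and pushes it away from a sample of a different class (negative) by at least a margin. Below is a minimal sketch of the standard hinge form on plain feature vectors. The squared Euclidean distance and margin value are assumptions, since the abstract does not state the exact form used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least `margin`
    farther (in squared distance) from the anchor than the positive is."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


def mean_triplet_loss(triplets, margin=1.0):
    """Average loss over a batch of (anchor, positive, negative) triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# Hypothetical 2-D embeddings of ECM-derived features.
batch = [([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]),   # already well separated
         ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])]   # negative closer than positive
print(mean_triplet_loss(batch))
```

Because well-separated triplets contribute zero loss, training concentrates on confusable cases. This tends to widen the gap between classes in the embedding, so added feature noise is less likely to push a sample across a decision boundary.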
Over the past few decades, mobile wireless communications have undergone four generations of technological revolution, from 1G to 4G, and deployment of the latest 5G networks is expected to take place in 2019. A fundamental question is how to push forward the development of mobile wireless communications now that they have become an extremely complex and sophisticated system. We believe the answer lies in the huge volumes of data produced by the network itself, and machine learning may be the key to exploiting this information. In this paper, we explain why the conventional model-based paradigm, which has proved useful in pre-5G networks, may be less efficient or even impractical in future 5G and beyond mobile networks. We then explain how the data-driven paradigm, using state-of-the-art machine learning techniques, can become a promising solution. Finally, we present a typical use case of the data-driven paradigm, proactive load balancing, in which online learning is used to adjust cell configurations in advance to avoid burst congestion caused by rapid traffic changes.
Increasing the production and utilization of shale gas is of great significance for building a clean, low-carbon energy system. Sharp declines in gas production have been widely observed in shale gas reservoirs. Forecasting shale gas production remains challenging due to complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase and multi-scale flow, and data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. First, a comprehensive system of dominant factors is established, covering geological, drilling, fracturing, and production factors. Data processing and visualization are then performed to ensure data quality and determine the final dataset. A shale gas production evaluation model is developed to evaluate production levels. Finally, the random forest algorithm is used to forecast shale gas production. The prediction accuracy for shale gas production level exceeds 95% for shale gas reservoirs in China. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed framework avoids the many assumptions of analytical and semi-analytical models, as well as the high computational cost and poor generalization of numerical modeling.
Orthogonal time frequency space (OTFS) modulation was recently proposed to alleviate severe Doppler effects in high-mobility scenarios. Most current OTFS detection schemes rely on perfect channel state information (CSI). However, in real-life systems, channel parameters change constantly and are often difficult to capture and describe. In this paper, we summarize existing research on OTFS detection based on data-driven deep learning (DL) and propose three new network structures for OTFS detection: a residual network (ResNet), a dense network (DenseNet), and a residual dense network (RDN). Detection schemes based on data-driven paradigms do not require a mathematically tractable model. Compared with the existing fully connected deep neural network (FC-DNN) and standard convolutional neural network (CNN), these three networks can alleviate the problems of exploding and vanishing gradients. Simulations show that RDN performs best among the three proposed schemes because it combines shallow and deep features. RDN thereby avoids the performance loss that traditional networks incur when they do not fully utilize all hierarchical information.
Corrosion defects are recognized as one of the most severe threats to high-pressure pipelines, especially those in long-term service. The finite-element method and empirical formulas are therefore used to predict the strength of corroded pipes. However, the finite-element method is time-consuming, and empirical formulas have a limited application range. To improve strength prediction, this paper investigates the burst pressure of line pipes with a single corrosion defect under internal pressure using data-driven methods. Three supervised machine learning (ML) algorithms, namely the artificial neural network (ANN), the support vector machine (SVM), and linear regression (LR), are used to train models on experimental data. Data analysis is first conducted to select appropriate pipe features for training. Hyperparameter tuning is then performed to control the learning process and obtain the best strength models for corroded pipelines. Among the proposed data-driven models, the three-layer ANN model has the highest training accuracy but also the largest variance. The SVM model provides both high training accuracy and high validation accuracy. The LR model shows the best generalization ability. In future research, these models can serve as surrogate models that are updated with new data via transfer learning, facilitating sustainable and intelligent decision-making for corroded pipelines.
A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing field development plans, and making investment decisions. However, quantitative analysis is challenging because production performance is governed by complex interactions among many geological and engineering factors. Each factor can be viewed as a player that contributes cooperatively to the production payoff within the constraints of physical laws and models. Inspired by this idea, we propose a hybrid data-driven analysis framework in which the contributions of dominant factors are quantitatively evaluated, production is precisely forecast, and development optimization suggestions are generated. Specifically, game theory and machine learning models are coupled to determine the dominant geological and engineering factors. The Shapley value, which has a definite physical meaning, is used to quantify the effect of each factor. A multi-model stacked ensemble is trained for production forecasting and provides the basis for derivative-free optimization algorithms to optimize the development plan. The complete workflow is validated with actual production data from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure draws rigorous conclusions supported by quantified evidence and thereby provides specific, reliable suggestions for development plan optimization. Compared with traditional, experience-based approaches, the hybrid data-driven procedure offers advantages in both efficiency and accuracy.
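The Shapley value assigns each player its marginal contribution, averaged over every order in which the coalition could be assembled. The sketch below computes exact Shapley values by enumerating coalitions, which is feasible for a handful of factors. The factor names and the payoff function are hypothetical; in the framework described above, the payoff would come from the trained production model.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (fine for a handful of players)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                s = frozenset(members)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoff (e.g., normalized production) for each coalition of factors.
def toy_payoff(coalition):
    v = 0.0
    if "thickness" in coalition:
        v += 3.0
    if "lateral_length" in coalition:
        v += 1.0
    if {"thickness", "lateral_length"} <= coalition:
        v += 2.0  # interaction term, split equally between the two factors
    if "proppant" in coalition:
        v += 0.5
    return v


print(shapley_values(["thickness", "lateral_length", "proppant"], toy_payoff))
```

The resulting values (4.0, 2.0, and 0.5) sum to the payoff of the full coalition, 6.5. This efficiency property is what makes each factor's share directly interpretable. The cost grows exponentially with the number of factors, so larger problems normally rely on sampling-based approximations.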
In the current data-intensive era, the traditional hands-on approach to scientific research, in which researchers explore related publications to generate a testable hypothesis, is well on its way to becoming obsolete within just a year or two. Automatically generating hypotheses by analyzing literature and data may become the de facto approach for researchers trying to keep pace with the exponential growth of publications and datasets. Here, viewpoints are presented and discussed to help clarify the challenges of data-driven discovery.
A robust low-carbon economic optimal scheduling method that considers source-load uncertainty and hydrogen energy utilization is developed. The proposed method addresses random source-load fluctuations in integrated energy systems (IESs) in the operation scheduling of integrated energy production units (IEPUs). First, to address inaccurate prediction of renewable energy output, an improved robust kernel density estimation method is proposed to statistically construct a data-driven uncertainty set for renewable energy output. Typical load-uncertainty scenarios are then built using stochastic scenario reduction. Next, to address the insufficient utilization of hydrogen energy in existing IEPUs, a robust low-carbon economic optimal scheduling model for source-load interaction in an IES with a hydrogen energy system is established. The model considers further energy utilization through hydrogen coupling equipment (such as hydrogen storage devices and fuel cells) and comprehensive demand response from schedulable load-side resources. Simulation results show that the proposed data-driven robust stochastic optimization model effectively reduces carbon dioxide emissions, improves source-load interaction in the IES, makes efficient use of hydrogen energy, and improves system robustness.
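Kernel density estimation builds a smooth distribution of renewable output directly from historical samples, without assuming a parametric form. The sketch below is a plain Gaussian KDE with Silverman's rule-of-thumb bandwidth. It is not the paper's improved robust variant, and the PV samples are hypothetical.

```python
import math


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * std * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, built from 1-D samples."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density


# Hypothetical normalized PV output samples (per unit of rated power).
pv_samples = [0.42, 0.47, 0.51, 0.55, 0.58, 0.61, 0.66, 0.72]
pdf = gaussian_kde(pv_samples)
print(round(pdf(0.55), 3))
```

One plausible way to turn such a density into an uncertainty set is to take the interval between two chosen quantiles of the estimated distribution. That step is an assumption here; the abstract does not say how the bounds are chosen.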
With the integration of renewable energy resources, the inertia of power systems decreases significantly, making systems sensitive to operational disturbances. A disturbance-based method is presented to estimate inertia, revealing the influence of renewables on resilient system operation. Gaussian process regression is then used to predict the power system trajectory after a disturbance. Extensive tests demonstrate that the data-driven method estimates system inertia and predicts the dynamic operation of power grids subject to disturbances. Numerical results also offer insights into enhancing system resilience by strategically designing power system inertia.
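Disturbance-based inertia estimation typically relies on the aggregate swing equation, (2H / f0) · df/dt = ΔP. Here ΔP is the power imbalance in per unit of system base and df/dt is the rate of change of frequency (RoCoF) just after the event. The sketch below shows this textbook estimate, with RoCoF taken as a least-squares slope over a short window. It is not necessarily the paper's exact method, and the event data are hypothetical.

```python
def rocof_least_squares(times_s, freqs_hz):
    """Rate of change of frequency (Hz/s) as the least-squares slope of f(t)."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    f_mean = sum(freqs_hz) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times_s, freqs_hz))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return num / den


def inertia_constant_h(delta_p_pu, rocof_hz_per_s, f0_hz=50.0):
    """Aggregate inertia constant H (s) from the swing equation
    (2H / f0) * df/dt = delta_P, with delta_P in per unit of system base."""
    if rocof_hz_per_s == 0:
        raise ValueError("RoCoF must be non-zero")
    return delta_p_pu * f0_hz / (2.0 * rocof_hz_per_s)


# Hypothetical event: sudden loss of 0.05 pu generation on a 50 Hz system.
times = [0.0, 0.1, 0.2, 0.3]
freqs = [50.0, 49.95, 49.90, 49.85]
rocof = rocof_least_squares(times, freqs)
print(rocof, inertia_constant_h(-0.05, rocof))
```

The window should end before governors and other controls respond, because the swing equation in this form attributes the whole frequency slope to inertia. A lower H yields a steeper RoCoF for the same disturbance, which is the sensitivity the abstract attributes to high renewable penetration.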
This work builds on traditional numerical simulation and optimization algorithms and combines them with layered injection and production "hard data" monitored in real time by automatic control technology. On this basis, a systematic approach to detailed water injection design using data-driven algorithms is proposed. First, data assimilation is used to match geological model parameters under the constraint of observed well dynamics. The flow relationships between injectors and producers in the block are calculated with an automatic identification method for layered injection-production flow relationships. A multi-layer, multi-direction production splitting technique is then used to calculate the liquid and oil production of producers in different layers and directions and to obtain quantitative indexes of water injection effectiveness. Next, machine learning algorithms are applied to evaluate the effectiveness of water injection in different layers of wells and to adjust water injection direction. Finally, the particle swarm optimization algorithm is used to optimize the detailed water injection plan and make production predictions. This procedure makes full use of the automation and intelligence of data-driven and machine learning algorithms. Applied to history matching of a complex faulted reservoir in eastern China, the method achieved a fitting level of 85%. Cumulative oil production in the example block over the 12 months following optimization was 8.2% higher than before. This method can help design detailed water injection programs for mature oilfields.
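Particle swarm optimization is derivative-free, which suits objectives evaluated by a simulator or a learned model. The sketch below is a standard global-best PSO with box bounds and commonly used constriction-equivalent coefficients (w ≈ 0.7298, c1 = c2 ≈ 1.4962). The objective is a hypothetical stand-in for "negative predicted cumulative oil" as a function of two layer injection rates, not the paper's reservoir model.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 w=0.7298, c1=1.4962, c2=1.4962, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "negative predicted cumulative oil" as a function of
# two layer injection rates; a real workflow would call a reservoir model here.
def toy_objective(q):
    return -(10.0 - (q[0] - 3.0) ** 2 - (q[1] - 7.0) ** 2)


best_rates, best_value = pso_minimize(toy_objective, [(0.0, 10.0), (0.0, 10.0)])
print(best_rates, -best_value)
```

Clamping positions to the bounds is the simplest way to keep candidate rates feasible. Real injection plans usually add further constraints, such as a total-injection cap, that would need penalty terms or a repair step.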
Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge that meets user needs. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Are there rules we should obey in a data mining process? To discover patterns and knowledge that are genuinely interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process. Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a knowledge-transforming process that converts knowledge from data format into symbol format. Thus, no new knowledge is generated in a data mining process. Knowledge is merely transformed from data format, which humans cannot understand, into symbol format, which humans can understand and use easily. This is similar to translating a book from Chinese into English: the knowledge in the book should remain unchanged, and only its format changes.
That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, mistakes have been made in the translation. Likewise, in a data mining process we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is one representation format of knowledge), but we cannot read, understand, or use it directly, because we cannot understand data. Based on this understanding of data mining, we propose a data-driven knowledge acquisition method based on rough sets, which also improves the performance of classical knowledge acquisition methods. We also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is similar to views of a database: users with different views can see different parts of the data in a database. Likewise, users with different tasks or objectives may wish to, or can, discover different (partial) knowledge from the same database. However, all such partial knowledge must already exist in the database. A domain-oriented data-driven data mining method would therefore help us extract knowledge that genuinely exists in a database and is genuinely interesting and actionable in the real world.
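Rough set methods express what the data can and cannot determine through two approximations of a target concept. The lower approximation contains objects whose indiscernibility class lies entirely inside the concept. The upper approximation contains objects whose class overlaps it. The sketch below computes these standard Pawlak approximations on a tiny hypothetical table. It illustrates the basic machinery only, not the authors' knowledge acquisition algorithm.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Pawlak lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target concept
            lower |= block
        if block & target:    # block overlaps the target concept
            upper |= block
    return lower, upper


# Hypothetical decision data: condition attributes for four objects.
table = {
    "x1": {"temp": "high", "cough": "yes"},
    "x2": {"temp": "high", "cough": "yes"},
    "x3": {"temp": "normal", "cough": "yes"},
    "x4": {"temp": "normal", "cough": "no"},
}
flu = {"x1", "x3"}  # objects with decision "flu = yes"
lower, upper = approximations(table, ["temp", "cough"], flu)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

The boundary region (upper minus lower, here x1 and x2) holds objects that these attributes cannot classify with certainty. No certain rule about them can be extracted from this data, which is consistent with the view above that mining can only surface knowledge already present in the data.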
Modeling projectile systems with sufficient accuracy is very difficult because of their high-dimensional state space and various perturbations. With the recent rapid development of data science and scientific measurement tools, numerous data-driven methods have been devoted to discovering governing laws from data. In this work, a data-driven method based on the Kramers–Moyal formulas is employed to model the projectile. Specifically, the four-dimensional projectile system is assumed to be described by an Itô stochastic differential equation. The least-squares method and sparse learning are then applied to identify the drift coefficient and diffusion matrix from sample path data, and the results agree well with the real system. The effectiveness of the data-driven method suggests that it can become a powerful tool for extracting governing equations and predicting the complex dynamical behavior of projectiles.
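The Kramers–Moyal approach estimates the drift as the conditional mean increment per unit time, E[ΔX | X = x] / Δt, and the diffusion from the mean squared increment per unit time. The sketch below applies this idea under simplifying assumptions. It replaces the four-dimensional projectile state with a one-dimensional Ornstein–Uhlenbeck process dX = -θX dt + σ dW, and it fits a single linear basis function by least squares instead of performing sparse regression over a function library.

```python
import math
import random


def simulate_ou(theta, sigma, dt, n_steps, x0=0.0, seed=0):
    """Euler–Maruyama sample path of dX = -theta * X dt + sigma dW."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs


def kramers_moyal_linear_fit(xs, dt):
    """Estimate drift slope a in drift(x) = a * x and the squared noise
    intensity sigma^2 from increments of a sampled path."""
    x_prev = xs[:-1]
    dx = [b - a for a, b in zip(xs[:-1], xs[1:])]
    # First KM coefficient: least-squares fit of dx/dt against x (no intercept).
    slope = sum(x * d for x, d in zip(x_prev, dx)) / (dt * sum(x * x for x in x_prev))
    # Second moment: E[dx^2] / dt approximates sigma^2 for additive noise
    # (equal to twice the second KM coefficient in the usual 1/2 convention).
    sigma_sq = sum(d * d for d in dx) / (dt * len(dx))
    return slope, sigma_sq


path = simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n_steps=200_000)
print(kramers_moyal_linear_fit(path, dt=0.01))
```

The recovered slope should be close to -θ = -1 and the noise estimate close to σ² = 0.25. In the multivariate case, the same moment estimates are computed per state component and the drift is regressed on a library of candidate functions, with sparsity selecting the few terms that matter.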
Stochastic dynamic simulation of railway vehicle collisions still faces many challenges, such as high modeling complexity and long computation times. To address these challenges, we introduce a novel data-driven stochastic process modeling (DSPM) approach into the dynamic simulation of railway vehicle collisions. The DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate the computation of the stochastic processes. We apply DSPM, Gaussian process regression (GPR), and finite element (FE) methods to two collision scenarios for a comprehensive analysis: a lead car colliding with a rigid wall, and a lead car colliding with another lead car. Comparison with the FE method shows that the DSPM approach can calculate the corresponding confidence interval while improving overall computational efficiency. Comparison with the GPR method indicates that the DSPM approach can accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamic simulation of railway vehicle collisions.
With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have adopted data-driven approaches to modeling wind turbine wakes. These models can capture complex, high-dimensional characteristics of wind turbine wakes while predicting far more efficiently than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper provides a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. The related studies are then categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail.
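For contrast with the data-driven models reviewed above, the classic Jensen (Park) model is a simple physics-driven analytic wake model. It assumes a top-hat velocity deficit (1 - √(1 - Ct)) · (D / (D + 2kx))² that decays as the wake expands linearly downstream. The sketch below implements that formula for illustration only; it is not taken from the review. The turbine values are hypothetical, and k = 0.075 is a commonly quoted onshore wake decay constant that should be treated as an assumption.

```python
import math


def jensen_wake_speed(u_inf, ct, rotor_diameter, x, k=0.075):
    """Wind speed inside the wake at distance x downstream (top-hat profile,
    so uniform across the wake width), using the Jensen/Park deficit formula."""
    if not 0.0 < ct < 1.0:
        raise ValueError("thrust coefficient must lie in (0, 1)")
    expansion = rotor_diameter / (rotor_diameter + 2.0 * k * x)
    deficit = (1.0 - math.sqrt(1.0 - ct)) * expansion ** 2
    return u_inf * (1.0 - deficit)


# Hypothetical turbine: 100 m rotor, Ct = 0.75, 10 m/s free stream.
for distance in (0.0, 500.0, 1000.0):
    print(distance, round(jensen_wake_speed(10.0, 0.75, 100.0, distance), 2))
```

Closed-form models like this one are fast but crude. Data-driven approaches aim to keep the speed while capturing wake structure that a top-hat profile cannot represent.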
Funding: Supported by the National Natural Science Foundation of China (52075420) and the National Key Research and Development Program of China (2020YFB1708400).
Abstract: Owing to its generality and practicality, combining partial charging curves with machine learning (ML) for battery capacity estimation has attracted widespread attention. However, because existing studies are scattered, a clear classification, fair comparison, and performance rationalization of these methods are lacking. To address these issues, we develop 20 capacity estimation methods from three perspectives: charging sequence construction, input forms, and ML models. A total of 22,582 charging curves are generated from 44 cells with different battery chemistries and operating conditions to validate performance. In a comprehensive and unbiased comparison, the long short-term memory (LSTM) based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error (MAPE) of capacity estimation using LSTM is 0.61%, with a maximum error of only 3.94%. Even with 3 mV of added voltage noise or sampling intervals extended to 60 s, the average MAPE remains below 2%. Furthermore, the charging sequences are given physical explanations related to battery degradation to increase confidence in their application. Recommendations for using other competitive methods are also presented. This work provides valuable insights and guidance for estimating battery capacity from partial charging curves.
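MAPE, the accuracy metric reported above, is the mean of the absolute relative errors expressed as a percentage, and the maximum error is the largest of those relative errors. A minimal sketch, assuming measured and estimated capacities are given as equal-length sequences; the function names and sample capacities are hypothetical, not data from the study.

```python
def mape(measured, estimated):
    """Mean absolute percentage error, in percent."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(m - e) / abs(m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(errors) / len(errors)


def max_ape(measured, estimated):
    """Largest absolute percentage error, in percent."""
    return 100.0 * max(abs(m - e) / abs(m) for m, e in zip(measured, estimated))


# Hypothetical capacities in Ah (illustrative values only).
measured = [2.00, 1.90, 1.80]
estimated = [2.02, 1.88, 1.80]
print(mape(measured, estimated), max_ape(measured, estimated))
```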
Funding: This work is funded by the National Natural Science Foundation of China (Nos. 42202292, 42141011) and the Program for Jilin University (JLU) Science and Technology Innovative Research Team (No. 2019TD-35). The authors would also like to thank the reviewers and editors, whose critical comments were very helpful in preparing this article.
Abstract: To reduce CO₂ emissions in response to global climate change, shale reservoirs could be ideal candidates for long-term carbon geo-sequestration, which involves multi-scale transport processes. However, most current CO₂ sequestration models do not adequately consider multiple transport mechanisms. Moreover, evaluating CO₂ storage processes usually involves laborious and time-consuming numerical simulations that are unsuitable for practical prediction and decision-making. In this paper, an integrated model involving gas diffusion, adsorption, dissolution, slip flow, and Darcy flow is proposed to accurately characterize CO₂ storage in depleted shale reservoirs and to support the establishment of a training database. On this basis, a hybrid physics-informed data-driven neural network (HPDNN) is developed as a deep learning surrogate for prediction and inversion. By incorporating multiple sources of scientific knowledge, the HPDNN can be configured with limited simulation resources, significantly accelerating the forward and inversion processes. Furthermore, the HPDNN can intelligently predict injection performance, precisely perform reservoir parameter inversion, and reasonably evaluate CO₂ storage capacity under complicated scenarios. The validation and test results demonstrate that the HPDNN maintains high accuracy and strong robustness across a wide range of applicability when dealing with field data containing multiple noise sources. This study has great potential to replace traditional modeling tools for prediction and decision-making in CO₂ storage projects in depleted shale reservoirs.
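The hybrid idea can be illustrated with a composite training loss that adds a penalty on a physics residual to the usual data-mismatch term. The HPDNN's actual loss terms are not given in this abstract, so everything below is hypothetical: a simple mass-balance residual (predicted total stored CO₂ minus the sum of predicted free, adsorbed, and dissolved amounts) and the weight `lam` stand in for the physics constraints.

```python
import numpy as np


def hybrid_loss(pred_total, observed_total, pred_free, pred_adsorbed,
                pred_dissolved, lam=0.1):
    """Data MSE plus a weighted mass-balance penalty (hypothetical form)."""
    pred_total = np.asarray(pred_total, dtype=float)
    observed_total = np.asarray(observed_total, dtype=float)
    data_term = np.mean((pred_total - observed_total) ** 2)
    # Physics residual: total storage should equal the sum of its storage modes.
    parts = (np.asarray(pred_free, dtype=float)
             + np.asarray(pred_adsorbed, dtype=float)
             + np.asarray(pred_dissolved, dtype=float))
    physics_term = np.mean((pred_total - parts) ** 2)
    return data_term + lam * physics_term
```

Raising `lam` pushes a network toward physically consistent outputs even where training samples are sparse, which is the general motivation for physics-informed surrogates.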
Funding: Supported by the National Natural Science Foundation of China (Nos. 52274048 and 52374017), the Beijing Natural Science Foundation (No. 3222037), and the CNPC 14th five-year perspective fundamental research project (No. 2021DJ2104).
Abstract: Shale gas development involves complex flow mechanisms, and the accuracy of production forecasting is influenced by both geological and engineering parameters. Therefore, to quantitatively evaluate the relative importance of model parameters for production forecasting performance, a parameter sensitivity analysis is required, and the parameters are ranked by their sensitivity coefficients to guide subsequent optimization design. A data-driven global sensitivity analysis (GSA) method using convolutional neural networks (CNN) is proposed to identify the parameters influencing shale gas production. The CNN is trained on a large dataset, validated against numerical simulations, and used as a surrogate model for efficient sensitivity analysis. Our approach integrates the CNN with the Sobol' global sensitivity analysis method and presents three scenarios for sensitivity analysis: analysis of the production stage as a whole, analysis by fixed time intervals, and analysis by decline rate. The findings underscore the predominant influence of reservoir thickness and well length on shale gas production. Furthermore, the temporal sensitivity analysis reveals dynamic shifts in parameter importance across distinct production stages.
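Sobol' indices split the output variance into contributions from each input: the first-order index measures an input's effect alone, and the total-effect index includes its interactions. The sketch below uses the Saltelli (2010) first-order and Jansen total-effect Monte Carlo estimators. As an assumption for illustration, the standard Ishigami test function stands in for the trained CNN surrogate, which would be passed as `model` in its place.

```python
import numpy as np


def ishigami(x, a=7.0, b=0.1):
    """Standard GSA test function (stand-in for a trained surrogate)."""
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2 + b * x[:, 2] ** 4 * np.sin(x[:, 0])


def sobol_indices(model, n_params, n_samples, low, high, seed=0):
    """First-order and total-effect Sobol' indices for uniform inputs."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(low, high, size=(n_samples, n_params))
    B = rng.uniform(low, high, size=(n_samples, n_params))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    first, total = np.empty(n_params), np.empty(n_params)
    for i in range(n_params):
        ABi = A.copy()
        ABi[:, i] = B[:, i]          # A with column i taken from B
        fABi = model(ABi)
        first[i] = np.mean(fB * (fABi - fA)) / var          # Saltelli (2010)
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var    # Jansen estimator
    return first, total


S, ST = sobol_indices(ishigami, 3, 100_000, -np.pi, np.pi)
print("first-order:", np.round(S, 3), "total:", np.round(ST, 3))
```

For the Ishigami function the analytic first-order indices are about 0.314, 0.442, and 0, which makes it a convenient check before swapping in a real surrogate.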
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12272257, 12102292, 12032006) and the special fund for Science and Technology Innovation Teams of Shanxi Province (No. 202204051002006).
Abstract: This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify, from experimental measurements, the dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets. The derived mathematical expressions of the dimensionless quantities are simplified by examining the exponent matrix and the coupling relationships between feature variables. As a physics-based dimension-reduction methodology, this approach reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities in penetration cases. The relative importance of various dimensionless feature variables for the penetration efficiency under four impact conditions is then evaluated through feature selection. The results indicate that the critical dimensionless feature variables selected by this synergistic method, without reference to complex theoretical equations or reliance on detailed knowledge of penetration mechanics, are consistent with those reported in the literature. Lastly, the identified dimensionless quantities can be efficiently applied to semi-empirical analysis of specific penetration cases, and the reliability of the regression functions is validated.
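The exponent-matrix step rests on the Buckingham Π theorem: each dimensionless group corresponds to a vector in the null space of the dimensional (exponent) matrix. The sketch below computes an exact rational null-space basis for an assumed, illustrative set of rod-penetration variables (penetration depth P, rod length L, rod diameter D, impact velocity V, projectile and target densities rho_p and rho_t, target strength Y_t). It is classical dimensional analysis, not the neural-network method described above, and the basis it returns is one valid choice among many.

```python
from fractions import Fraction


def nullspace(matrix):
    """Exact rational basis of the null space of `matrix` (a list of rows)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):                      # reduced row echelon form
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(n_rows):                  # clear column c in other rows
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)                     # one free variable set to 1
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Assumed variable set; matrix columns follow this order.
names = ["P", "L", "D", "V", "rho_p", "rho_t", "Y_t"]
dim_matrix = [
    [0, 0, 0, 0, 1, 1, 1],      # mass exponents
    [1, 1, 1, 1, -3, -3, -1],   # length exponents
    [0, 0, 0, -1, 0, 0, -2],    # time exponents
]
for vec in nullspace(dim_matrix):
    print({n: str(e) for n, e in zip(names, vec) if e != 0})
```

For this variable set the basis corresponds to L/P, D/P, rho_t/rho_p, and Y_t/(rho_p V^2). Products and powers of these groups are also dimensionless, so choosing the most interpretable forms remains a separate step.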
Funding: Supported by the National Natural Science Foundation of China (Nos. 62173281, 52377217, U23A20651), the Sichuan Science and Technology Program (Nos. 24NSFSC0024, 23ZDYF0734, 23NSFSC1436), the Dazhou City School Cooperation Project (No. DZXQHZ006), the Technopole Talent Summit Project (No. KJCRCFH08), and Robert Gordon University.
Abstract: Lithium-ion batteries are the preferred green energy storage technology and are equipped with intelligent battery management systems (BMSs) that manage them efficiently. This not only ensures battery safety but also significantly improves efficiency and reduces the damage rate. Throughout their life cycle, lithium-ion batteries undergo aging and performance degradation due to diverse external environments and irregular degradation of internal materials. This degradation is reflected in the state of health (SOH). Therefore, this review offers the first comprehensive analysis of battery SOH estimation strategies across the entire life cycle over the past five years, highlighting common research focuses rooted in data-driven methods. It examines dataset integration and preprocessing, health feature extraction, and the construction of SOH estimation models. These approaches uncover hidden insights within data, addressing the inherent tension between computational complexity and estimation accuracy. To better support in-vehicle implementation, cloud computing, and the echelon technologies of battery recycling, remanufacturing, and reuse, and to offer insights into these technologies, a segmented management approach will be introduced in the future. It will encompass source-domain data processing, multi-feature factor reconfiguration, hybrid-drive modeling, parameter correction mechanisms, and full-time health management. Based on the best SOH estimation outcomes, health strategies tailored to different stages can be devised in the future, leading to a comprehensive SOH assessment framework. This will mitigate cross-domain distribution disparities and facilitate adaptation to a broader range of dynamic operating protocols. This article reviews the current research landscape from four perspectives and discusses the challenges that lie ahead. Researchers and practitioners can gain a comprehensive understanding of battery SOH estimation methods, which offers valuable insights for the development of advanced battery management systems and embedded application research.
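For orientation, one common capacity-based definition of SOH is the ratio of the current maximum capacity to the rated capacity; other definitions, for example resistance-based ones, also exist. A minimal sketch with hypothetical cell values:

```python
def soh_capacity(current_capacity_ah, rated_capacity_ah):
    """Capacity-based state of health, in percent."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return 100.0 * current_capacity_ah / rated_capacity_ah


# A hypothetical cell rated at 2.5 Ah that now delivers 2.0 Ah.
print(soh_capacity(2.0, 2.5))  # 80.0
```

Data-driven estimators of the kind reviewed here learn to predict this quantity from health features instead of measuring the full capacity directly.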
Funding: Supported by the China Scholarship Council (Nos. 202208320055 and 202108320111). The support from the energy department of Aalborg University is also acknowledged.
Abstract: Using machine learning techniques for data-driven diagnosis of high-temperature PEM fuel cells is beneficial to system durability. Nevertheless, ensuring the robustness of diagnosis remains a critical and challenging task in real applications. To enhance diagnostic robustness and achieve a more thorough evaluation of diagnostic performance, this work proposes and investigates a robust diagnostic procedure based on electrochemical impedance spectroscopy (EIS) and a new method for evaluating diagnostic robustness. To improve diagnostic robustness: (1) the degradation mechanisms of different faults in the high-temperature PEM fuel cell were first analyzed via the distribution of relaxation times (DRT) of the EIS, to determine an equivalent circuit model (ECM) with better interpretability, simplicity, and accuracy; (2) feature extraction was performed on the identified ECM parameters, with particular attention to distinguishing long-term normal degradation from other faults; (3) a Siamese network was adopted to obtain more robust features in a new embedding. Diagnosis was conducted with six classic classification algorithms, namely support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), and naive Bayes, on a dataset of 1935 collected EIS measurements. To evaluate the robustness of the trained models: (1) different levels of error were added to the features for performance evaluation; (2) a robustness coefficient (Roubust_C) was defined for a quantified and explicit evaluation of diagnostic robustness. Diagnostic models using the proposed feature extraction method achieve not only higher performance, of around 100%, but also higher robustness.
Although the models' initial performance was similar, KNN demonstrated superior robustness after feature selection and re-embedding with the triplet-loss method, which indicates the necessity of robustness evaluation for machine learning models and the effectiveness of the defined robustness coefficient. This work aims to offer new insights into the robust diagnosis of high-temperature PEM fuel cells and into more comprehensive performance evaluation of data-driven diagnostic methods.
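The ECM step can be illustrated with a minimal circuit. The ECM actually selected in this work is not specified in the abstract, so the sketch assumes a series resistance R0 followed by one parallel R1 || C1 element, Z(ω) = R0 + R1 / (1 + jωR1C1); the parameter values are hypothetical.

```python
import numpy as np


def rc_impedance(freq_hz, r0, r1, c1):
    """Complex impedance of R0 in series with (R1 || C1)."""
    omega = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    return r0 + r1 / (1 + 1j * omega * r1 * c1)


freqs = np.logspace(-1, 4, 6)                    # 0.1 Hz to 10 kHz
z = rc_impedance(freqs, r0=0.005, r1=0.010, c1=0.5)
for f, zi in zip(freqs, z):
    print(f"{f:10.2f} Hz  Re={zi.real:.5f} ohm  -Im={-zi.imag:.5f} ohm")
```

The arc peaks at f = 1 / (2π R1 C1), where the real part is R0 + R1/2 and the negative imaginary part is R1/2. Fitted quantities such as R0, R1, and the time constant R1·C1 are the kind of features that could then be passed to the classifiers listed above.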
Funding: Partially supported by the National Natural Science Foundation of China (61751306, 61801208, 61671233), the Jiangsu Science Foundation (BK20170650), the Postdoctoral Science Foundation of China (BX201700118, 2017M621712), the Jiangsu Postdoctoral Science Foundation (1701118B), and the Fundamental Research Funds for the Central Universities (021014380094).
Abstract: During the past few decades, mobile wireless communications have experienced four generations of technological revolution, from 1G to 4G, and the deployment of the latest 5G networks is expected to take place in 2019. One fundamental question is how we can push forward the development of mobile wireless communications now that they have become an extremely complex and sophisticated system. We believe that the answer lies in the huge volumes of data produced by the network itself, and machine learning may become a key to exploiting such information. In this paper, we explain why the conventional model-based paradigm, which has been widely proven useful in pre-5G networks, can be less efficient or even impractical in future 5G and beyond mobile networks. Then, we explain how the data-driven paradigm, using state-of-the-art machine learning techniques, can become a promising solution. Finally, we provide a typical use case of the data-driven paradigm, proactive load balancing, in which online learning is used to adjust cell configurations in advance to avoid burst congestion caused by rapid traffic changes.
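As a hypothetical illustration of the proactive idea, the sketch below predicts a cell's next-interval load with an online normalized least-mean-squares (NLMS) predictor and flags the cell before its predicted load crosses a threshold. The predictor, threshold, and traffic trace are assumptions for illustration, not the learning scheme used in the paper.

```python
import numpy as np


class OnlineLoadPredictor:
    """Online NLMS predictor of next-step load from the last `order` samples."""

    def __init__(self, order=3, step=0.5):
        self.w = np.zeros(order)
        self.step = step
        self.history = []

    def predict(self):
        if len(self.history) < len(self.w):
            return self.history[-1] if self.history else 0.0
        x = np.array(self.history[-len(self.w):])
        return float(self.w @ x)

    def update(self, observed):
        if len(self.history) >= len(self.w):
            x = np.array(self.history[-len(self.w):])
            error = observed - self.w @ x
            # Normalizing by the input energy keeps the step size stable.
            self.w += self.step * error * x / (1e-9 + x @ x)
        self.history.append(observed)


predictor = OnlineLoadPredictor()
threshold = 0.8  # hypothetical utilization limit
for t, load in enumerate(np.linspace(0.3, 0.9, 30)):
    if predictor.predict() > threshold:
        print(f"t={t}: predicted overload, adjust cell configuration in advance")
    predictor.update(load)
```

The key design point is that the decision uses the prediction, not the current measurement, so configuration changes can take effect before congestion appears.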
Funding: Funded by the National Natural Science Foundation of China (52004238) and the China Postdoctoral Science Foundation (2019M663561).
Abstract: Increasing the production and utilization of shale gas is of great significance for building a clean, low-carbon energy system. Sharp declines in gas production have been widely observed in shale gas reservoirs. Forecasting shale gas production remains challenging because of complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase and multi-scale flow, and data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. First, a comprehensive dominant-factor system is established, including geological, drilling, fracturing, and production factors. Data processing and visualization are used to ensure data quality and determine the final dataset. A shale gas production evaluation model is then developed to evaluate shale gas production levels. Finally, the random forest algorithm is used to forecast shale gas production. For the shale gas reservoirs in China studied, the prediction accuracy of the shale gas production level is higher than 95%. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed evaluation framework avoids the many assumptions of analytical or semi-analytical models, as well as the high computational cost and poor generalization of numerical modelling.
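The random forest idea (many trees, each fitted to a bootstrap sample using a random subset of features, with predictions averaged) can be shown in a deliberately simplified, self-contained form. In this sketch each tree is a depth-one regression stump; a practical workflow would normally use a library implementation with deeper trees, and the features and data here are hypothetical.

```python
import numpy as np


def fit_stump(X, y, features):
    """Best single split (minimum squared error) over the candidate features."""
    best = (np.inf, features[0], np.inf, y.mean(), y.mean())
    for f in features:
        order = np.argsort(X[:, f])
        xs, ys = X[order, f], y[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue
            left, right = ys[:i], ys[i:]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, f, 0.5 * (xs[i] + xs[i - 1]), left.mean(), right.mean())
    return best[1:]  # (feature, threshold, left value, right value)


def fit_forest(X, y, n_trees=50, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max_features or max(1, p // 3)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                # bootstrap sample
        feats = rng.choice(p, size=k, replace=False)    # random feature subset
        trees.append(fit_stump(X[idx], y[idx], list(feats)))
    return trees


def predict_forest(trees, X):
    preds = [np.where(X[:, f] <= thr, lv, rv) for f, thr, lv, rv in trees]
    return np.mean(preds, axis=0)


# Hypothetical data: the target depends mainly on the first scaled factor.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = 5.0 * (X[:, 0] > 0.5) + rng.normal(0, 0.2, 200)
forest = fit_forest(X, y, n_trees=50, max_features=2)
print(predict_forest(forest, np.array([[0.9, 0.5, 0.5], [0.1, 0.5, 0.5]])))
```

Averaging over bootstrapped trees with random feature subsets reduces variance relative to a single tree, which is the main reason random forests generalize well on tabular data of this kind.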
Funding: Supported by the Beijing Natural Science Foundation (L223025), the National Natural Science Foundation of China (62201067), and the R&D Program of Beijing Municipal Education Commission (KM202211232008).
Abstract: Recently, orthogonal time frequency space (OTFS) modulation was proposed to alleviate severe Doppler effects in high-mobility scenarios. Most current OTFS detection schemes rely on perfect channel state information (CSI). However, in real-life systems, channel parameters change constantly and are often difficult to capture and describe. In this paper, we summarize existing research on OTFS detection based on data-driven deep learning (DL) and propose three new network structures for OTFS detection: a residual network (ResNet), a dense network (DenseNet), and a residual dense network (RDN). Detection schemes based on data-driven paradigms do not require a mathematically tractable channel model. Meanwhile, compared with the existing fully connected deep neural network (FC-DNN) and standard convolutional neural network (CNN), the three new networks alleviate the problems of exploding and vanishing gradients. Simulations show that RDN performs best among the three proposed schemes because it combines shallow and deep features. RDN thus avoids the performance loss caused by traditional networks not fully utilizing all hierarchical information.
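For context, the signal model that such detectors operate on can be sketched in a few lines. The NumPy example below implements OTFS modulation and demodulation using the inverse symplectic finite Fourier transform (ISFFT), its inverse (SFFT), and rectangular pulses over an ideal channel. The grid sizes and QPSK symbols are assumptions; with a real time-varying channel, a learned detector (ResNet, DenseNet, or RDN) would replace the trivial recovery step.

```python
import numpy as np


def isfft(x_dd):
    """Delay-Doppler grid (M delay x N Doppler) -> time-frequency grid."""
    M, N = x_dd.shape
    return np.fft.fft(np.fft.ifft(x_dd, axis=1), axis=0) * np.sqrt(N / M)


def sfft(y_tf):
    """Time-frequency grid -> delay-Doppler grid (inverse of isfft)."""
    M, N = y_tf.shape
    return np.fft.ifft(np.fft.fft(y_tf, axis=1), axis=0) * np.sqrt(M / N)


def otfs_modulate(x_dd):
    """ISFFT, then Heisenberg transform with a rectangular pulse."""
    M, _ = x_dd.shape
    s = np.fft.ifft(isfft(x_dd), axis=0) * np.sqrt(M)   # one multicarrier symbol per column
    return s.flatten(order="F")                          # serialize column by column


def otfs_demodulate(r, M, N):
    """Wigner transform with a rectangular pulse, then SFFT."""
    r_mat = r.reshape((M, N), order="F")
    return sfft(np.fft.fft(r_mat, axis=0) / np.sqrt(M))


M, N = 16, 8
rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], size=(M, N)) + 1j * rng.choice([-1, 1], size=(M, N))) / np.sqrt(2)
tx = otfs_modulate(qpsk)
rx = otfs_demodulate(tx, M, N)   # ideal channel: received signal equals transmitted
print(np.allclose(rx, qpsk))
```

Because every transform is scaled to be unitary, the round trip recovers the symbols exactly and signal energy is preserved.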
Abstract: Corrosion defects are recognized as one of the most severe threats to high-pressure pipelines, especially those in long-term service. The finite-element method and empirical formulas are therefore used to predict the strength of such corroded pipes. However, the finite-element method is time-consuming, and empirical formulas have a limited range of application. To improve strength prediction, this paper investigates the burst pressure of line pipes with a single corrosion defect under internal pressure using data-driven methods. Three supervised machine learning (ML) algorithms, namely an artificial neural network (ANN), a support vector machine (SVM), and linear regression (LR), are used to train models on experimental data. Data analysis is first conducted to determine suitable pipe features for training. Hyperparameter tuning is then performed to control the learning process and fit the best strength models for corroded pipelines. Among the proposed data-driven models, the ANN model with three neural layers has the highest training accuracy but also the largest variance. The SVM model provides both high training accuracy and high validation accuracy. The LR model has the best generalization ability. These models can serve as surrogate models, updated by transfer learning as new data arrive in future research, facilitating sustainable and intelligent decision-making for corroded pipelines.
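Of the three models, linear regression is the simplest to write out. The sketch below fits burst pressure as a linear function of two hypothetical normalized features (defect depth ratio d/t and a normalized defect length) by ordinary least squares; the feature choice and the noise-free synthetic relation are assumptions, not the paper's dataset.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def predict_linear(intercept, coefs, X):
    return intercept + np.asarray(X) @ coefs


# Synthetic data: pressure falls with depth ratio and defect length.
rng = np.random.default_rng(0)
X = rng.uniform([0.1, 0.5], [0.8, 5.0], size=(50, 2))   # [d/t, normalized length]
y = 20.0 - 12.0 * X[:, 0] - 0.8 * X[:, 1]               # hypothetical relation, MPa
b0, b = fit_linear(X, y)
print(round(b0, 3), np.round(b, 3))
print(predict_linear(b0, b, [[0.5, 2.0]]))
```

A linear model has few parameters and therefore low variance, at the cost of being unable to represent strong nonlinear interactions between defect geometry and strength.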
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 42050104) and the Science Foundation of SINOPEC Group (Grant No. P20030).
Abstract: A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing a field development plan, and making investment decisions. However, quantitative analysis can be challenging because production performance is dominated by complex interactions among a series of geological and engineering factors. In fact, each factor can be viewed as a player that makes cooperative contributions to the production payoff within the constraints of physical laws and models. Inspired by this idea, we propose a hybrid data-driven analysis framework in which the contributions of dominant factors are quantitatively evaluated, production is precisely forecast, and development optimization suggestions are comprehensively generated. More specifically, game theory and machine learning models are coupled to determine the dominant geological and engineering factors. The Shapley value, which has a definite physical meaning, is employed to quantitatively measure the effects of individual factors. A multi-model-fused stacked model is trained for production forecasting, which provides the basis for derivative-free optimization algorithms to optimize the development plan. The complete workflow is validated with actual production data collected from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure can draw rigorous conclusions with quantified evidence and thereby provide specific and reliable suggestions for development plan optimization. Compared with traditional and experience-based approaches, the hybrid data-driven procedure is advantageous in both efficiency and accuracy.
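The Shapley value credits each factor with its marginal contribution averaged over all coalitions of the other factors, weighted by coalition size. A minimal exact implementation is sketched below; the three factor names and the payoff function are hypothetical. Exact enumeration grows as 2^(n-1) coalitions per player, so with many factors and a trained production model as the payoff, sampling-based approximations are typically used instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoff: additive effects plus an interaction between two factors.
base = {"thickness": 3.0, "well_length": 2.0, "frac_stages": 1.0}


def payoff(coalition):
    bonus = 2.0 if {"well_length", "frac_stages"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus


phi = shapley_values(list(base), payoff)
print(phi)
```

Here the interaction bonus is split equally between the two interacting factors, and the values sum to the payoff of the full coalition (the efficiency property).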
Abstract: In the current data-intensive era, the traditional hands-on method of conducting scientific research, exploring related publications to generate a testable hypothesis, is well on its way to becoming obsolete within just a year or two. Automatically generating hypotheses by analyzing the literature and data might become the de facto approach for informing the core research efforts of those trying to keep up with the exponential growth of publications and datasets. Here, viewpoints are provided and discussed to help readers understand the challenges of data-driven discovery.
Funding: Supported by the National Key Research and Development Project of China (2018YFE0122200).
Abstract: A robust low-carbon economic optimal scheduling method that considers source-load uncertainty and hydrogen energy utilization is developed. The proposed method addresses the challenge posed by random source-load fluctuations in integrated energy systems (IESs) in the operation scheduling of integrated energy production units (IEPUs). First, to address the inaccurate prediction of renewable energy output, an improved robust kernel density estimation method is proposed to statistically construct a data-driven uncertainty set for renewable energy output, and typical load-uncertainty scenarios are built using stochastic scenario reduction. Subsequently, to address the insufficient utilization of hydrogen energy in existing IEPUs, a robust low-carbon economic optimal scheduling model of the source-load interaction of an IES with a hydrogen energy system is established. The system considers further energy utilization through hydrogen-coupling equipment (such as hydrogen storage devices and fuel cells) and the comprehensive demand response of schedulable load-side resources. The simulation results show that the proposed data-driven robust stochastic optimization model can effectively reduce carbon dioxide emissions, improve the source-load interaction of the IES, make efficient use of hydrogen energy, and improve system robustness.
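The uncertainty-set construction starts from a kernel density estimate of renewable output. The sketch below shows a plain Gaussian KDE with a normal-reference rule-of-thumb bandwidth and reads a central interval from the estimated distribution, which can serve as a simple box-type uncertainty set. It does not include the robustness improvements proposed in the paper; the synthetic forecast errors, bandwidth rule, and 90% coverage are assumptions.

```python
import numpy as np


def gaussian_kde(samples, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)

    def density(x):
        z = (np.asarray(x, dtype=float)[..., None] - samples) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=-1) / (n * bandwidth * np.sqrt(2 * np.pi))

    return density


def interval_from_kde(density, grid, coverage=0.9):
    """Central interval of the estimated distribution, read from its CDF on a grid."""
    cdf = np.cumsum(density(grid))
    cdf /= cdf[-1]
    lo = grid[np.searchsorted(cdf, (1 - coverage) / 2)]
    hi = grid[np.searchsorted(cdf, 1 - (1 - coverage) / 2)]
    return lo, hi


# Synthetic per-unit forecast errors of renewable output.
rng = np.random.default_rng(0)
errors = rng.normal(0.0, 0.1, 1000)
pdf = gaussian_kde(errors)
print(interval_from_kde(pdf, np.linspace(-0.6, 0.6, 1201)))
```

Unlike a parametric fit, the KDE makes no assumption about the shape of the error distribution, which is why it suits data-driven uncertainty sets.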
Abstract: With the integration of renewable energy resources, the inertia of power systems is significantly reduced, making the system sensitive to operational disturbances. A disturbance-based method is presented herein to estimate inertia, revealing the influence of renewables on resilient system operation. Gaussian process regression is then used to predict the power system trajectory after a disturbance. Extensive tests demonstrate that the data-driven method estimates the system inertia and predicts the dynamic operation of power grids subject to disturbances. Numerical results also offer insights into enhancing system resilience by strategically designing the inertia of power systems.
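The link between a disturbance and inertia is the swing equation. In per-unit form on the system base it relates the initial rate of change of frequency (RoCoF) to the power imbalance: (2H / f0) · df/dt = ΔP. A minimal sketch, assuming a single aggregated system frequency and a known step imbalance; the numbers are illustrative, and the paper's GPR-based trajectory prediction is not reproduced here.

```python
def estimate_inertia(delta_p_pu, rocof_hz_per_s, f0_hz=50.0):
    """Aggregate inertia constant H (seconds) from the swing equation.

    (2H / f0) * df/dt = delta_p, with delta_p in per unit of the system base
    (negative for a loss of generation).
    """
    if rocof_hz_per_s == 0:
        raise ValueError("RoCoF must be non-zero")
    return delta_p_pu * f0_hz / (2.0 * rocof_hz_per_s)


# Loss of 10% of generation causing an initial RoCoF of -0.5 Hz/s on a 50 Hz grid.
print(estimate_inertia(-0.10, -0.5))  # 5.0
```

For the same disturbance, a smaller H produces a steeper RoCoF, which is why displacing synchronous machines with inverter-based renewables makes the grid more sensitive to disturbances.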
Funding: Supported by the Key Program of PetroChina Exploration & Production Company (Grant Nos. kt2017-17-01-1 and kt2017-17-06-1) and the Consulting Project of Chinese Academy of Engineering (Grant No. 2019-XZ-17).
Abstract: Building on traditional numerical simulation and optimization algorithms, and combining them with layered injection and production "hard data" monitored in real time by automatic control technology, a systematic approach for detailed water injection design using data-driven algorithms is proposed. First, data assimilation is used to match geological model parameters under the constraint of observed well dynamics; the flow relationships between injectors and producers in the block are calculated with an automatic identification method for layered injection-production flow relationships; and a multi-layer, multi-direction production splitting technique is used to calculate the liquid and oil production of producers in different layers and directions and to obtain quantified indexes of water injection effect. Then, machine learning algorithms are applied to evaluate the effectiveness of water injection in different layers of wells and to adjust the water injection direction. Finally, the particle swarm algorithm is used to optimize the detailed water injection plan and to make production predictions. This method makes full use of the automation and intelligence of data-driven and machine learning algorithms. It was applied to match the data of a complex faulted reservoir in eastern China, achieving a fitting level of 85%. The cumulative oil production of the example block in the 12 months after optimization is 8.2% higher than before. This method can help design detailed water injection programs for mature oilfields.
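The final optimization step can be illustrated with a basic global-best particle swarm. In the sketch below, a simple quadratic deviation from an assumed "ideal" allocation of layer injection rates stands in for the simulator-based objective; the objective, bounds, and swarm parameters are hypothetical.

```python
import numpy as np


def pso(objective, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    dim = low.size
    x = rng.uniform(low, high, size=(n_particles, dim))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_val = np.array([objective(p) for p in x])
    g_best = p_best[p_val.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward each particle's best + pull toward the swarm's best.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, low, high)
        vals = np.array([objective(p) for p in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, p_val.min()


# Hypothetical stand-in objective: deviation of three layer rates (m3/d) from an assumed ideal.
ideal = np.array([120.0, 80.0, 50.0])
best, value = pso(lambda q: float(np.sum((q - ideal) ** 2)), [(0.0, 200.0)] * 3)
print(np.round(best, 2), value)
```

PSO needs only objective evaluations, not gradients, which is why it pairs naturally with reservoir simulators or learned surrogates.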
Abstract: Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge that meets users' needs. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining have been ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Are there rules we should obey in a data mining process? In order to discover patterns and knowledge that are genuinely interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperative data mining process. Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a knowledge-transformation process that transforms knowledge from data format into symbol format. Thus, no new knowledge is generated in a data mining process. Knowledge is merely transformed from data format, which is not understandable to humans, into symbol format, which is understandable to humans and easy to use. This is similar to translating a book from Chinese into English. In translation, the knowledge in the book should remain unchanged; only its format changes. That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, mistakes have been made in the translation. Likewise, in a data mining process we transform knowledge from one format into another rather than producing new knowledge. The knowledge is originally stored in data (data is one representation format of knowledge). Unfortunately, we cannot read, understand, or use it directly, since we cannot understand data.
With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved the performance of classical knowledge acquisition methods. In fact, we also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they could be integrated into domain-oriented data-driven data mining. This is analogous to views of a database: users with different views see different parts of the data in a database. Thus, users with different tasks or objectives may wish to, and could, discover different (partial) knowledge from the same database. However, all such partial knowledge must already exist in the database. So, a domain-oriented data-driven data mining method would help us extract knowledge that genuinely exists in a database and is genuinely interesting and actionable in the real world.
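In rough set theory, what can be said with certainty about a concept, given the available attributes, is captured by its lower approximation, and what is merely possible by its upper approximation. The sketch below computes both for a small hypothetical decision table; it illustrates the basic rough-set machinery only, not the specific knowledge acquisition method proposed here.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that share the same values on `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Rough-set lower and upper approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:      # block lies entirely inside the concept
            lower |= block
        if block & target:       # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical decision table with two condition attributes and one decision.
table = {
    1: {"headache": "yes", "temp": "high", "flu": "yes"},
    2: {"headache": "yes", "temp": "high", "flu": "yes"},
    3: {"headache": "no", "temp": "high", "flu": "yes"},
    4: {"headache": "no", "temp": "high", "flu": "no"},
    5: {"headache": "no", "temp": "normal", "flu": "no"},
}
flu = {o for o, row in table.items() if row["flu"] == "yes"}
lower, upper = approximations(table, ["headache", "temp"], flu)
print(sorted(lower), sorted(upper))  # [1, 2] [1, 2, 3, 4]
```

A certain rule such as "headache = yes and temp = high implies flu = yes" can be read off the lower approximation; objects 3 and 4 are indiscernible on the given attributes yet differ in decision, so they fall only in the boundary region.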
Funding: Supported by the Six Talent Peaks Project in Jiangsu Province, China (Grant No. JXQC-002).
Abstract: Modeling the dynamics of projectile systems with sufficient accuracy is very difficult because of the high-dimensional space and various perturbations. With the recent rapid development of data science and scientific measurement tools, numerous data-driven methods have been devoted to discovering governing laws from data. In this work, a data-driven method based on the Kramers–Moyal formulas is employed to model the projectile. More specifically, the four-dimensional projectile system is assumed to be an Itô stochastic differential equation. Then the least squares method and sparse learning are applied to identify the drift coefficient and diffusion matrix from sample path data, and the identified coefficients agree well with the real system. The effectiveness of the data-driven method demonstrates that it can become a powerful tool for extracting governing equations and predicting complex dynamical behaviors of the projectile.
Funding: Supported by the National Key Research and Development Project (No. 2019YFB1405401) and the National Natural Science Foundation of China (No. 5217120056).
Abstract: Using stochastic dynamic simulation for railway vehicle collisions still faces many challenges, such as high modelling complexity and long computation times. To address these challenges, we introduce a novel data-driven stochastic process modelling (DSPM) approach into the dynamic simulation of railway vehicle collisions. The DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate the computation of stochastic processes. By applying DSPM, Gaussian process regression (GPR), and finite element (FE) methods to two collision scenarios (i.e., a lead car colliding with a rigid wall, and a lead car colliding with another lead car), we are able to achieve a comprehensive analysis. The comparison between the DSPM approach and the FE method revealed that the DSPM approach can calculate the corresponding confidence interval while improving overall computational efficiency. Comparing the DSPM approach with the GPR method indicates that the DSPM approach can accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamics simulation of railway vehicle collisions.
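The confidence intervals mentioned above come from the Gaussian-process posterior. The sketch below is plain exact GP regression with a squared-exponential (RBF) kernel, that is, the GPR baseline rather than DSPM itself: DSPM's four kernels, stochastic variational inference, and mini-batching are not reproduced, and the kernel settings and synthetic response curve are assumptions. Exact GPR scales cubically with the number of training points because of the Cholesky factorization, which is the cost that variational and mini-batch methods aim to reduce.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Exact GP regression: posterior mean and standard deviation."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                                # O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


# Synthetic response curve standing in for simulated collision data.
x_train = np.linspace(0.0, 5.0, 15)
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 6.0, 7)
mean, std = gp_posterior(x_train, y_train, x_test)
for xt, m, s in zip(x_test, mean, std):
    print(f"x={xt:.1f}: {m:+.3f}, 95% CI [{m - 1.96 * s:+.3f}, {m + 1.96 * s:+.3f}]")
```

The interval widens as test points move away from the training data, which is how a GP expresses growing uncertainty under unseen conditions.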
Funding: Supported by the National Natural Science Foundation of China under Grant No. 52131102.
Abstract: With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models can capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater prediction efficiency than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. Subsequently, the related studies are categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
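As a concrete reference point for the analytic-wake-model category, the classic Jensen (Park) model gives the velocity inside a top-hat wake whose radius grows linearly with a wake decay constant k. It is a physics-driven baseline, not a data-driven model, and is shown only to make that category concrete; the parameter values below, including k = 0.05, are illustrative assumptions.

```python
import math


def jensen_velocity(u_inf, x, rotor_diameter, ct, k=0.05):
    """Velocity inside the wake, x metres downstream (Jensen/Park top-hat model)."""
    if x < 0:
        return u_inf  # upstream of the rotor: no wake
    expansion = rotor_diameter / (rotor_diameter + 2.0 * k * x)
    deficit = (1.0 - math.sqrt(1.0 - ct)) * expansion ** 2
    return u_inf * (1.0 - deficit)


# Hypothetical 100 m rotor with thrust coefficient 0.75 in a 10 m/s free stream.
for x in (0.0, 300.0, 700.0, 1500.0):
    print(f"{x:6.0f} m: {jensen_velocity(10.0, x, 100.0, 0.75):.2f} m/s")
```

The velocity deficit is largest just behind the rotor and decays as the wake expands, recovering toward the free-stream speed far downstream.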