Funding: Grants from the Fundamental Research Funds for the Central Universities (Grant No. 2572018BH02) and the Special Funds for Scientific Research in the Forestry Public Welfare Industry (Grant No. 201504307-03).
Abstract: Using web crawlers for data collection together with distributed storage technologies, we gained access to a wealth of forestry-related data. Building on mature big data technology, we selected Hadoop's distributed system to solve the storage problem of massive forestry data and the memory-based Spark computing framework to process the data quickly and in real time. Forestry data contains a wealth of information, and mining this information is of great significance for guiding the development of forestry. We conduct co-word and cluster analyses on the keywords of the forestry data to extract the rules hidden in the data, analyze research hotspots more accurately, and grasp the evolution of subject topics, which plays an important role in promoting research and development in the subject areas. Co-word analysis and clustering algorithms have important practical significance for understanding the topic structure, research hotspots, and development trends in forestry research. The distributed storage framework and parallel computing greatly improve the performance of the data mining algorithms. Therefore, a forestry big data mining system built on big data technology has important practical significance for promoting the development of intelligent forestry.
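The co-word analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes each record is simply a list of keywords; the function name `coword_counts` and the sample keywords are hypothetical and not taken from the paper, which runs its analysis on Hadoop and Spark rather than in plain Python.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how often each unordered keyword pair appears in the same record."""
    pairs = Counter()
    for keywords in keyword_lists:
        # De-duplicate and sort so ("a", "b") and ("b", "a") map to one key.
        for first, second in combinations(sorted(set(keywords)), 2):
            pairs[(first, second)] += 1
    return pairs


# Hypothetical keyword lists standing in for crawled forestry records.
records = [
    ["forest fire", "remote sensing", "carbon sink"],
    ["remote sensing", "carbon sink"],
    ["forest fire", "remote sensing"],
]
counts = coword_counts(records)
```

The resulting pair counts form a sparse co-occurrence matrix, which is the usual input to a subsequent clustering step.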
Funding: Supported by the Key R&D Program Project of Zhejiang Province under Grant Nos. 2019C01004 and 2021C02004.
Abstract: Purpose: Because knowledge graphs (KGs) are inherently incomplete, the task of predicting missing links between entities is important. Many previous approaches are static, which poses a notable problem: all meanings of a polysemous entity share one embedding vector. This study proposes a polysemous embedding approach, named KG embedding under relational contexts (ContE for short), for missing link prediction. Design/methodology/approach: ContE models and infers different relationship patterns by considering the context of the relationship, which is implicit in the local neighborhood of the relationship. The forward and backward impacts of the relationship in ContE are mapped to two different embedding vectors, which represent the contextual information of the relationship. Then, according to the position of the entity, the entity's polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relationship. Findings: ContE is fully expressive; that is, given any ground truth over the triples, there are embedding assignments to entities and relations that can precisely separate the true triples from the false ones. ContE is capable of modeling four connectivity patterns: symmetry, antisymmetry, inversion, and composition. Research limitations: ContE needs a grid search to find the best parameters for the best performance in practice, which is time-consuming. Sometimes it requires longer entity vectors than some other models to achieve better performance. Practical implications: ContE is a bilinear model, a quite simple model that could be applied to large-scale KGs. By considering the contexts of relations, ContE can distinguish the exact meaning of an entity in different triples, so that when performing compositional reasoning it is able to infer the connectivity patterns of relations and achieves good performance on link prediction tasks. Originality/value: ContE considers the contexts of entities in terms of their positions in triples and the relationships they link to. It decomposes a relation vector into two vectors, namely a forward impact vector and a backward impact vector, in order to capture the relational contexts. ContE has the same low computational complexity as TransE. Therefore, it provides a new approach for contextualized knowledge graph embedding.
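The position-dependent representation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the static entity vector is added to the relation's corresponding context vector, and the sketch assumes that a head entity pairs with the forward impact vector and a tail entity with the backward impact vector. The function name, argument names, and numbers are hypothetical, and the bilinear scoring function of ContE is not reproduced here because the abstract does not give it.

```python
def polysemous_embedding(static_vec, forward_vec, backward_vec, position):
    """Return the context-dependent entity vector for one triple.

    Assumption: "head" uses the relation's forward impact vector and
    "tail" uses its backward impact vector.
    """
    if position == "head":
        context = forward_vec
    elif position == "tail":
        context = backward_vec
    else:
        raise ValueError("position must be 'head' or 'tail'")
    # Element-wise sum of the static embedding and the relational context.
    return [s + c for s, c in zip(static_vec, context)]


# Hypothetical two-dimensional vectors for a single entity and relation.
entity = [1.0, 2.0]
forward_impact = [0.5, -1.0]
backward_impact = [-0.5, 1.0]
as_head = polysemous_embedding(entity, forward_impact, backward_impact, "head")
as_tail = polysemous_embedding(entity, forward_impact, backward_impact, "tail")
```

The same entity therefore receives a different vector depending on the relation and on whether it appears as head or tail, which is how a single static embedding can carry several meanings.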
Funding: Supported by the Young Potential Program of Shanghai Institute of Applied Physics, Chinese Academy of Sciences (No. E0553101).
Abstract: Knowledge graph technology has distinct advantages for fault diagnosis. In this study, the control rod drive mechanism (CRDM) of the liquid fuel thorium molten salt reactor (TMSR-LF1) was taken as the research object, and a fault diagnosis system based on a knowledge graph was proposed. The subject–relation–object triples are defined based on CRDM unstructured data, including design specifications, operation and maintenance manuals, alarm lists, and other forms of expert experience. We constructed a fault event ontology model to label the entities and relationships involved in the corpus of CRDM fault events. A three-layer robustly optimized bidirectional encoder representation from transformers (RBT3) pre-training approach combined with a text convolutional neural network (TextCNN) was introduced to facilitate the application of the constructed CRDM fault diagnosis graph database for fault query. The RBT3-TextCNN model, along with the Jieba tool, is proposed for extracting entities and recognizing the fault query intent simultaneously. Experiments on the dataset collected from TMSR-LF1 CRDM fault diagnosis unstructured data demonstrate that this model has the potential to improve intent recognition and entity extraction. Additionally, a fault alarm monitoring module was developed based on the WebSocket protocol to automatically deliver detailed information about a fault that has occurred to the operator. Furthermore, Bayesian inference combined with the variable elimination algorithm was proposed to enable the development of a relatively intelligent and reliable fault diagnosis system. Finally, a CRDM fault diagnosis Web interface integrated with graph data visualization was constructed, making the CRDM fault diagnosis process intuitive and effective.
Funding: This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and by the TIB Leibniz Information Centre for Science and Technology.
Abstract: Purpose: This work aims to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper). Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings: The application of NLPCONTRIBUTIONGRAPH to the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that the annotation decision variance grows as the granularity of the information increases. Research limitations: NLPCONTRIBUTIONGRAPH has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, after which the same annotator re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the scheme is proposed here for the first time, and given the complexity of the annotation task within a realistic timeframe, our intra-annotation procedure is well suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model. Practical implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Originality/value: NLPCONTRIBUTIONGRAPH is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
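The F1-based agreement between two annotation passes reported in this abstract can be illustrated with a small sketch. It assumes that each pass is reduced to a set of annotated items compared by exact match and that the second pass is treated as the reference; the abstract does not state the matching criterion, so both choices, the function name, and the sample items are assumptions.

```python
def f1_agreement(first_pass, second_pass):
    """F1 between two annotation passes, treating the second as reference."""
    first, second = set(first_pass), set(second_pass)
    overlap = len(first & second)
    if overlap == 0:
        return 0.0
    precision = overlap / len(first)   # share of first-pass items kept
    recall = overlap / len(second)     # share of second-pass items already found
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence identifiers selected in each pass.
pilot = ["s1", "s2", "s3"]
adjudicated = ["s2", "s3", "s4", "s5"]
agreement = f1_agreement(pilot, adjudicated)
```

Because finer-grained units such as phrases and triples must match exactly on more detail, the overlap shrinks and the score drops, which is consistent with the trend the abstract reports.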
Funding: Supported by the National Science and Technology Major Project (2016ZX05007-004).
Abstract: Based on the well logging knowledge graph of hydrocarbon-bearing formations (HBF), a Knowledge-Powered Neural Network Formation Evaluation model (KPNFE) has been proposed. It has the following functions: (1) extracting characteristic parameters describing HBF in multiple dimensions and at multiple scales; (2) representing the entities, relationships, and attributes related to the characteristic parameters as vectors via a graph embedding technique; (3) intelligently identifying HBF; (4) seamlessly integrating expertise into the intelligent computing to establish the assessment system and ranking algorithm for potential pay recommendation. Taking 547 wells that encountered the low-porosity, low-permeability Chang 6 Member of the Triassic in the Jiyuan Block of the Ordos Basin, NW China, as the objects of study, 80% of the wells were randomly selected as the training dataset and the remainder as the validation dataset. The KPNFE prediction results on the validation dataset had a coincidence rate of 94.43% with the expert interpretation results and a coincidence rate of 84.38% for all the oil testing layers, which is 13 percentage points higher in accuracy and over 100 times faster than the primary conventional interpretation. In addition, a number of potential pays likely to produce industrial oil were recommended. The KPNFE model effectively inherits, carries forward, and improves the expert knowledge, solving the robustness problem in HBF identification. With good interpretability and highly accurate computation results, KPNFE is a powerful technical means for efficient and high-quality well logging re-evaluation of old wells in mature oilfields.
Funding: Supported by the Science and Technology Project of the State Grid Corporation “Research on Key Technologies of Power Artificial Intelligence Open Platform” (5700-202155260A-0-0-00).
Abstract: With the construction of new power systems, the power grid has become extremely large, with an increasing proportion of new energy and AC/DC hybrid connections. The dynamic characteristics and fault patterns of the power grid are complex; additionally, power grid control is difficult, operation risks are high, and the task of fault handling is arduous. Traditional power-grid fault handling relies primarily on human experience. Differences and gaps in the knowledge of control personnel restrict the accuracy and timeliness of fault handling. Therefore, this mode of operation no longer meets the requirements of new systems. Based on the multi-source heterogeneous data of power grid dispatch, this paper proposes a joint entity–relationship extraction method for power-grid dispatch fault processing based on a pre-trained model, constructs a knowledge graph of power-grid dispatch fault processing, and designs and develops a fault-processing auxiliary decision-making system based on the knowledge graph. The system was applied at a provincial dispatch control center, where it effectively improved accident-handling ability and the level of intelligence in accident management and control of the power grid.
Funding: Supported by the Sichuan Science and Technology Program under Grants No. 2022YFQ0052 and No. 2021YFQ0009.
Abstract: At present, knowledge embedding methods are widely used in the field of knowledge graph (KG) reasoning and have been successfully applied to KGs with large numbers of entities and relationships. However, in research and production environments there are many KGs with a small number of entities and relations, which are called sparse KGs. Limited by the performance of knowledge extraction methods or by other factors (some common-sense information does not appear in the natural corpus), the relations between entities are often incomplete. To solve this problem, a method combining a graph neural network with information enhancement is proposed. The improved method increases the mean reciprocal rank (MRR) and Hit@3 by 1.6% and 1.7%, respectively, when the sparsity of the FB15K-237 dataset is 10%. When the sparsity is 50%, the evaluation indexes MRR and Hit@10 are increased by 0.8% and 1.8%, respectively.
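For readers unfamiliar with the evaluation indexes quoted in this abstract, the following minimal sketch shows how MRR and Hit@k are conventionally computed from the rank that a model assigns to the correct entity for each test triple. The rank values are made up for illustration and are not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks start at 1)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
hit3 = hits_at_k(ranks, 3)
hit10 = hits_at_k(ranks, 10)
```

Both metrics lie between 0 and 1 and are higher when correct entities are ranked nearer the top.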
Funding: Supported by the Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education (No. 2021-MOEKLECA-KF-05) and the National Natural Science Foundation of China Youth Science (No. 61802046).
Abstract: An aquatic medicine knowledge graph is an effective means of realizing intelligent aquaculture. Graph completion technology is key to improving the quality of knowledge graph construction. However, the difficulty of semantic discrimination among similar entities and inconspicuous semantic features result in low accuracy when completing an aquatic medicine knowledge graph with complex relationships. In this study, an aquatic medicine knowledge graph completion method (TransH+HConvAM) is proposed. First, TransH is applied to split the vector plane between entities and relations, ameliorating the poor completion effect caused by the low semantic resolution of entities. Then, hybrid convolution is introduced to obtain the global interaction of triples based on the complete interaction between head/tail entities and relations, which improves the semantic features of triples and enhances the completion effect for complex relationships in the graph. Experiments are conducted to verify the performance of the proposed method. The MR, MRR, and Hit@10 of TransH+HConvAM are found to be 674, 0.339, and 0.361, respectively. This study shows that the model effectively overcomes the poor completion effect for complex relationships and improves the construction quality of the aquatic medicine knowledge graph, providing technical support for intelligent aquaculture.
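The TransH step named in this abstract can be sketched with the standard TransH formulation: entities are projected onto a relation-specific hyperplane and then compared after a translation on that hyperplane. The sketch covers only this standard component, not the hybrid convolution of TransH+HConvAM, and all vectors below are hypothetical. The hyperplane normal is assumed to have unit length.

```python
import math


def project(vec, normal):
    """Project vec onto the hyperplane whose unit normal is `normal`."""
    scale = sum(v * n for v, n in zip(vec, normal))
    return [v - scale * n for v, n in zip(vec, normal)]


def transh_distance(head, tail, normal, translation):
    """L2 distance between (projected head + translation) and projected tail."""
    h_proj = project(head, normal)
    t_proj = project(tail, normal)
    return math.sqrt(
        sum((h + d - t) ** 2 for h, d, t in zip(h_proj, translation, t_proj))
    )


# Hypothetical three-dimensional embeddings; the normal is the z axis.
normal = [0.0, 0.0, 1.0]
translation = [1.0, 1.0, 0.0]
head = [1.0, 0.0, 5.0]
good_tail = [2.0, 1.0, -3.0]
bad_tail = [2.0, 0.0, 7.0]
good_score = transh_distance(head, good_tail, normal, translation)
bad_score = transh_distance(head, bad_tail, normal, translation)
```

A smaller distance indicates a more plausible triple. Because the component along the normal is discarded, the same entity can behave differently under different relations, which is the property the abstract relies on to separate similar entities.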
Funding: This study was funded by the International Science and Technology Cooperation Program of the Science and Technology Department of Shaanxi Province, China (No. 2021KW-16) and the Science and Technology Project in Xi’an (No. 2019218114GXRC017CG018-GXYD17.11). The thesis work was supported by the special fund construction project of Key Disciplines in Ordinary Colleges and Universities in Shaanxi Province. The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.
Abstract: Text event mining, as an indispensable method of text mining, has attracted extensive attention from researchers. This paper proposes UKGE-MS, a modeling method for event knowledge graphs based on mutual information among neighbor domains and sparse representation. Specifically, UKGE-MS improves the ability of existing text mining technology to understand and discover high-dimensional unlabeled information, and addresses a weakness of traditional unsupervised feature selection methods, which select features only from a global perspective and ignore the impact of local connections between samples. First, considering the influence of the local information of samples on feature correlation evaluation, a feature clustering algorithm based on average neighborhood mutual information is proposed, and feature clusters with a certain event correlation are obtained. Second, an unsupervised feature selection method based on the high-order correlation of multi-dimensional statistical data is designed by combining the dimension reduction advantage of the local linear embedding algorithm with the feature selection ability of sparse representation, so as to enhance the generalization ability of the selected feature items. Finally, the event knowledge graph is constructed by means of sparse representation and the l1 norm. Extensive experiments are carried out on five real datasets and on synthetic datasets, and UKGE-MS is compared with five corresponding algorithms. The experimental results show that UKGE-MS is better than the traditional method in event clustering and feature selection, and has some advantages over other methods in text event recognition and discovery.
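Abstract: Drilling top drive systems are structurally complex and exhibit diverse fault types, and existing fault tree analysis and expert systems struggle to cope with complex and changing field conditions. To address this, a knowledge-graph-based fault diagnosis method for drilling top drive systems is proposed, exploiting the advantages of knowledge graphs in fusing structured and unstructured information, in associative analysis of fault modes, and in transferring prior knowledge. Using Bidirectional Encoder Representations from Transformers (BERT), a Transformer-based bidirectional encoder model, the hybrid neural network models BERT-BiLSTM-CRF and BERT-BiLSTM-Attention were built to perform named entity recognition and relation extraction, respectively, on top drive fault text data. Similarity computation was then used to achieve effective fusion of fault knowledge and intelligent question answering, ultimately yielding the fault diagnosis method for top drive systems. The results show that: (1) on the fault entity recognition task, the BERT-BiLSTM-CRF model reaches a precision of 95.49% and can effectively identify information entities in fault text; (2) on fault relation extraction, the BERT-BiLSTM-Attention model reaches a precision of 93.61%, correctly establishing the relation edges of the knowledge graph; (3) the developed question answering system realizes intelligent application of the knowledge graph, with answer accuracy exceeding 90% across several different question types, which meets the requirements of field use. It is concluded that the knowledge-graph-based fault diagnosis method can make effective use of prior knowledge about top drive systems to locate faults quickly and diagnose them intelligently, and that it has good application prospects.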