Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education (20070699013), the Natural Science Foundation of Shaanxi Province (2006F05), and the Aeronautical Science Foundation (05I53076).
Abstract: Traditional image segmentation methods based on Markov random fields (MRF) converge slowly and require predefined weights. To address these disadvantages, a fast segmentation approach for SAR images based on a simple MRF is proposed. The approach first performs a coarse segmentation in blocks. The image is then modeled with a simple MRF, and adaptive variable weighting forms are applied in homogeneous and heterogeneous regions. As a result, convergence is accelerated while the segmentation results in homogeneous regions and at borders are improved. Simulations with synthetic and real SAR images demonstrate the effectiveness of the proposed approach.
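Abstract: This study builds a high-quality dataset, named VamNER, for named entity recognition (NER) in the Pacific white shrimp farming domain. To ensure diversity, the corpus was built from high-quality papers collected from the CNKI database over the past 10 years, supplemented by authoritative books. Experts were invited to discuss the entity types, and professionally trained annotators labeled the corpus in the IOB2 format; annotation proceeded in two stages, pre-annotation and formal annotation, to improve efficiency. In the pre-annotation stage, the inter-annotator agreement (IAA) reached 0.87, indicating high consistency among annotators. The final VamNER dataset contains 6115 sentences and 384602 characters, covering 10 entity types and 12814 entities. Comparison with several general-domain datasets and one domain-specific dataset reveals the distinctive characteristics of VamNER. Experiments used the pretrained bidirectional encoder representations from Transformers (BERT) model, a bidirectional long short-term memory network (BiLSTM), and conditional random fields (CRF); the best model achieved an F1 score of 82.8% on the test set. VamNER is the first NER dataset focused on Pacific white shrimp farming; it provides a rich resource for Chinese domain-specific NER research and is expected to advance NER research in aquaculture.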
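
For readers unfamiliar with the IOB2 format mentioned in this abstract, the following is a minimal sketch of how IOB2 tags map to entity spans. It is an illustration only, not the annotation tooling used for VamNER; the function name `iob2_to_spans` and the entity type labels `SPECIES` and `DISEASE` are hypothetical.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray I- tag with no open entity is ignored (strict IOB2 reading).
    if start is not None:
        spans.append((etype, start, len(tags)))
    return spans


# Hypothetical tag sequence; the labels are placeholders, not VamNER's types.
tags = ["B-SPECIES", "I-SPECIES", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O"]
spans = iob2_to_spans(tags)
```

In IOB2 every entity begins with a `B-` tag, so two adjacent entities of the same type remain distinguishable, which is what the span extraction above relies on.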
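Abstract: To address the unclear ordering of procedural steps and the ambiguity about who executes each step in text-based metro emergency response procedures, a knowledge extraction and reasoning method for metro emergency response based on BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) is proposed. First, BiLSTM-CRF is used to perform named entity recognition on the text of metro emergency response procedures, completing knowledge extraction. Second, the TransD model is applied to the recognized entities for knowledge reasoning, yielding a knowledge graph whose nodes are entities and attribute pairs and whose edges are relation pairs. Finally, the Neo4j graph database is used to visualize the resulting knowledge graph and to carry out a case analysis. The results show that the precision, recall, and F1 score of the BiLSTM-CRF knowledge extraction model all exceed 90%, and that the reasoning accuracy of the TransD model built on BiLSTM-CRF improves by 22.92%, which ensures the accuracy of the knowledge graph and can support decision-making in metro emergency management.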
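
As background on the TransD model named in this abstract, the sketch below computes the standard TransD triple score, in which each entity is projected into the relation space by a matrix built from projection vectors, M = r_p e_p^T + I, and the score is the negative squared distance between the projected head plus the relation and the projected tail. The abstract does not describe its implementation, so the function names and the toy vectors here are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(e: Vector, e_p: Vector, r_p: Vector) -> Vector:
    """Compute (r_p e_p^T + I) e without materializing the matrix.

    r_p e_p^T e equals (e_p . e) * r_p, and the rectangular identity
    maps e to its first m components (zero-padded if e is shorter).
    """
    m = len(r_p)
    scale = dot(e_p, e)
    padded = (e + [0.0] * m)[:m]
    return [scale * r_p[i] + padded[i] for i in range(m)]


def transd_score(h: Vector, h_p: Vector, r: Vector, r_p: Vector,
                 t: Vector, t_p: Vector) -> float:
    """Negative squared L2 distance; higher means a more plausible triple."""
    h_proj = project(h, h_p, r_p)
    t_proj = project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_proj, r, t_proj))


# Toy vectors (hypothetical): with zero projection vectors TransD reduces to
# a plain translation, so h + r == t gives the best possible score of zero.
score = transd_score([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0],
                     [2.0, 1.0], [0.0, 0.0])
```

A trained model would learn all six vectors per triple from data and typically constrain their norms; both steps are omitted here.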
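Abstract: To identify risk elements in public opinion information on urban waterlogging quickly and accurately, user comments and media posts were first collected from the Sina Weibo platform, organized, and annotated to build a corpus dataset of urban waterlogging disaster events. Then, because urban waterlogging public opinion information is inconsistent in format and semantically complex, and because risk element identification demands high professionalism and precision, a recognition method based on Bidirectional Encoder Representations from Transformers-Bidirectional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF) was proposed, drawing on the risk element framework of natural disaster system theory, and a series of model validation experiments were carried out. The comparison experiments show that the model performs well on precision, recall, and F_1, with a precision of 84.62%, a recall of 86.19%, and an F_1 of 85.35%, outperforming the other models compared. The ablation experiments show that the BERT pretrained model has the more significant effect on the model's performance. Taken together, these results verify that the model can effectively identify the various risk elements in urban waterlogging public opinion information, providing a research basis for the digital and intelligent transformation of urban waterlogging disaster risk management.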
Funding: National Natural Science Foundation of China (71690233, 71971213, 71901214).
Abstract: Architecture frameworks, such as the United States (US) Department of Defense Architecture Framework Version 2.0 (DoDAF2.0), have recently become an effective method to describe system of systems (SoS) architecture. As a viewpoint in DoDAF2.0, the operational viewpoint (OV) describes operational activities, nodes, and resource flows. The OV models are important for SoS architecture development. However, as SoS complexity increases, constructing OV models with traditional methods exposes shortcomings, such as inefficient data collection and low modeling standards. Therefore, we propose an intelligent modeling method for five OV models: operational resource flow OV-2, organizational relationships OV-4, operational activity hierarchy OV-5a, operational activities model OV-5b, and operational activity sequences OV-6c. The main idea of the method is to extract OV architecture data from text and generate interoperable OV models. First, we construct the OV meta model based on the DoDAF2.0 meta model (DM2). Second, OV architecture named entities are recognized from text with the bidirectional long short-term memory and conditional random field (BiLSTM-CRF) model, and OV architecture relationships are collected with relationship extraction rules. Finally, we define the generation rules for OV models and develop an OV modeling tool. We use an unmanned surface vehicle (USV) swarm target defense SoS architecture as a case to verify the feasibility and effectiveness of the intelligent modeling method.
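Abstract: To address the scarcity of text corpora on livestock and poultry diseases and the large number of out-of-vocabulary words, such as disease names and phrases, in such text, a BERT-BiLSTM-CRF word segmentation model combined with dictionary matching is proposed for livestock and poultry disease text. Taking sheep diseases as the research object, a text dataset of common diseases was built and combined with the general-purpose PKU corpus. The BERT (Bidirectional encoder representation from transformers) pretrained language model produces vector representations of the text; a bidirectional long short-term memory network (BiLSTM) captures contextual semantic features; and a conditional random field (CRF) outputs the globally optimal label sequence. On this basis, a domain dictionary of livestock and poultry diseases is added after the CRF layer to correct the segmentation by matching, reducing the ambiguous segmentation caused by disease names and phrases and further improving segmentation accuracy. Experimental results show that the BERT-BiLSTM-CRF model with dictionary matching achieves an F1 score of 96.38% on the common sheep disease text dataset, which is 11.01, 10.62, 8.3, and 0.72 percentage points higher than the jieba segmenter, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the proposed model without dictionary matching, respectively, verifying the effectiveness of the method. Compared with a single corpus, the mixed corpus combining the general-purpose PKU corpus with the common sheep disease text dataset allows accurate segmentation of both specialized disease terminology and the common words in disease text; the F1 score exceeds 95% on both the general corpus and the disease text dataset, indicating good generalization. The method can be used for word segmentation of livestock and poultry disease text.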
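
The dictionary-matching correction described in this abstract can be pictured with a minimal sketch, assuming the correction is a greedy longest-match merge of adjacent tokens whose concatenation is a dictionary term. The abstract does not specify the matching rule, so the function `merge_with_dictionary`, the `max_span` parameter, and the example sentence are hypothetical.

```python
from typing import List, Set


def merge_with_dictionary(tokens: List[str], lexicon: Set[str],
                          max_span: int = 4) -> List[str]:
    """Greedily merge adjacent tokens whose concatenation is a lexicon term."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the widest window first so longer terms win over shorter ones.
        for width in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + width])
            if candidate in lexicon:
                merged.append(candidate)
                i += width
                break
        else:
            # No multi-token window matched: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical model output that wrongly splits a disease name.
tokens = ["羊", "口蹄", "疫", "的", "症状"]
lexicon = {"口蹄疫"}
corrected = merge_with_dictionary(tokens, lexicon)
```

This sketch only repairs over-segmentation of dictionary terms; it does not re-split tokens whose boundaries cut across a term, which a fuller correction step would also need to handle.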
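Abstract: Models for named entity recognition (NER) usually model only characters and related words, and do not make full use of the glyph structure information specific to Chinese characters or of entity type information. To address this, a named entity recognition model that fuses prior knowledge and glyph features is proposed. First, a Transformer combined with a Gaussian attention mechanism encodes the input sequence; Chinese definitions of the entity types are obtained from Chinese Wikipedia, a bidirectional gated recurrent unit (BiGRU) encodes this entity type information as prior knowledge, and an attention mechanism combines it with the character representations. Second, a bidirectional long short-term memory (BiLSTM) network encodes the long-range dependencies of the input sequence; the Cangjie codes of traditional characters and the modern Wubi codes of simplified characters are obtained from glyph encoding tables, a convolutional neural network (CNN) extracts glyph feature representations, the traditional and simplified glyph features are combined with different weights, and a gating mechanism combines them with the BiLSTM-encoded character representations. Finally, a conditional random field (CRF) decodes the result to obtain the named entity label sequence. Experiments on the colloquial Weibo dataset, the small Boson dataset, and the large PeopleDaily dataset show that, compared with the baseline model MECT (Multi-metadata Embedding based Cross-Transformer), the F1 score of the proposed model improves by 2.47, 1.20, and 0.98 percentage points, respectively, verifying the effectiveness of the model.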
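
Several of the abstracts above end with a CRF layer that decodes the label sequence. As a minimal sketch of what that decoding step typically involves, the following runs Viterbi decoding over per-position emission scores and label-to-label transition scores. The scores, the `B`/`I`/`O` label set, and the function name `viterbi` are illustrative assumptions, not values or code from any of the listed papers.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label path for a linear-chain model."""
    labels = list(emissions[0])
    best = {y: emissions[0][y] for y in labels}  # best path score ending in y
    backptrs = []
    for scores in emissions[1:]:
        new_best, ptr = {}, {}
        for y in labels:
            # Best previous label for arriving at y in this position.
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Walk the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


labels = ["B", "I", "O"]
# All transitions are neutral except O -> I, which IOB2 forbids.
transitions = {p: {y: 0.0 for y in labels} for p in labels}
transitions["O"]["I"] = -100.0

# Hypothetical emission scores, as a BiLSTM layer might produce.
emissions = [
    {"B": 1.0, "I": 0.0, "O": 2.0},
    {"B": 0.0, "I": 2.0, "O": 0.0},
]
path = viterbi(emissions, transitions)
```

Taking the best label at each position independently would give `O` then `I`, an invalid IOB2 sequence; the transition penalty makes the decoder prefer `B` then `I`, which is the benefit usually attributed to placing a CRF on top of the encoder.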