
Named Entity Recognition for Cultural Genes in Traditional Settlements Using a Deep Learning Model
Abstract
[Objectives] Traditional settlements contain rich geographical, cultural, and historical information and are an essential component of cultural heritage that urgently needs protection. Research on traditional settlements has generated vast, multimodal, and heterogeneous data resources, but much of the textual information remains unstructured, which limits in-depth analysis and the exploration of the landscape gene information it contains. Few studies have combined the principles and methods of data mining and natural language processing to extract cultural landscape gene information from large volumes of text on traditional settlements. This study introduces the concept of the Traditional Settlement Landscape Genes Named Entity (TSLGNE) and, taking 48 traditional villages in Shaoyang as a case study, uses the BERT-BiLSTM-CRF deep learning model to recognize TSLGNEs.
[Methods] First, the concept, classification system, and knowledge representation method of TSLGNE are proposed by combining geographical entity characteristics with cultural landscape gene theory and its classification system. Second, based on the TSLGNE classification system and an extended BIOES annotation method, the source text data of the case study is annotated to construct a corresponding corpus. Subsequently, the BERT-BiLSTM-CRF model is used to identify and extract TSLGNE information from the corpus. Finally, the obtained TSLGNE knowledge is organized and stored in a Neo4j graph database, which supports spatial feature analysis of the region's traditional settlements and their TSLGNEs.
[Results] The model effectively recognizes 12 categories of landscape gene entities in text, such as architecture, environment, and culture, and its overall precision, recall, and F1-score all exceed those of the comparison models. It achieves an overall F1-score of 64% for TSLGNE recognition, which is 11% and 1% higher than the BiLSTM-CRF and BERT-CRF models, respectively. Notably, the model greatly improves recognition of entities whose corpus data is of poor quality and semantically complex: the F1-score for cultural gene category C3 is 31% and 5% higher than those of the two comparison models, respectively.
[Conclusions] The proposed method efficiently extracts TSLGNE information from large-scale text and supports spatial analysis of the complex cultural gene characteristics of regional traditional settlements and the relationships among them. It offers a useful reference for future work that combines GIS and data mining methods to analyze the key cultural characteristics of traditional settlements, and for traditional settlement knowledge services.
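The annotation and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It shows the standard BIOES scheme only; the abstract does not describe how the authors extend BIOES, so that extension is not reproduced here. The example sentence, the label `A1`, and the function names `spans_to_bioes`, `bioes_to_spans`, and `span_f1` are hypothetical and chosen for illustration (only `C3` appears in the abstract as a category code). Entity spans are assumed to be non-overlapping, and scoring is assumed to be exact-match at the entity level, which may differ from the paper's protocol.

```python
from typing import List, Tuple

# An entity span: (start index, end index exclusive, category label).
Span = Tuple[int, int, str]


def spans_to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into one BIOES tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # inside
            tags[end - 1] = f"E-{label}"        # end
    return tags


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIOES tags back into spans, dropping malformed sequences."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, lab = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, lab))
            start = None
        elif prefix == "B":
            start, label = i, lab
        elif prefix == "I":
            if start is None or lab != label:
                start = None                    # broken sequence: discard
        elif prefix == "E":
            if start is not None and lab == label:
                spans.append((start, i + 1, lab))
            start = None
        else:                                   # "O" or anything unexpected
            start = None
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Invented example: character-level tokens, one hypothetical "A1" entity.
tokens = ["村", "口", "有", "风", "雨", "桥"]
tags = spans_to_bioes(len(tokens), [(3, 6, "A1")])
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Tag sequences of this kind are what a sequence labeller such as BERT-BiLSTM-CRF is trained to predict, and decoding them back into spans is what allows precision, recall, and F1 to be computed per entity rather than per token.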
Authors: LIN Jieru (林洁如); HU Zui (胡最) (College of Geography and Tourism, Hengyang Normal University, Hengyang 421002, China; National and Local Joint Engineering Institution for Digital Preservation and Creative Use Technologies of Traditional Villages and Towns, Hengyang 421002, China)
Source: Journal of Geo-information Science (《地球信息科学学报》), Peking University Core Journal, 2025, Issue 1, pp. 207-225 (19 pages)
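Funding: National Natural Science Foundation of China (41771188); Post-subsidy Project of the Department of Natural Resources of Hunan Province (HBS20240101).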
Keywords: traditional settlements; cultural landscape genes; natural language processing; named entity recognition; BERT pre-trained language model; Neo4j graph database; knowledge graph
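About the authors: LIN Jieru (born 1997), female, from Shaoyang, Hunan, is a master's student whose research covers text data mining of cultural landscape genes and GIS applications. E-mail: jierulin0528@163.com. Corresponding author: HU Zui (born 1977), male, from Ningxiang, Hunan, holds a PhD and is a professor whose research focuses on GIS principles and the computational theory of cultural landscape genes of traditional settlements. E-mail: fuyanghuzui@163.com.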