Abstract
                
                To address the lack of annotated entity datasets for potato pests and diseases, this paper proposes a large language model (LLM)-based method for constructing a Chinese annotated dataset of potato pest and disease entities, which effectively reduces the labor and time costs of annotation. To handle nested entities in Chinese potato pest and disease texts, a named entity recognition (NER) model based on RoBERTa_wwm-CNN-BiGRU-Biaffine is developed. First, the RoBERTa-wwm model extracts semantic information from the texts and produces dynamic word vectors, addressing the problem of incomplete word recognition. Next, CNN-BiGRU serves as a feature extractor to capture the contextual information of entities, and a biaffine mechanism identifies entity span information. Finally, decoding is performed with the softmax function. A focal loss function is introduced to address imbalanced sample distribution. Experiments show that the model achieves precision (P), recall (R), and F1 values of 91.50%, 90.28%, and 90.89%, respectively, for named entity recognition on potato pest and disease texts. The proposed model was also compared with RoBERTa_wwm, RoBERTa_wwm_LSTM_Biaffine, and RoBERTa_wwm_CNN_Biaffine on the public MSRA dataset and achieved the best result, with an F1 value of 96.67%.
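As a rough illustration of two quantities named in the abstract, the sketch below computes a focal loss on softmax-decoded label scores for a single candidate span, and checks the reported F1 against the reported precision and recall. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the three-label score vectors, and the focusing parameter gamma = 2.0 are assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def softmax(scores):
    """Turn raw label scores for one candidate span into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(scores, gold, gamma=2.0):
    """Focal loss for one span: -(1 - p_t)^gamma * log(p_t).

    p_t is the softmax probability of the gold label.
    gamma=2.0 is an assumed value; gamma=0.0 reduces to cross-entropy.
    """
    p_t = softmax(scores)[gold]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical score vectors over three labels; the gold label is index 0.
easy = focal_loss([4.0, 0.0, 0.0], gold=0)   # confident and correct
hard = focal_loss([0.0, 4.0, 0.0], gold=0)   # confident and wrong
cross_entropy = focal_loss([4.0, 0.0, 0.0], gold=0, gamma=0.0)

# Consistency check on the values reported in the abstract.
reported_f1 = round(f1(91.50, 90.28), 2)
```

The well-classified span contributes less loss than it would under plain cross-entropy, while the misclassified span keeps a large loss. This is the down-weighting of easy examples that makes focal loss useful when most candidate spans are non-entities. The last line reproduces the reported F1 of 90.89 from the reported P and R.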
    
    
                Authors
                    谢聪娇
                    高静
                    陈俊杰
                XIE Congjiao; GAO Jing; CHEN Junjie (College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010011, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot 010011, China; Inner Mongolia Autonomous Region Government Service and Data Management Bureau, Hohhot 010010, China)
     
    
    
                Source
                
                    《内蒙古农业大学学报(自然科学版)》
                        
                                PKU Core Journal (北大核心)
                        
                    
                        2025, Issue 4, pp. 74-83 (10 pages)
                    
                
                    Journal of Inner Mongolia Agricultural University(Natural Science Edition)
     
            
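                Funding
                    Inner Mongolia Autonomous Region Science and Technology Major Special Project (2021ZD0005)
                    Natural Science Foundation of Inner Mongolia Autonomous Region (2020MS06013)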
            
    
                Keywords
                    Potato pests and diseases
                    Named entity recognition
                    Biaffine mechanism
                
     
    
    
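                About the authors
XIE Congjiao (b. 1990), female, doctoral candidate and lecturer; research interests: agricultural information technology and natural language processing. E-mail: csxcj@imau.edu.cn. Corresponding author: GAO Jing (b. 1970), Manchu, Ph.D., professor; research interests: big data intelligence and knowledge discovery. E-mail: gaojing@imau.edu.cn.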