Abstract
In recent years, treating grammatical error correction as a machine translation task has led to significant progress in English grammatical error correction. For data-driven natural language processing methods, large-scale, high-quality annotated corpora have become the most important resource for translation and related tasks. This survey focuses on the datasets and data augmentation methods used in English grammatical error correction. It comprehensively summarizes the datasets, data synthesis methods, evaluation methods, and current applications in the field, and provides an inductive analysis of them. Finally, it summarizes and discusses prospects for improving the performance of English grammatical error correction models.
Authors
孙晓东
杨东强
SUN Xiaodong; YANG Dongqiang (School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2022, Issue 7, pp. 43-54 (12 pages)
Computer Engineering and Applications
Funding
Humanities and Social Sciences Research Planning Fund of the Ministry of Education (15YJA740054).
Keywords
data-driven
data augmentation
English grammar error correction
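About the authors
SUN Xiaodong (born 1995), male, master's student; main research interest: natural language processing. Corresponding author: YANG Dongqiang (born 1970), male, Ph.D., associate professor; main research interest: natural language processing. E-mail: ydq@sdjzu.edu.cn.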