Effect of Different Uyghur Suffix Granularities on Uyghur-Chinese Word Alignment (cited by: 2)
Abstract: In Uyghur, the complex morphology of words is the main cause of data sparseness. To reduce the negative effects of data sparseness on word alignment and machine translation, and to exploit as much of the semantic information carried by suffixes as possible, a "separate-drop" scheme for suffixes is proposed. Uyghur words are first split into stems and suffixes. Then, based on statistical analysis, suffixes whose semantic information has a high probability of being explicitly translated are separated from the stem and kept, while suffixes with a low probability are dropped. The scheme is applied to the two main Uyghur word classes, nouns and verbs, and 9 templates of graded granularity are constructed for experiments. The experimental results show that the scheme limits the excessive sentence length caused by stem-suffix separation, increases the number of Uyghur-Chinese word pairs, and improves the quality of Uyghur-Chinese machine translation, which verifies the effectiveness of the scheme.
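The "separate-drop" decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `+` suffix marker, the 0.5 threshold, the rule that unknown suffixes are dropped, and the probability values are all assumptions made for the example. The Latin-script forms (`kitab` + `lar` + `ni`, `mektep` + `da`) stand in for already-segmented Uyghur words.

```python
from typing import Dict, List, Tuple

# Hypothetical probabilities that a suffix is explicitly translated into Chinese.
# These numbers are invented for illustration and do not come from the paper.
TRANSLATED_PROB: Dict[str, float] = {
    "lar": 0.80,  # plural marker
    "ni": 0.10,   # accusative marker
    "da": 0.70,   # locative marker
}


def separate_or_drop(
    stem: str,
    suffixes: List[str],
    probs: Dict[str, float],
    threshold: float = 0.5,  # assumed cut-off, not specified in the abstract
    marker: str = "+",       # assumed prefix that flags a separated suffix token
) -> List[str]:
    """Return the stem followed by the suffixes that are kept as separate tokens."""
    tokens = [stem]
    for suffix in suffixes:
        # Keep a suffix as its own token only if it is likely to be translated.
        # Suffixes missing from the table are treated as probability 0.0 (dropped).
        if probs.get(suffix, 0.0) >= threshold:
            tokens.append(marker + suffix)
    return tokens


def preprocess(
    sentence: List[Tuple[str, List[str]]],
    probs: Dict[str, float],
    threshold: float = 0.5,
) -> List[str]:
    """Apply the separate-or-drop rule to every (stem, suffixes) pair in a sentence."""
    output: List[str] = []
    for stem, suffixes in sentence:
        output.extend(separate_or_drop(stem, suffixes, probs, threshold))
    return output


if __name__ == "__main__":
    segmented = [("kitab", ["lar", "ni"]), ("mektep", ["da"])]
    print(preprocess(segmented, TRANSLATED_PROB))
```

Raising the threshold drops more suffixes and shortens the token sequence; lowering it keeps more suffix tokens for the aligner. Varying which suffixes are kept in this way is one plausible reading of how templates of different granularity could be built.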
Source: Computer Engineering and Design (《计算机工程与设计》), PKU Core journal, 2015, No. 8, pp. 2297-2302 (6 pages)
Funding: National Natural Science Foundation of China (61262061); Autonomous Region Science and Technology Plan Fund Project (201423120)
Keywords: word alignment; Uyghur-Chinese machine translation; Uyghur-Chinese word alignment; suffix granularity; morphological analysis
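About the authors: 麦合甫热提 (b. 1976), female, Uyghur, from Tacheng, Xinjiang; master's degree, lecturer; research interests: natural language processing and machine translation. 麦热哈巴·艾力 (b. 1973), female, Uyghur, from Urumqi, Xinjiang; Ph.D., associate professor, CCF member; research interests: natural language processing and machine translation. 米莉万·雪合来提 (b. 1984), female, Uyghur, from Urumqi, Xinjiang; doctoral candidate; research interest: machine translation. E-mail: xmahpu76@163.com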
