This paper describes experiments with Korean-to-Vietnamese statistical machine translation (SMT). Korean is a morphologically complex language without clear optimal word boundaries, which causes a major problem when translating into or from Korean. To solve this problem, we present a method for Korean morphological analysis that uses a pre-analyzed partial word-phrase dictionary (PWD). In addition, we build a Korean-Vietnamese parallel corpus for training SMT models by collecting text from multilingual magazines. We then apply this morphological analysis to the Korean sentences in the collected parallel corpus as a preprocessing step. The experimental results demonstrate a remarkable improvement in Korean-to-Vietnamese translation quality in terms of the bilingual evaluation understudy (BLEU) score.
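The dictionary-driven preprocessing idea can be illustrated with a minimal sketch. The abstract does not say how the PWD is looked up, so the greedy longest-match strategy, the function names, and the three-entry toy dictionary below are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical toy dictionary: surface fragment -> pre-analyzed morphemes.
# A real PWD would be far larger; these entries are illustrative only.
TOY_PWD = {
    "학교": ["학교"],          # "school"
    "에": ["에"],              # locative particle
    "갔다": ["가", "았", "다"],  # "went" = stem + past tense + ending
}


def segment_eojeol(eojeol, pwd):
    """Split one space-delimited word-phrase by greedy longest match (assumed strategy)."""
    morphemes = []
    i = 0
    while i < len(eojeol):
        # Try the longest remaining fragment first, then shrink it.
        for j in range(len(eojeol), i, -1):
            piece = eojeol[i:j]
            if piece in pwd:
                morphemes.extend(pwd[piece])
                i = j
                break
        else:
            # No dictionary entry starts here: keep the single syllable as-is.
            morphemes.append(eojeol[i])
            i += 1
    return morphemes


def segment_sentence(sentence, pwd):
    """Segment every word-phrase of a sentence before it is passed to SMT training."""
    return [m for eojeol in sentence.split() for m in segment_eojeol(eojeol, pwd)]


tokens = segment_sentence("학교에 갔다", TOY_PWD)
print(" ".join(tokens))
```

Applied to the Korean side of a parallel corpus, such a step replaces each word-phrase with its morpheme sequence, so the alignment model sees smaller and more frequent units.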
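Parallel factor analysis (PARAFAC) is a tensor data processing algorithm whose model decomposition is unique under mild constraints. This paper combines local mean decomposition (LMD) with PARAFAC to propose a new underdetermined blind source separation algorithm. LMD is used to obtain the product function (PF) components of the observed signals, which are then combined with the original observed signals to form new observed signals, thereby converting the underdetermined mixture into a determined or overdetermined source separation problem. The new observed signals are whitened and arranged into a PARAFAC model, which is decomposed with the trilinear alternating least squares (TALS) algorithm to obtain estimates of the source signals. Simulation results show that the LMD-PARAFAC algorithm can accurately estimate the source signals from nonstationary underdetermined mixtures. Applying the proposed algorithm to a multi-machine vibration source experiment further verifies its effectiveness.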
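The trilinear alternating least squares step can be sketched on a generic three-way array. This is a minimal NumPy illustration and not the authors' implementation: the LMD and whitening stages are omitted, and the function names, the random initialization, the fixed iteration count, and the synthetic rank-2 tensor are all assumptions.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: row u * V.shape[0] + v holds U[u, :] * V[v, :]."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)


def parafac_tals(X, rank, n_iter=200, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, rank))  # random start (assumption)
    C = rng.standard_normal((K, rank))
    # Unfoldings whose column order matches the rows produced by khatri_rao.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    errors = []
    for _ in range(n_iter):
        # Each update solves a linear least-squares problem with the other two factors fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
    return A, B, C, errors


# Synthetic rank-2 tensor standing in for the PARAFAC model built from the signals.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C, errors = parafac_tals(X, rank=2)
print("relative error after first and last sweep:", errors[0], errors[-1])
```

Because every update is an exact least-squares solve, the fitting error cannot increase from one sweep to the next. The recovered factors match the true ones only up to column permutation and scaling, which is the usual ambiguity in blind source separation.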
Funding: Supported by the Institute for Information & communications Technology Promotion under Grant No. R0101-16-0176, the Project of Core Technology Development for Human-Like Self-Taught Learning Based on Symbolic Approach.