Abstract
In OCR for Western languages, post-processing that selects the best result from a set of candidates is essential, and word-level spell checking is a practical basis for it. Previous methods, however, suffer to varying degrees from low reliability and other limitations. This paper applies a finite automaton model to post-processing for Western-language OCR. The method combines spell checking with the information in the character recognition results, overcoming the low reliability and limitations of earlier approaches, and experiments confirm its effectiveness. With recognition post-processing assisting recognition, the error rate drops from 0.79% to 0.59%; with recognition post-processing and system post-processing combined, it drops to 0.55%.
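The abstract does not describe the automaton's construction, so the following is only a minimal sketch of the general idea: a dictionary stored as a trie (one simple form of deterministic finite automaton) is walked in step with the per-character candidate lists from the recognizer, and only candidate combinations that the automaton accepts as words are kept. All names here (`Node`, `build_automaton`, `best_word`), the use of summed confidences as the score, and the fallback to the top-ranked characters are illustrative assumptions, not the paper's method.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# One OCR'd word: for each character position, a list of
# (candidate character, recognizer confidence) pairs. Assumed format.
Candidates = List[List[Tuple[str, float]]]


class Node:
    """A state of the dictionary automaton (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.accepting = False  # True if a dictionary word ends here


def build_automaton(words: Iterable[str]) -> Node:
    """Build a trie, i.e. a deterministic automaton accepting exactly `words`."""
    root = Node()
    for word in words:
        state = root
        for ch in word:
            state = state.children.setdefault(ch, Node())
        state.accepting = True
    return root


def best_word(root: Node, candidates: Candidates) -> str:
    """Return the accepted candidate string with the highest summed confidence.

    If no combination of candidates spells a dictionary word, fall back to
    the top-ranked character at each position (an assumption of this sketch).
    """
    best: Optional[Tuple[float, str]] = None

    def walk(state: Node, pos: int, prefix: str, score: float) -> None:
        nonlocal best
        if pos == len(candidates):
            if state.accepting and (best is None or score > best[0]):
                best = (score, prefix)
            return
        for ch, conf in candidates[pos]:
            nxt = state.children.get(ch)
            if nxt is not None:  # prune paths the automaton rejects
                walk(nxt, pos + 1, prefix + ch, score + conf)

    walk(root, 0, "", 0.0)
    if best is not None:
        return best[1]
    return "".join(max(pos, key=lambda c: c[1])[0] for pos in candidates)


automaton = build_automaton(["clear", "clean"])
ocr_output: Candidates = [
    [("c", 0.5), ("e", 0.4)],
    [("1", 0.6), ("l", 0.4)],  # the digit "1" outranks the letter "l"
    [("e", 0.9)],
    [("a", 0.8), ("o", 0.2)],
    [("r", 0.7), ("n", 0.3)],
]
print(best_word(automaton, ocr_output))  # prints "clear"
```

Taking the top-ranked character at every position would give "c1ear". The dictionary constraint rules that string out, and the recognizer's confidences then decide between "clear" and "clean", which is the sense in which spell checking and recognition results are combined.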
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2004, No. 23, pp. 26-29 (4 pages)
Computer Engineering and Applications
Funding
Supported by the Tianyuan Fund of the National Natural Science Foundation of China (Grant No. TY10026002-04-04-01)