A simplification-translation-restoration framework for cross-domain SMT applications
Journal
24th International Conference on Computational Linguistics
Pages
545-560
Date Issued
2012
Author(s)
Abstract
Integration of domain-specific knowledge into a general-purpose statistical machine translation (SMT) system poses challenges due to insufficient bilingual corpora. In this paper we propose a simplification-translation-restoration (STR) framework for domain adaptation in SMT by simplifying domain-specific segments of a text. For an in-domain text, we identify the critical segments and modify them to alleviate the data sparseness problem in the out-domain SMT system. After we receive the translation result, these critical segments are then restored according to the provided in-domain knowledge. We conduct experiments on an English-to-Chinese translation task in the medical domain and evaluate each step of the STR framework. The translation results show significant improvement of our approach over the out-domain and the naïve in-domain SMT systems. © 2012 The COLING.
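The three STR steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how segments are identified, simplified, or restored, so the sketch assumes the simplest possible variant, in which critical segments are terms from a small in-domain glossary, simplification replaces each term with a placeholder token, and restoration substitutes the glossary translation back. The names `simplify`, `restore`, `str_translate`, `toy_translate`, the `TERM<n>` placeholder format, and the one-entry glossary are all hypothetical, and `toy_translate` is a stand-in for a real out-domain SMT system.

```python
import re
from typing import Callable, Dict, List, Tuple


def simplify(text: str, glossary: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace each glossary term in `text` with a placeholder TERM<n>.

    Assumption: glossary keys are lowercase source-language terms.
    Returns the simplified text and the matched terms in order.
    """
    found: List[str] = []
    if not glossary:
        return text, found
    # Longest terms first so multi-word terms win over shorter overlaps.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def to_placeholder(match: "re.Match[str]") -> str:
        found.append(match.group(0).lower())
        return f"TERM{len(found) - 1}"

    return pattern.sub(to_placeholder, text), found


def restore(translated: str, found: List[str], glossary: Dict[str, str]) -> str:
    """Swap each TERM<n> placeholder for the glossary translation of term n."""
    return re.sub(
        r"TERM(\d+)",
        lambda m: glossary[found[int(m.group(1))]],
        translated,
    )


def str_translate(
    text: str,
    glossary: Dict[str, str],
    translate: Callable[[str], str],
) -> str:
    """Run simplification, translation, and restoration in sequence."""
    simplified, found = simplify(text, glossary)
    return restore(translate(simplified), found, glossary)


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for an out-domain SMT system (one fixed phrase)."""
    return sentence.replace("The patient has ", "患者患有").replace(".", "。")


glossary = {"myocardial infarction": "心肌梗死"}  # hypothetical in-domain entry
result = str_translate(
    "The patient has myocardial infarction.", glossary, toy_translate
)
```

In this sketch the translator only ever sees `The patient has TERM0.`, so the rare medical term never reaches the general-purpose system; a real system would also need the placeholder to survive translation intact, which the sketch simply assumes.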
Subjects
Cross-domain SMT
Domain adaptation
Statistical machine translation
Type
conference paper
