Authors: Chen H.-B.; Huang H.-H.; Chen H.-H.; Tan C.-T.
Record date: 2019-07-10
Year: 2012
URL: https://scholars.lib.ntu.edu.tw/handle/123456789/413140

Abstract: Integration of domain-specific knowledge into a general-purpose statistical machine translation (SMT) system poses challenges due to insufficient bilingual corpora. In this paper, we propose a simplification-translation-restoration (STR) framework for domain adaptation in SMT by simplifying domain-specific segments of a text. For an in-domain text, we identify the critical segments and modify them to alleviate the data sparseness problem in the out-domain SMT system. After we receive the translation result, these critical segments are then restored according to the provided in-domain knowledge. We conduct experiments on an English-to-Chinese translation task in the medical domain and evaluate each step of the STR framework. The translation results show significant improvement of our approach over the out-domain and the naïve in-domain SMT systems. © 2012 The COLING.

Keywords: Cross-domain SMT; Domain adaptation; Statistical machine translation
Title: A simplification-translation-restoration framework for cross-domain SMT applications
Type: conference paper
Scopus ID: 2-s2.0-84876814524
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84876814524&partnerID=40&md5=2e43797c7ebff4e323001cb68c19ab8b