https://scholars.lib.ntu.edu.tw/handle/123456789/413089
Title: | A simplification–translation–restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation | Authors: | Chen H.-B.; Huang H.-H.; Hsieh A.-C.; Chen H.-H. |
Keywords: | Cross-domain SMT;Domain adaptation;Medical document processing;Statistical machine translation | Issue Date: | 2017 | Volume: | 42 | Pages: | 59-80 | Source: | Computer Speech and Language | Abstract: | Integration of in-domain knowledge into an out-of-domain statistical machine translation (SMT) system poses challenges due to the lack of resources. Lack of in-domain bilingual corpora is one such issue. In this paper, we propose a simplification–translation–restoration (STR) framework for domain adaptation in SMT systems. An SMT system to translate medical records from English to Chinese is taken as a case study. We identify the critical segments in a medical sentence and simplify them to alleviate the data sparseness problem in the out-of-domain SMT system. After translating the simplified sentence, the translations of these critical segments are restored to their proper positions. Besides the simplification pre-processing step and the restoration post-processing step, we also enhance the translation and language models in the STR framework by using pseudo bilingual corpora generated by the background MT system. In the experiments, we adapt an SMT system from a government document domain to a medical record domain. The results show the effectiveness of the STR framework. © 2016 Elsevier Ltd |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/413089 | ISSN: | 08852308 | DOI: | 10.1016/j.csl.2016.08.003 |
Appears in Collections: | Department of Computer Science and Information Engineering |
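
The three stages named in the abstract (simplify, translate, restore) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "simplification" means swapping each critical segment for a placeholder token, that critical segments come from a small hand-made lexicon, and that the out-of-domain SMT system is a stub that returns its input unchanged. The names `MEDICAL_LEXICON`, `simplify`, `translate_stub`, `restore`, and `str_pipeline` are all hypothetical.

```python
import re

# Hypothetical in-domain lexicon: critical segment -> target-language translation.
# The entries are illustrative only and are not taken from the paper.
MEDICAL_LEXICON = {
    "acute myocardial infarction": "急性心肌梗塞",
    "chest pain": "胸痛",
}


def simplify(sentence, lexicon):
    """Replace each critical segment found in the sentence with a placeholder.

    Returns the simplified sentence and a map from placeholder to the
    translation that should be restored after the MT step.
    """
    restorations = {}
    # Try longer segments first so a short term cannot split a longer one.
    for i, term in enumerate(sorted(lexicon, key=len, reverse=True)):
        placeholder = f"__TERM{i}__"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(sentence):
            sentence = pattern.sub(placeholder, sentence)
            restorations[placeholder] = lexicon[term]
    return sentence, restorations


def translate_stub(sentence):
    """Stand-in for the out-of-domain SMT system (assumption: identity)."""
    return sentence


def restore(translated, restorations):
    """Put the translations of the critical segments back in place."""
    for placeholder, target in restorations.items():
        translated = translated.replace(placeholder, target)
    return translated


def str_pipeline(sentence, lexicon=MEDICAL_LEXICON, translate=translate_stub):
    """Run simplification, translation, and restoration in sequence."""
    simplified, restorations = simplify(sentence, lexicon)
    return restore(translate(simplified), restorations)
```

In this sketch the translation step only ever sees placeholder tokens instead of rare medical terms, which is one simple way to sidestep the data sparseness the abstract mentions. A real system would pass an actual MT function as `translate` and would need the placeholders to survive translation intact.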