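College of Electrical Engineering and Computer Science: Graduate Institute of Computer Science and Information Engineering
Advisor: 許永真
Author: 呂慶輝 (Lu, Ching-Hui)
Dates: 2017-03-03; 2018-07-05; 2016
URL: http://ntur.lib.ntu.edu.tw//handle/246246/275614

Abstract (translated from Chinese):

As global exchange grows, the language translation market is expanding rapidly. Over the decade from 2008 to 2018, global GDP is forecast to grow 2.1% per year on average, while the translation market is forecast to grow 4.7% per year. The language services market as a whole grew from US$23.9 billion in 2009 to US$38.2 billion in 2015, a steady average of 8% per year. With the spread of the internet, the capacity of human translation cannot meet demand, so machine translation (automatic translation) plays an increasingly important role: by April 2016, Google Translate was translating 100 billion words per day [1].

Machine translation has evolved from early rule-based machine translation to the statistical machine translation of recent years. In 2007, Google Translate began using its own statistical engine in place of the previous-generation rule-based engine, opening the era of statistical translation that continues today.

Statistical machine translation can (1) learn translation rules automatically from bilingual parallel corpora and (2) tune its parameters automatically to best fit the dataset. However, if errors occur during the automatic sentence alignment used to prepare the bilingual corpus, misaligned sentence pairs are fed into training and degrade the engine's quality. Automatic parameter tuning, in turn, still relies on how well the engine fits the test set. The core function shared by (1) and (2) is therefore to evaluate the degree of fit between source text and translation, that is, automatic evaluation of translation.

The standard method for evaluating translation quality today is BLEU (Bilingual Evaluation Understudy). BLEU considers only the n-gram precision of the surface text. A correct translation that uses entirely different words from the reference sentence may therefore receive a very low BLEU score, or even zero, while a completely wrong, ungrammatical sentence that happens to contain a few keywords may score higher. This scoring does not faithfully reflect translation quality. The first research focus of this thesis is an automatic evaluation method based on neural networks and semantics that can assess translation quality without being bound to surface wording.

The training method is novel: for each source sentence, we combine a human translation with several poorer machine-produced translations to build the training dataset, and train an artificial neural network (ANN) to grade quality. Experiments show that the system can indeed evaluate translations based on semantics rather than surface wording.

Second, because machine translation has not yet reached the quality of human translation, machine output often requires human post-editing. If the machine output is very poor, post-editing may take as long as translating from scratch, or longer. This thesis therefore proposes an interactive machine translation workflow in which human and machine produce the translation together, reducing or even eliminating post-editing time. The machine first produces the N best candidate translations for each source sentence and presents them to the user in a compact form. As soon as the user changes any word in the translation, the system uses the N-best list to automatically revise the rest of the translation according to the changed word.

The thesis also proposes a new method for domain-specific translation. The same English word can have different meanings in different domains. For example, "movable" means 可移動的 ("capable of being moved") in the general domain but 動產 ("movable property") in the legal domain. Domain-specific engines can be trained on each domain's parallel corpus, but the corpus collected for each domain may fall short of 3 million sentences, which is not enough to train a mature engine on its own. Our method trains the translation model on a large general-domain corpus and extracts a bilingual terminology table from the small domain-specific corpus. Before training, the general corpus is preprocessed to replace specialized terms with specific tags. At translation time, the system consults the table for the relevant domain and translates each term tag into that domain's meaning. This provides high-quality in-domain machine translation even when domain corpora are insufficient. In experiments, our system translates "movable" into the appropriate meaning according to the domain setting.

Finally, the methods presented in this thesis are integrated to build a commercial online translation platform.

Abstract (English):

As global interaction increases, the translation market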
is rapidly growing. Over the decade from 2008 to 2018, global GDP is estimated to grow 2.1% per year, while the translation market is estimated to grow 4.7% per year. In addition, the entire language services market rose from $23.9 billion in 2009 to $38.2 billion in 2015, growing steadily at an average of 8% per year. With the spread of the internet, the capacity of human translation struggles to meet the market's needs, so machine translation, or automatic translation, is becoming increasingly important: as of April 2016, Google Translate was translating up to one hundred billion words per day [1].

Machine translation has evolved from early rule-based machine translation to the more recent statistical machine translation. In 2007, Google Translate replaced its older rule-based engine with its own statistical machine translation engine, opening the era of statistical machine translation.

Statistical machine translation is capable of a) automatically learning translation rules from bilingual parallel corpora and b) automatically adjusting parameters to fit the dataset. However, if errors occur during the preparatory sentence alignment process and misaligned bilingual sentence pairs are fed into training, the quality of the engine is reduced. Furthermore, automatic parameter adjustment still depends on how well the engine fits the test set. Thus, the core function shared by a) and b) is to assess the degree of fit between source text and target text, in other words, "automatic evaluation of translation."

So far, the general method for automatic evaluation of translation is Bilingual Evaluation Understudy (BLEU). BLEU considers only the n-gram precision of the text, so the BLEU score of a correctly translated sentence may be extremely low, or even zero, if it differs lexically from the reference sentence in many places.
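The zero-score case can be illustrated with a minimal sketch of clipped n-gram precision, the quantity BLEU is built on. This is a simplification, not full BLEU: it omits the brevity penalty and the geometric mean over n-gram orders. The function names and example sentences are assumptions made for illustration and do not come from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Illustrative sentences (assumed for this sketch).
reference = "the cat sat on the mat".split()
paraphrase = "a feline rested upon that rug".split()  # same meaning, no shared words

print(modified_precision(paraphrase, reference, 1))  # no unigram overlap
print(modified_precision(reference, reference, 2))   # exact copy of the reference
```

Because the paraphrase shares no words with the reference, its unigram precision is zero even though its meaning is close, which is the behaviour the abstract criticises.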
On the other hand, completely wrong, ungrammatical sentences may achieve a relatively high score if they contain a few correct keywords. BLEU's evaluation method therefore cannot accurately assess the quality of a translation. Hence, the first aim of this thesis is to propose an automatic evaluation of translation based on artificial neural networks and semantics, which can evaluate translations while accommodating lexical differences. This thesis proposes an original training method: we build a dataset by combining human translations with sentences generated by a better translation engine and sentences generated by a poor engine, and use it to train an artificial neural network (ANN) for quality classification. Experiments show that our system can indeed evaluate translations based on semantics rather than lexis.

Secondly, since machine translation often fails to reach the quality of human translation, machine translations often require human post-editing. If the quality of the machine translation is extremely poor, post-editing may take as much time as purely human translation, or even more. Thus, this thesis proposes an Interactive Machine Translation process, in which human and machine co-create target texts to reduce or even eliminate the time required for post-editing. Our method first has the machine generate the N-best translations of the source sentence and presents them to the user in a succinct fashion. Then, once the user makes a change in the chosen target sentence, the system regenerates the rest of the target sentence by looking up the N-best sentences according to that change.

This thesis also proposes a new type of domain-specific translation. The meaning of the same English word may differ across domains.
For example, "movable" in general corpora means "capable of being moved," but in the law domain it means "property or possessions not including land or buildings." Although it is possible to train a domain-specific translation engine on a domain-specific parallel corpus, the corpus for each individual domain may contain fewer than three million sentences, which is insufficient to train a mature engine. Our proposed method is to train translation models on a large corpus belonging to the general domain, then extract bilingual terminology databases from a smaller domain-specific corpus. Before training, the general corpus is preprocessed so that technical terms are replaced with specific tags. At translation time, the system consults the terminology database of the selected domain to translate each tag into that domain's meaning. Thus, even when the specific domain itself provides insufficient corpora, the engine can still generate satisfactory domain-specific machine translation. Our experiment demonstrates that our system provides different translations of the word "movable" when the target domain is changed. Finally, this research constructs an actual commercial web translation platform using the various methods listed above.

Thesis usage rights: authorization not granted
Keywords: 機器翻譯 / machine translation; 統計式機器翻譯 / statistical machine translation; 類神經網路 / artificial neural network; 翻譯品質評測 / evaluation of translation quality; 領域翻譯 / in-domain translation; 互動翻譯 / interactive translation
Title: 自動翻譯系統之評價及改良 (Improvements in Machine Translation And Evaluation of Machine Translation Quality)
Type: thesis
DOI: 10.6342/NTU201603208
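The term-tagging idea in the abstract (replace domain terms with tags, then resolve each tag from the terminology table of the chosen domain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `<TERMn>` tag format, the function names, and the tiny glossaries are hypothetical, and the translation engine itself is omitted, so the surrounding sentence stays in English. Only the two renderings of "movable" come from the abstract.

```python
import re

# Hypothetical per-domain terminology tables; only "movable" is taken from the abstract.
GLOSSARIES = {
    "general": {"movable": "可移動的"},
    "law": {"movable": "動產"},
}


def tag_terms(sentence, terms):
    """Replace each known term with a numbered tag; return the tagged text and the terms found."""
    if not terms:
        return sentence, []
    slots = []

    def to_tag(match):
        slots.append(match.group(0).lower())
        return "<TERM{}>".format(len(slots) - 1)

    # Longest terms first so multi-word terms win over their sub-terms.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(to_tag, sentence), slots


def fill_terms(text, slots, domain):
    """Resolve each tag using the terminology table of the selected domain."""
    glossary = GLOSSARIES[domain]
    for index, term in enumerate(slots):
        text = text.replace("<TERM{}>".format(index), glossary[term])
    return text


tagged, slots = tag_terms("The movable was seized.", {"movable"})
# In a full pipeline, `tagged` would pass through a general-domain engine
# that leaves tags untouched (an assumption); that step is skipped here.
print(tagged)
print(fill_terms(tagged, slots, "law"))
print(fill_terms(tagged, slots, "general"))
```

Keeping the tag resolution outside the engine means a new domain needs only a new terminology table, not a retrained model, which fits the abstract's motivation of small domain corpora.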