
Improvements in Machine Translation And Evaluation of Machine Translation Quality

Date Issued
2016
Author(s)
Lu, Ching-Hui
DOI
10.6342/NTU201603208
URI
http://ntur.lib.ntu.edu.tw//handle/246246/275614
Abstract
As global interaction increases, the translation market is growing rapidly. For the decade from 2008 to 2018, estimated GDP growth is 2.1% per year, whereas the translation market is estimated to grow 4.7% per year. In addition, the entire language services market rose from $23.9 billion in 2009 to $38.2 billion in 2015, growing steadily at about 8% per year. With the spread of the internet, human translation capacity struggles to meet market demand, which has given rise to machine translation, or automatic translation: as of April 2016, Google Translate was translating up to one hundred billion words per day. Machine translation has evolved from early rule-based machine translation to the more recent statistical machine translation; in 2007, Google Translate replaced the older rule-based machine translation with its own statistical machine translation, opening the era of statistical machine translation. Statistical machine translation can a) automatically learn translation rules from bilingual parallel corpora and b) automatically adjust parameters to fit the dataset. However, if errors occur during the preparatory sentence alignment step and mismatched bilingual sentence pairs are fed in, the quality of the engine is reduced. Furthermore, automatic parameter adjustment still depends on how well the engine fits the test set. The core function underlying both a) and b) is therefore to measure the degree of fit between source text and target text, in other words, “automatic evaluation of translation.” So far, the standard method for automatic evaluation of translation is Bilingual Evaluation Understudy (BLEU). BLEU considers only the n-gram precision of the text, so a correctly translated sentence may receive an extremely low or even zero BLEU score if it differs lexically from the reference sentence in many places.
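The lexical sensitivity described above can be illustrated with a minimal sketch of clipped (modified) n-gram precision, the quantity at the core of BLEU. This is not the full BLEU metric, which also combines several n-gram orders and applies a brevity penalty, and it is not the thesis's evaluation code. The function names and example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


# Hypothetical sentences, not taken from the thesis.
reference = "the meeting was postponed until next week"
paraphrase = "they delayed the conference to the following week"  # adequate, different wording
scrambled = "week next the postponed meeting until was"           # right words, wrong order

for name, sentence in [("paraphrase", paraphrase), ("scrambled", scrambled)]:
    p1 = modified_precision(sentence, reference, 1)
    p2 = modified_precision(sentence, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

In this toy case the acceptable paraphrase shares no bigram with the reference, while the scrambled word salad achieves perfect unigram precision, which is the kind of mismatch between n-gram overlap and meaning that the abstract describes.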
Conversely, a completely wrong, ungrammatical sentence may achieve a relatively high score if it contains a few correct key words. BLEU therefore cannot accurately assess translation quality. Hence, the first aim of this thesis is to propose an automatic translation evaluation method, based on artificial neural networks and semantics, that accommodates lexical differences. The thesis proposes an original training method: we build a dataset by combining human translations, sentences generated by a better translation engine, and sentences generated by a poorer engine, and train an artificial neural network (ANN) on this dataset for quality classification. Experiments show that our system can indeed evaluate translations based on semantics rather than lexis. Secondly, since machine translation often fails to reach the quality of human translation, it often requires human post-editing. If the machine translation is extremely poor, post-editing may take as long as pure human translation, or even longer. This thesis therefore proposes an Interactive Machine Translation process in which human and machine co-create target texts to reduce or even eliminate the time required for post-editing. Our method first has the machine generate N-best translations of the source sentence and presents them to the user in a succinct form. Once the user changes the chosen target sentence, the system regenerates the rest of the target sentence by looking up the N-best sentences consistent with that change. This thesis also proposes a new type of domain-specific translation. The meaning of the same English word may differ across domains.
For example, “movable” in general corpora may mean “capable of being moved,” but in the legal domain it means “property or possessions not including land or buildings.” Although a domain-specific translation engine can be trained on a domain-specific parallel corpus, each individual domain may contain no more than three million sentences, which is insufficient for training a mature engine. Our proposed method trains translation models on a large general-domain corpus and then extracts bilingual terminology databases from a smaller corpus. We use the general corpus during preprocessing and replace technical terms with specific tags. Thus, even when the specific domain itself provides insufficient data, the engine can still generate satisfactory domain-specific machine translation. Our experiment demonstrates that the system provides different translations of the word “movable” when the target domain is changed. Finally, this research constructs a working translation web platform for business use based on the methods described above.
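The tag-based terminology approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `TERM_DB`, `tag_terms`, `general_engine`, and `translate` are hypothetical names, the "engine" is a pass-through stub standing in for a general-domain translation model, and the bracketed English glosses stand in for real target-language terms.

```python
import re

# Hypothetical bilingual terminology databases, one per domain.
# The bracketed glosses are placeholders for target-language terms.
TERM_DB = {
    "general": {"movable": "[general: capable of being moved]"},
    "law": {"movable": "[law: property other than land or buildings]"},
}


def tag_terms(sentence, domain):
    """Replace known domain terms with placeholder tags.

    Returns the tagged sentence and a mapping from each tag to the
    target-side term it should be filled with after translation.
    """
    terms = TERM_DB.get(domain, {})
    slots = {}

    def to_tag(match):
        word = match.group(0)
        if word.lower() not in terms:
            return word
        tag = f"__TERM{len(slots)}__"
        slots[tag] = terms[word.lower()]
        return tag

    return re.sub(r"[A-Za-z]+", to_tag, sentence), slots


def general_engine(text):
    """Stub for a general-domain engine (assumption: tags pass through unchanged)."""
    return text


def translate(sentence, domain):
    """Tag terms, run the general engine, then fill tags from the domain database."""
    tagged, slots = tag_terms(sentence, domain)
    output = general_engine(tagged)
    for tag, target_term in slots.items():
        output = output.replace(tag, target_term)
    return output


print(translate("The movable was seized.", "law"))
print(translate("The movable was seized.", "general"))
```

Switching the `domain` argument changes only the terminology lookup, not the underlying engine, which mirrors the idea of reusing a general-domain model while drawing domain terms from a small bilingual term database.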
Subjects
machine translation
statistical machine translation
artificial neural network
evaluation of translation quality
in-domain translation
interactive translation
Type
thesis
