A Formal Logic Approach to Chinese Recognizing Textual Entailment

Advisor: 黃鐘揚
National Taiwan University, Graduate Institute of Electronics Engineering
Author: 張富傑 (Chang, Fu-Chieh)
Year: 2014
Repository record dates: 2014-11-30, 2018-07-10
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/263909

Abstract: In natural language processing (NLP) research, understanding natural language has always been a challenging problem. Traditional NLP research focuses on the semantics and logic of natural language, whereas current research focuses on big data and machine learning techniques. Both approaches have strengths and weaknesses. However, traditional semantic models are seldom discussed in recent work, and existing machine learning techniques have their own limitations. Combining traditional work on semantics with machine learning techniques is therefore a promising research direction. We build a system that solves the Chinese recognizing textual entailment (RTE) problem with a formal logic method. Based on the theories of formal semantics and computational semantics, we first use a machine learning technique to parse Chinese sentences into syntax trees. We then propose an algorithm that converts these syntax trees into semantic representations. We also propose a method that integrates external knowledge resources with the semantic representations, so that theorem proving techniques can be applied to Chinese RTE. Next, we demonstrate that our approach can solve some simple cases of Chinese RTE.
We also show the possibilities and difficulties of solving real-world cases. Finally, we point out the strengths and weaknesses of our system and possible directions for future research to improve it.

Table of Contents:
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
  1.1 Contributions of this Thesis
    1.1.1 For Chinese Computational Semantics
    1.1.2 For Chinese RTE
    1.1.3 For Future Research
  1.2 Organization of this Thesis
2 Preliminaries
  2.1 Natural Language Understanding
  2.2 Formal Semantics
    2.2.1 Davidsonian Event Semantics
    2.2.2 Neo-Davidsonian Semantics
  2.3 Computational Semantics
  2.4 Automated Theorem Proving
    2.4.1 Tableaux
    2.4.2 Resolution
  2.5 Recognizing Textual Entailment
    2.5.1 Formal Logic Approach
    2.5.2 Machine Learning Approach
  2.6 Knowledge Resources
    2.6.1 Lexical Semantics
    2.6.2 Ontology
    2.6.3 Distributional Semantics
3 Related Works
  3.1 Computational Semantics
    3.1.1 Computational Semantics in Chinese
  3.2 Recognizing Textual Entailment
    3.2.1 Formal Logic Approach
    3.2.2 Machine Learning Approach
    3.2.3 Mixed Approach
    3.2.4 Recognizing Textual Entailment in Chinese
4 Implementation: System Architecture and Algorithm
  4.1 Overview
  4.2 CKIP Chinese Parser
  4.3 Semantic Constructor
    4.3.1 Semantic Construction
    4.3.2 Semantic Construction for Prepositional Phrase
    4.3.3 Semantic Construction for Noun Phrase
    4.3.4 Semantic Construction for DE Phrase
    4.3.5 Semantic Construction for Embedded Sentence
    4.3.6 Semantic Construction for Negation
    4.3.7 Semantic Construction for Coordination
    4.3.8 Concluding Example
  4.4 Knowledge Resources
    4.4.1 Lexical Semantics
    4.4.2 Distributional Semantics
  4.5 RTE Engine
    4.5.1 Knowledge Builder
    4.5.2 Knowledge Validator
    4.5.3 Theorem Prover
5 Implementation Issues
  5.1 CKIP Chinese Parser
    5.1.1 Error in Syntax Tree
    5.1.2 Restriction on Data Size
  5.2 Semantic Constructor
    5.2.1 Quantifier
    5.2.2 Anaphora
  5.3 RTE Engine
    5.3.1 Knowledge Builder
    5.3.2 Knowledge Validator
6 Experiment, Result and Discussion
  6.1 Experiment on Simple Test Cases
    6.1.1 Simple Test Cases: Success
    6.1.2 Simple Test Cases: Failure
  6.2 Experiment on RITE Competition Test Cases
    6.2.1 RITE Test Cases: Success
    6.2.2 RITE Test Cases: Failed by Discussed Issues
    6.2.3 RITE Test Cases: Failed by Other Issues
7 Conclusion
  7.1 About Chinese Computational Semantics
  7.2 About Chinese RTE
8 Future Works
  8.1 Chinese Word Segmentation and Parsing
  8.2 Semantic Construction
  8.3 Reasoning Algorithm
  8.4 Knowledge Resources and Knowledge Construction
  8.5 Real-World Application
Bibliography

File: 557819 bytes, application/pdf
Thesis public release date: 2015/08/22
Usage rights: royalty-free authorization granted
Keywords: formal semantics; computational semantics; natural language understanding; first-order logic; Chinese recognizing textual entailment
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/263909/1/ntu-103-R01943082-1.pdf
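The Python sketch below is a toy illustration of the pipeline described in the abstract. It pairs semantic representations with external knowledge and then checks entailment by proof, but it is not the thesis's system.

The sketch makes several assumptions:

- The text has already been turned into ground (Skolemized) Neo-Davidsonian atoms, as a semantic constructor might produce after parsing.
- A single hypernym rule stands in for knowledge drawn from a lexical resource.
- A general first-order theorem prover is replaced by forward chaining over Horn rules, followed by matching the hypothesis's existentially quantified atoms.

This check decides entailment only for that restricted fragment: ground facts, function-free Horn rules, and a negation-free conjunctive hypothesis. All predicate names, constants, and example sentences are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Toy sketch with hypothetical names, not the thesis's RTE engine.
Atom = Tuple[str, ...]          # ("predicate", arg1, arg2, ...)
Binding = Dict[str, str]        # variable -> constant
Rule = Tuple[List[Atom], Atom]  # (body atoms, head atom), read as body -> head


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables; all others are constants."""
    return term.startswith("?")


def match(pattern: Atom, fact: Atom, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals the ground `fact`, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    extended = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if extended.get(p, f) != f:  # variable already bound to another constant
                return None
            extended[p] = f
        elif p != f:
            return None
    return extended


def solve(goals: List[Atom], facts: Set[Atom],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding under which all goal atoms occur in `facts`."""
    binding = {} if binding is None else binding
    if not goals:
        yield binding
        return
    for fact in facts:
        extended = match(goals[0], fact, binding)
        if extended is not None:
            yield from solve(goals[1:], facts, extended)


def saturate(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Forward-chain the Horn rules until no new ground fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for b in list(solve(body, known)):  # materialize before mutating `known`
                derived = tuple(b.get(t, t) for t in head)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


def entails(text: Set[Atom], rules: List[Rule], hypothesis: List[Atom]) -> bool:
    """Within this fragment, T plus the rules entails H iff H matches the saturated facts."""
    return next(solve(hypothesis, saturate(text, rules)), None) is not None


# T: "Zhang San ate an apple": eat(e1) & agent(e1,zhangsan) & theme(e1,a1) & apple(a1)
text = {("eat", "e1"), ("agent", "e1", "zhangsan"), ("theme", "e1", "a1"), ("apple", "a1")}
# Hypothetical hypernym axiom standing in for a lexical knowledge resource: apple(x) -> fruit(x)
rules: List[Rule] = [([("apple", "?x")], ("fruit", "?x"))]
# H: "Zhang San ate a fruit": exists e, y. eat(e) & agent(e,zhangsan) & theme(e,y) & fruit(y)
hyp = [("eat", "?e"), ("agent", "?e", "zhangsan"), ("theme", "?e", "?y"), ("fruit", "?y")]

print(entails(text, rules, hyp))  # True
print(entails(text, [], hyp))     # False: without the axiom, fruit(a1) is not derivable
```

Removing the hypernym axiom makes the check fail. This illustrates why the semantic representations are combined with external knowledge resources before proving.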