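陳銘憲
National Taiwan University: Graduate Institute of Electrical Engineering
陳建錦 (Chen, Chien-Chin)
2007-11-26
2018-07-06
2007
http://ntur.lib.ntu.edu.tw//handle/246246/53181

Thanks to its convenience, the Internet has become the dominant medium for disseminating information, and much of the information that bears on daily life is published or exchanged through it. The same convenience, however, means that the large and constantly growing volume of online information makes it harder for users to find what they need. To manage this ever-growing document stream effectively, event detection and tracking and automatic event summarization have become active research topics.
This dissertation aims to provide an effective mechanism for managing streaming documents. We propose two event detection methods that automatically detect and track emerging news events. With the proposed aging theory, we can describe the life cycle of an event effectively and thereby reduce the error rate of event detection. We also propose a life model based on hidden Markov models to describe how the popularity of an event changes; using the learned life models, we can predict the popularity status of different events in real time and dynamically adjust the clustering threshold used in event detection. Experiments on an officially defined test collection show that the proposed methods improve the performance of existing event detection methods. In addition, to help users understand how an event unfolds, we propose an event summarization method that takes the temporal order of an event into account in order to generate a story evolution graph of the event. The experimental results show that temporal order improves the performance of event summarization, and the examples show that the generated story evolution graphs capture the important developments and changes within an event.

The World Wide Web (WWW) has become a major information source for people from all walks of life. Although the WWW facilitates information distribution, the ever-increasing volume of Internet documents has made information discovery from the Internet a time-consuming task. To manage the massive information of the Internet efficiently, there is a critical need for methods that detect and summarize events in text streams. In this dissertation, we provide two adaptive methods to detect sequential events from text streams. We first propose an aging theory to model the life cycle of events. Then, we provide an event detection framework called LIPED, which utilizes HMM-based life profiles to predict the activeness status of events for adaptive threshold adjustments. To help users comprehend the development of news topics easily, we also provide a unified mechanism to construct a topic evolution graph and summary from topic documents.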
The experimental results based on the official TDT4 corpus show that the proposed event detection methods substantially improve on the performance of existing well-known event detection approaches, and that the composed topic summaries and evolution graphs are highly representative.

Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
  1.1 An Introduction to Event Detection, Topic Evolution Graph Construction and Summarization
  1.2 Motivations
  1.3 The Organization of This Dissertation
Chapter 2 Related Works
  2.1 Topic Detection and Tracking (TDT)
  2.2 Status Modeling of Stream Data
  2.3 Topic Evolution Mining
  2.4 Text Segmentation
  2.5 Text Summarization
Chapter 3 An Aging Theory for Event Life Cycle Modeling
  3.1 Problem Specification
  3.2 Aging Theory for Event Detection
    3.2.1 Constant Decay Aging Scheme
    3.2.2 Training of α and β
  3.3 The Energy-based Event Detection Algorithm
  3.4 Performance Evaluation
    3.4.1 Data Corpus and Evaluation Metrics
    3.4.2 Significance of the Aging Parameters
    3.4.3 Effectiveness of the Aging Theory
    3.4.4 Comparisons with Other Methods
  3.5 Conclusion
Chapter 4 An Adaptive Threshold Framework for Event Detection Using HMM-based Life Profiles
  4.1 Problem Specification
  4.2 Life Profile Modeling
    4.2.1 Acquiring K, S, and B
    4.2.2 Acquiring A and Π
  4.3 LIPED
    4.3.1 LIPED Data Models
    4.3.2 Life Profile based Event Detection
    4.3.3 Threshold Strategies
  4.4 Performance Evaluation
    4.4.1 Data Corpus and Performance Metrics
    4.4.2 Life Profile Preparation
    4.4.3 LIPED on Time Window Method
    4.4.4 LIPED on Time-based Threshold Method
    4.4.5 LIPED on Incremental Clustering Algorithm
    4.4.6 Effects of Threshold Settings
  4.5 Conclusion
Chapter 5 A Unified Eigenvector-based Method for Topics Evolution Graph Construction and Summarization
  5.1 Problem Specification
  5.2 Theme Generation
  5.3 Event Segmentation and Summarization
  5.4 Evolution Graph Construction
  5.5 Summary Evaluation
    5.5.1 Summary to Topic Similarity Evaluation
    5.5.2 ROUGE Evaluation
    5.5.3 Discussions of Summary Evaluation
  5.6 Evolution Graph Evaluation
    5.6.1 Case Study 1 on Topic 40023
    5.6.2 Case Study 2 on Topic 40004
  5.7 Conclusion
Chapter 6 Conclusions and Future Works
References
Appendix A: Topic Summaries

1042383 bytes
application/pdf
en-US
Event detection and tracking; automatic document clustering; automatic document summarization; event story evolution graph
Topic Detection and Tracking; Text Clustering; Text Summarization; Topic Evolution Graph
串流文件內涵事件之偵測、演變及摘要之研究
Event Detection, Evolution and Summarization of Streaming Texts
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/53181/1/ntu-96-D92921018-1.pdf