Title: Analyzing Content, Event, and Time by Temporal Collocations in Weblogs
Title (Chinese): 部落格內時間與內容相關性的分析
Author: 鄧淳元 (Teng, Chun-Yuan)
Contributor: 陳信希
Affiliation: 臺灣大學:資訊工程學研究所 (National Taiwan University, Graduate Institute of Computer Science and Information Engineering)
Dates: 2007-11-26; 2018-07-05
Year: 2007
URL: http://ntur.lib.ntu.edu.tw//handle/246246/53741
Language: en-US
Type: thesis

Abstract (translated from Chinese):
Event analysis is an important application both in natural language processing and in the weblog domain. This thesis proposes several advances in weblog research. It first introduces temporal collocations to analyze how strongly terms are associated with one another over time. It then applies temporal collocations to event detection and retrieval, and finally reports a series of related experiments. The experimental results show that temporal collocations are a good tool for event detection and retrieval.

Abstract (English):
With the popularity of weblogs, it is desirable to extract the abundant personal experiences, public opinions, and real events they contain. Although many researchers have analyzed the content of weblogs and real events, we have not found any work that uses multiword collocations to examine the relationship between content and time. To enable information retrieval over content, time, and events, we provide several techniques and algorithms:
(1) Temporal collocations are employed to observe the strength of term-to-term associations over time.
(2) The event detection algorithm identifies the collocations that may indicate an event at a specific timestamp.
(3) The event description algorithm retrieves a set of collocations that describe an event.
In addition to these techniques and algorithms, we discuss the behavior of temporal collocations and show their potential applications. The experimental results demonstrate that temporal collocations capture real-world semantics and real-world events over time. In general, temporal collocations and the related techniques help users identify real events and retrieve interesting life patterns from weblogs.

Table of Contents:
Acknowledgment
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation Example
  1.2 Approaches and Contribution
  1.3 Thesis Organization
Chapter 2 Related Work
Chapter 3 Temporal Collocations
  3.1 Definitions
    3.1.1 Mutual Information
    3.1.2 Temporal Mutual Information
  3.2 Attributes
    3.2.1 Stopwords
    3.2.2 Adjectives
    3.2.3 Common Collocations
  3.3 Multiword Collocations
  3.4 Applications
    3.4.1 Representative Examples
    3.4.2 Analysis of Events
Chapter 4 Event Retrieval System
  4.1 Event Retrieval
  4.2 An Event Retrieval System
  4.3 Preprocessing Phase
    4.3.1 Weblog Retrieval
    4.3.2 Collocation Extraction
  4.4 Event Detection Phase
    4.4.1 Event Detection based on Peak Detection
  4.5 Event Retrieval Phase
    4.5.1 Event Description Retrieval Algorithm
    4.5.2 Methods of selections
Chapter 5 Experiments and Results
  5.1 Dataset
  5.2 Comparison between Temporal Mutual Information versus Mutual Information
  5.3 Evaluation of event detection
  5.4 Evaluation of Event Retrieval
    5.4.1 Representative Examples of Event Retrieval
    5.4.2 Overall Performance
Chapter 6 Conclusion
REFERENCES
Appendix A Publication list

Keywords (translated from Chinese): weblog; time; content; event; natural language
Keywords (English): blog; weblog; collocation; time; temporal; content; event; detection