Title: Improving Cross-Blog Browsing Mechanism by Classification and Citation
Original title: 由文章分類和引用關係改善跨部落格瀏覽機制
Author: 曾茹琦 (Tseng, Ru-Chi)
Contributor: 莊裕澤
Affiliation: National Taiwan University, Graduate Institute of Information Management (臺灣大學:資訊管理學研究所)
Dates: 2007-11-26; 2018-06-29
Year: 2007
URI: http://ntur.lib.ntu.edu.tw//handle/246246/54174
Type: other
Language: en-US
Format: application/pdf, 424337 bytes
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/54174/1/ntu-96-R94725033-1.pdf
Keywords: Blog; Link Analysis; Tag Clustering; Data Clustering; Social Network Analysis

Abstract

In this thesis, we use citation links to cluster similar tags from different blogs, providing a new way to assist cross-blog browsing. Because blog systems cannot communicate with each other, cross-blog searching and browsing remains an open issue. The tags defined in a blog indicate the aspects of the communities the blogger belongs to. Clustering similar tags may therefore help searching and browsing across blogs and help distinguish different types of communities.

We transform the citation and content information into graphs and experiment with several graph clustering methods. For a more thorough comparison, we also examine traditional agglomerative hierarchical clustering methods that use content information.

The experimental results show that clustering tags across blogs by citation information performs roughly the same as clustering by content information when the clusters are coarse, and slightly better when the clusters are fine. Processing content data, however, requires much more effort. Citation link analysis is thus a lightweight and effective method for clustering tags and assisting cross-blog browsing.

Table of Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Objectives
  1.4 Thesis Outline
2 Related Work
  2.1 Mechanisms Assisting Cross-Blog Searching and Browsing
    2.1.1 Changing the Structure of Blog Systems
    2.1.2 Mapping Tags Among Blogs
      2.1.2.1 Folksonomy Mapping
      2.1.2.2 Machine Mapping
    2.1.3 Brief Summary
  2.2 Tag Clustering
  2.3 Summary
3 System Design
  3.1 Overview
  3.2 Graphical Modeling
    3.2.1 Citation Graph
    3.2.2 Content Graph
  3.3 Clustering Methods
    3.3.1 Graphical Clustering Methods
      3.3.1.1 Edge Betweenness
      3.3.1.2 Common Neighbors
      3.3.1.3 k-core Decomposition
    3.3.2 Hierarchical Content Clustering Methods
      3.3.2.1 Single Linkage
      3.3.2.2 Complete Linkage
      3.3.2.3 Average Linkage
      3.3.2.4 Centroid Linkage
4 Experiment Results and Discussion
  4.1 Data Description
    4.1.1 Crawled Data Description
    4.1.2 Data Pruning
    4.1.3 Analysis of Tag Groups
  4.2 Evaluation Metrics
  4.3 Experiment Results and Comparison
    4.3.1 Results of Graphical Clustering Methods
      4.3.1.1 Results of Edge Betweenness
      4.3.1.2 Results of Common Neighbors
      4.3.1.3 Results of k-core
      4.3.1.4 Brief Summary of Graphical Clustering Results
    4.3.2 Results of Agglomerative Hierarchical Content Clustering Methods
    4.3.3 Summary
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Bibliography