Link-based Web Site Analysis (以鏈結為基礎的網站行為研究)

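Contributor: 項潔
Affiliation: National Taiwan University, Graduate Institute of Computer Science and Information Engineering (臺灣大學：資訊工程學研究所)
Author: Tsai, Yu-Li (蔡雨利)
Record dates: 2007-11-26; 2018-07-05
Year: 2004
URI: http://ntur.lib.ntu.edu.tw//handle/246246/53895

Abstract (translated from the Chinese):
With the rapid growth of the Internet and the steadily falling age of its users, inappropriate material online can increasingly harm immature users while their parents are not watching. Because the Internet crosses national borders and its nodes are widely distributed, government agencies have little power to regulate the information circulating on it, so filtering mechanisms have to come from private companies or groups. Most content filtering software on the market today identifies inappropriate websites by keyword comparison or content analysis. Both techniques have been developed for many years and are mature, but it is worth noting that text-based analysis tends to run into obstacles across different cultural backgrounds. This thesis approaches the identification and collection of topic-specific websites from a different angle. We focus on the new properties that documents exhibit in a hyperlinked environment (the Internet): by observing the behavior of topic-specific websites and analyzing the information that link structure provides about them, we attempt to develop an algorithm suited to collecting and identifying websites of a particular type.

Abstract (English):
The growth of the World Wide Web has brought some unexpected social problems, one of which is the influx of material not suitable for children, such as pornography and hate-group content. How to shield impressionable minds from such material has become a challenge for computer scientists. One common approach is to build a content filtering tool that blocks websites containing improper information from being transmitted to the browser. Most content filtering software uses keyword comparison or content analysis to identify such websites. Although these methods are effective to some extent, they have drawbacks. For instance, the same words may represent different concepts in different cultures, which can lead to misdetection. When a purely text-based mechanism is applied across different cultural environments, blocking sites by mistake or failing to block the intended sites becomes a critical issue. In this thesis, we propose a new approach to website analysis. Our method is based on the observation that related websites tend to refer to each other through hyperlinks. A graph-based algorithm that utilizes this property has been designed and implemented. Using the collection of pornographic sites as an example, we show that our algorithm is efficient and effective in finding related sites.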
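Additional experiments conducted on butterfly-related websites and gun-related websites also produced satisfactory results.

Table of contents:
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Thesis Organization
Chapter 2 Related Products and Research
  2.1 Related Products
    2.1.1 別碰! No!Porn!
    2.1.2 @INFilter濾巨人
    2.1.3 Norton Parental Control
    2.1.4 N2H2 (US company)
  2.2 Literature Review
    2.2.1 Authority & Hub
    2.2.2 Companion & Co-citation
Chapter 3 System Overview and Algorithm
  3.1 Observations, Intuition, and Hypotheses
  3.2 Algorithm Development
    3.2.1 Concept
    3.2.2 Algorithm Model
Chapter 4 System Implementation and Experiments
  4.1 System Architecture Analysis
    4.1.1 Architecture Diagram and Components
    4.1.2 System Flow
  4.2 System Development
  4.3 Experimental Data and Analysis
    4.3.1 Problems Encountered During the Experiments and Their Solutions
    4.3.2 Experimental Results
  4.4 Experiments on Different Topics
  4.5 Summary and Discussion
Chapter 5 Conclusions and Outlook
  5.1 Conclusions
  5.2 Discussion of Related Issues
  5.3 Suggestions for Future Research
References
Appendix A

Size: 6557753 bytes
Format: application/pdf
Language: en-US
Keywords: link (鏈結), website (網站), URL (網址), hyper-link
Title: Link-based Web Site Analysis (original: 以鏈結為基礎的網站行為研究)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53895/1/ntu-93-R91922102-1.pdf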
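
The abstract's central observation, that related websites tend to refer to each other through hyperlinks, can be illustrated with a small sketch. This record does not describe the thesis algorithm itself, so the following is only an assumed illustration of the general idea: starting from a few seed sites of a known topic, a neighbouring site is admitted when a large enough share of its links connect to sites already in the topic set. The function names (`build_neighbors`, `expand_topic_set`), the `threshold` and `max_rounds` parameters, and the toy link list are all hypothetical.

```python
from collections import defaultdict


def build_neighbors(links):
    """Map each site to the set of sites it links to or is linked from."""
    neighbors = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    return neighbors


def expand_topic_set(links, seeds, threshold=0.5, max_rounds=10):
    """Grow a topic set from seed sites along the link graph.

    Illustrative assumption, not the thesis's algorithm: a candidate is
    admitted when at least `threshold` of its link neighbours are already
    in the topic set.
    """
    neighbors = build_neighbors(links)
    topic = set(seeds)
    for _ in range(max_rounds):
        # Candidates are sites directly linked with the current topic set.
        candidates = set().union(*(neighbors[s] for s in topic)) - topic
        added = set()
        for site in candidates:
            linked = neighbors[site]  # non-empty: site touches the topic set
            score = len(linked & topic) / len(linked)
            if score >= threshold:
                added.add(site)
        if not added:
            break  # no site qualified in this round, so the set is stable
        topic |= added
    return topic


if __name__ == "__main__":
    # Toy graph: a, b, c link densely to each other; x, y, z, w form another group.
    example_links = [
        ("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"),
        ("c", "x"), ("x", "y"), ("y", "z"), ("z", "x"), ("y", "w"),
    ]
    print(sorted(expand_topic_set(example_links, {"a", "b"})))  # ['a', 'b', 'c']
```

In the toy graph, site c is admitted because two of its three link neighbours are seeds, while x is rejected because only one of its three neighbours belongs to the topic set. A lower threshold would let the set spread into the second group, which shows why such a cutoff would need tuning on real link data.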