Title: A Term Analyzer Based on the Results of Search Engines (基於搜尋引擎結果之字詞分析系統)
Author: 黃帥 (Huang, Shuai-Peng)
Contributors: 趙坤茂; 臺灣大學:資訊工程學研究所 (National Taiwan University, Graduate Institute of Computer Science and Information Engineering)
Type: thesis
Dates: 2010-06-02; 2018-07-05; 2008
Identifier: U0001-2401200820251000
Record URL: http://ntur.lib.ntu.edu.tw//handle/246246/184849
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/184849/1/ntu-97-J94922019-1.pdf
Format: application/pdf
Size: 882474 bytes
Language: en-US
Keywords: relevance terms; search engine; text mining; text retrieval; Three Hundred Tang Poems; Chinese idioms

Abstract:
As the Internet has become ubiquitous, looking up information with search engines has become commonplace. When users query a term, however, they often encounter many similar variants and cannot tell which one is correct. Listing the variants that occur most frequently in the large volume of results a search engine returns, or that are most similar to the query, may help users quickly find the form that most people use and that is most likely to be correct. Some of these variants arise simply from rearranging the characters of a term. This thesis therefore implements a term analyzer that finds, in real time, the Chinese terms related to the user's query, counts them, and ranks them by frequency or similarity. If the query is a line from the Three Hundred Tang Poems or a Chinese idiom, the system also shows its source. Finally, the system presents the results in the order the user specifies.

Table of contents:
Thesis Committee Certification
Acknowledgments
Chinese Abstract
English Abstract
Lists of Figures and Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review and Related Work
2.1 Introduction to the Google Search Engine
2.2 Introduction to Text Retrieval Techniques
2.3 Introduction to the Ministry of Education Dictionary of Chinese Idioms Website
Chapter 3 System Architecture
3.1 System Implementation Workflow
3.2 System Interface
3.3 Detailed Functions and Operating Procedure
3.4 Worked Example
Chapter 4 System Implementation Methods
4.1 Fetching Web Page Source
4.2 Extracting Terms
4.3 Sorting
4.4 Database
4.5 Sentence-Break Characters
Chapter 5 Experimental Results
5.1 Experiment 1
5.2 Experiment 2
5.3 Experiment 3
Chapter 6 Conclusion
6.1 Discussion
6.2 Future Work
References
Appendix 1 Google Web Page Source
Appendix 2 Google Web Page Source (after filtering)
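
The ranking step described in the abstract (count the related terms, then sort by frequency or by similarity to the query) can be illustrated with a minimal Python sketch. The record does not say how the thesis measures similarity, so the sketch assumes the ratio from the standard library's difflib.SequenceMatcher; the function name rank_terms, its parameters, and the sample idiom variants are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_terms(query, candidates, order="frequency"):
    """Rank candidate terms by occurrence count or by similarity to the query.

    Assumption: similarity is difflib's SequenceMatcher ratio. The thesis may
    use a different measure; this is only an illustration of the idea.
    Returns a list of (term, frequency, similarity) tuples.
    """
    counts = Counter(candidates)
    rows = [
        (term, freq, SequenceMatcher(None, query, term).ratio())
        for term, freq in counts.items()
    ]
    if order == "frequency":
        # Most frequent first; ties broken by similarity, then by the term itself.
        rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    elif order == "similarity":
        # Most similar first; ties broken by frequency, then by the term itself.
        rows.sort(key=lambda r: (-r[2], -r[1], r[0]))
    else:
        raise ValueError("order must be 'frequency' or 'similarity'")
    return rows


if __name__ == "__main__":
    # Hypothetical candidates, as if already extracted from result snippets.
    query = "按部就班"
    candidates = ["按步就班"] * 3 + ["按部就班"] * 2 + ["就班按部"]
    for order in ("frequency", "similarity"):
        print(order)
        for term, freq, sim in rank_terms(query, candidates, order):
            print(f"  {term}  count={freq}  similarity={sim:.2f}")
```

Keeping the count and the similarity score in the same tuple lets one pass over the candidates serve both orderings, which mirrors the abstract's statement that the user chooses the sort criterion after the terms have been tallied.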