DC field | Value | Language |
dc.contributor | Advisor: 鄭卜壬 | - |
dc.contributor | National Taiwan University: Graduate Institute of Computer Science and Information Engineering | zh_TW |
dc.contributor.author | 江忠倫 | zh_TW |
dc.contributor.author | Chiang, Chung-Lun | en |
dc.creator | 江忠倫 | zh_TW |
dc.creator | Chiang, Chung-Lun | en |
dc.date | 2014 | - |
dc.date.accessioned | 2014-11-26T01:00:00Z | - |
dc.date.accessioned | 2018-07-05T02:14:36Z | - |
dc.date.available | 2014-11-26T01:00:00Z | - |
dc.date.available | 2018-07-05T02:14:36Z | - |
dc.date.issued | 2014 | - |
dc.identifier.uri | http://ntur.lib.ntu.edu.tw//handle/246246/261515 | - |
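dc.description.abstract | Previous methods for generating snippets on search engine result pages optimized each search result in isolation, mainly considering relevance to the query terms and the informativeness of the surrounding context. In this thesis, we generate snippets for the multiple search results on a single result page and treat those snippets, taken together, as an overview of the query.
First, for queries of different categories, we extract attribute words and their contexts from a community question-answering website, an online encyclopedia, and search engine query suggestions, and use this information to generate snippets. When generating snippets, we consider whether a sentence is relevant to the query, whether it is relevant to the query's category, and whether it contains the previously extracted attribute words. In the second stage of the system, we use integer linear programming to find an optimal set of sentences, which forms the system output: the multiple snippets. In addition, we incorporate the result pages of the query's suggested (expanded) queries to recover attribute words that were not found initially, increasing the diversity of the overview for each query.
The experimental data come from Wikipedia, Yahoo! Answers, and Google Search Autocomplete. The results show that the proposed snippet generation method is feasible and outperforms other summarization methods, effectively increasing the diversity of search engine result pages. | zh_TW |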
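The three sentence-level signals described above can be illustrated with a minimal sketch. These signals are relevance to the query, relevance to the category, and presence of extracted attributes. The sketch assumes a bag-of-words representation, cosine similarity, and a category vector supplied as term counts. It also assumes a weighted linear combination of the three signals. The helper names `bag`, `cosine`, and `score_sentence` and the default weights are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def bag(text: str) -> Counter:
    """Lower-cased whitespace tokenization (a simplification; a real system would use a tokenizer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentence(sentence, query, category_vec, attributes, weights=(1.0, 1.0, 1.0)):
    """Hypothetical linear score over the three signals named in the abstract.

    category_vec: Counter of terms assumed to characterize the query's category.
    attributes: list of single-word attribute terms (e.g. "symptom").
    weights: assumed weights for (query, category, attribute) terms.
    """
    w_query, w_cat, w_attr = weights
    s = bag(sentence)
    query_rel = cosine(s, bag(query))
    category_rel = cosine(s, category_vec)
    # Fraction of the extracted attribute terms that appear in the sentence.
    attr_hit = sum(1 for a in attributes if a in s) / len(attributes) if attributes else 0.0
    return w_query * query_rel + w_cat * category_rel + w_attr * attr_hit
```

In this sketch the weights are fixed constants. The thesis's outline lists a separate weight-learning step, which this example does not attempt to reproduce.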
dc.description.abstract | Previous work on snippet generation focused mainly on producing one snippet for an individual search result. This thesis aims to generate the snippets on a search result page so that, together, they form a comprehensive overview of an entity query (e.g., flu).
Our approach first extracts the attributes (e.g., symptom and diagnosis) of query categories (e.g., disease) from multiple resources: a community-based question-answering (CQA) website, an online encyclopedia, and query suggestions from a commercial search engine. We then generate snippets by scoring each sentence on how central it is to the query and to the query's category, and on how well it diversifies coverage of the extracted attributes. Integer linear programming (ILP) is used to find the optimal set of sentences. After finding this initial set, we further improve the result by aggregating the search result pages (SERPs) of the query's suggested queries.
Experiments are conducted on data from Wikipedia, Yahoo! Answers, and Google Search. The results demonstrate the effectiveness of our approach compared with an existing commercial search engine and several summarization baselines. | en |
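The ILP selection step can be sketched as a small 0-1 program. The program chooses sentences to maximize their total score plus a bonus for each distinct attribute covered, subject to a snippet-length budget. This is an assumed formulation and not necessarily the thesis's exact model. To stay self-contained, the sketch solves the program exactly by enumerating subsets, which is only practical for a handful of candidate sentences. A real system would pass the same model to an ILP solver. The names `select_sentences`, `budget`, and `attr_bonus` are hypothetical.

```python
from itertools import combinations


def select_sentences(scores, lengths, attrs, budget, attr_bonus=1.0):
    """Exactly solve a small, hypothetical 0-1 ILP by exhaustive enumeration.

    Variables:   x_i = 1 if sentence i is selected; y_a = 1 if attribute a is covered.
    Objective:   maximize sum_i scores[i] * x_i + attr_bonus * sum_a y_a
    Constraints: sum_i lengths[i] * x_i <= budget
                 y_a <= sum_{i : a in attrs[i]} x_i  (an attribute counts only if covered)
    attrs[i] is the set of attribute terms that sentence i contains.
    """
    n = len(scores)
    best_value, best_set = 0.0, ()  # the empty selection is always feasible
    for k in range(n + 1):
        for chosen in combinations(range(n), k):
            if sum(lengths[i] for i in chosen) > budget:
                continue  # violates the length (snippet-space) constraint
            covered = set().union(*(attrs[i] for i in chosen))
            # At the optimum, y_a = 1 exactly when attribute a is covered by a chosen sentence.
            value = sum(scores[i] for i in chosen) + attr_bonus * len(covered)
            if value > best_value:
                best_value, best_set = value, chosen
    return list(best_set), best_value
```

As an example, take `scores=[3.0, 2.0, 2.0]`, `lengths=[10, 6, 6]`, `attrs=[{"symptom"}, {"symptom"}, {"diagnosis"}]`, and `budget=12`. The sketch returns `([1, 2], 6.0)`. It picks the two lower-scoring sentences because together they cover both attributes, which illustrates how the coverage term rewards diversity over raw per-sentence relevance.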
dc.description.tableofcontents | Oral Examination Committee Approval
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Document Summarization
2.2 Facet Generation and Summarization
2.3 Previous Work with Shih-Ying Chen
3 Overview of Snippet Generation Approach
4 Off-line Knowledge Extraction
4.1 CQA Attribute Extraction
4.2 Wikipedia Attribute Extraction
4.3 Search Suggestion Attribute Extraction
4.4 Building Category Vector
5 On-line Sentence Selection
5.1 Scoring Functions
5.2 Weight Learning
5.3 Integer Linear Programming Model (ILP)
5.4 Sentence Compression
5.5 Post Enhancement by Suggestions
6 Experiments
6.1 Datasets
6.2 Baselines
6.3 Evaluation Metrics
6.4 Performance Comparison
6.4.1 ROUGE Performance
6.4.2 Snippet Examples
6.5 Parameter Settings
6.5.1 The Impact of Weights in S_important
6.5.2 Number of CQA Attributes
6.5.3 Limit for Same Attributes
6.5.4 SoftN Analysis
7 Conclusions and Future Work
Bibliography | zh_TW |
dc.format.extent | 3269558 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.language | en_US | - |
dc.rights | Thesis usage rights: license not granted | - |
dc.subject | search result summarization | zh_TW |
dc.subject | snippet generation | zh_TW |
dc.subject | query overview | zh_TW |
dc.title | Integrating Multiple Resources to Optimize the Diversity of Search Engine Result Pages | zh_TW |
dc.title | Aggregating Multi-Resources to Improve the Diversity of Search Engine Result Pages | en |
dc.type | thesis | en |
dc.identifier.uri.fulltext | http://ntur.lib.ntu.edu.tw/bitstream/246246/261515/1/ntu-103-R01922007-1.pdf | - |
item.openairetype | thesis | - |
item.fulltext | with fulltext | - |
item.cerifentitytype | Publications | - |
item.openairecristype | http://purl.org/coar/resource_type/c_46ec | - |
item.grantfulltext | open | - |
Appears in Collections: | Department of Computer Science and Information Engineering |