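2007-10-01
2024-05-18
https://scholars.lib.ntu.edu.tw/handle/123456789/707205

Chinese abstract: As the Internet has grown more popular, text summarization has become an important research issue. Text summarization extracts the important information from a set of documents so that users can quickly grasp the main ideas of the documents from the summary without reading large amounts of content. Previous text summarization research has mostly focused on increasing the diversity of summaries so that they cover the important themes of the source documents. However, when the documents concern a real-world event, increasing diversity is not sufficient to produce a high-quality event summary. Because an event is described on the Internet by a chronologically ordered set of documents, an event summary must reflect the event's temporal order for readers to understand how the event developed. In our previous work, we proposed a method for generating event summaries and event story evolution graphs. In that work, we explicitly defined an event's main themes, sub-events, and sub-event summaries, and used an eigenvector-based method to find the relationships among these elements. We also linked all sub-events according to their temporal order and content similarity to form the event's story evolution graph. With the evolution graph and the summary, users can quickly understand how a news event developed over time. In this project, we will study an emerging event summarization issue: summarizing news events that contain polarized stances. A controversial news event usually contains many opposing stances; for example, a political news event may include mutually exclusive arguments from different parties. Identifying and summarizing the viewpoint of each stance can give readers a comprehensive and unbiased account of the event. We will propose a system based on principal component analysis (PCA) to identify and summarize the polarized stances within an event. PCA is an efficient statistical method that can uncover the latent meaning in a multidimensional dataset. We will apply PCA to the event content and, based on word-usage patterns, find the mutually exclusive stance vocabularies in the event; these vocabularies can then be used to determine the polarity of the event's sentences and to generate a text summary for each stance. In addition, we will improve the computation of the correlation coefficient to reduce the effect of sparse word usage in sentences on PCA. This improvement will make the samples used by PCA more representative and, in turn, improve the performance of event polarity analysis and summarization.

English abstract: Text summarization is an important research topic for coping with the rapid growth of documents on the Internet. It extracts the core content of a set of documents so that users can quickly grasp their gist without reading all of them. Previous summarization research indicates that a representative summary should possess high diversity to cover the content of the summarized documents. However, when the documents relate to an evolutionary topic, that is, a real-world incident reported by a chronological series of documents from different Internet sources, diversity alone is not enough for topic summarization; the temporality of the documents must also be considered to make the topic summary comprehensible. In our previous work on topic summarization, we presented a method for constructing evolution graphs and summaries of news topics. We specifically defined the relationships among the themes, events, and summaries of topics, and extracted them effectively with a unified eigenvector-based method.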
The summarized events are then linked according to their temporal similarities to form the evolution graph. With the help of topic summaries and topic evolution graphs, users can easily understand the storyline of a topic. In this research project, we will investigate a new topic summarization issue. We will focus on summarizing debated news topics and propose a method to identify the polarities of debated topics. Generally, the content of a debated news topic consists of perspectives with different polarities. For example, a political news topic can contain conflicting opinions from various parties. Identifying and summarizing the polarities of debated topics can give users a comprehensive and unbiased understanding of the topics. In this project, we will propose a domain-independent algorithm to identify the polarities of debated topics. The proposed algorithm is based on the well-known statistical method principal component analysis (PCA), which has shown its ability to mine the semantic concepts hidden in large datasets. We employ PCA to identify the polarities of the words in a topic. The word polarities can then be used to classify the topic's sentences and compose the topic's polarity summaries. Additionally, the proposed method will revise the computation of the correlation coefficient to address the problem of sentence sparseness. The revised method will distinguish representative samples from noisy training data to improve the performance of polarity identification.

Keywords: topic summarization; text polarity analysis; temporal text mining

Title: A Study on Polarity Analysis and Summarization of Temporal News Topics (時序性新聞事件之極性立場分析與摘要化研究)
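The abstracts describe the PCA-based polarity step only at a high level. The following is a minimal sketch under stated assumptions, not the project's implementation. It assumes sentences are already tokenized, represents each sentence as a binary word-occurrence vector, runs PCA on the plain Pearson correlation matrix of the words, finds the first principal component by power iteration, and labels each sentence by the sign of the summed loadings of its words. The revised correlation coefficient mentioned in the abstract is not specified there and is not modeled. The function names `word_polarities` and `classify_sentences` and the toy topic are hypothetical.

```python
import math
import random


def word_polarities(sentences, iterations=200, seed=0):
    """Hypothetical sketch: loading of each word on the first principal
    component of the sentence-by-word occurrence matrix (PCA on the plain
    Pearson correlation matrix, not the project's revised coefficient)."""
    n = len(sentences)
    sets = [set(s) for s in sentences]
    vocab = sorted(set().union(*sets))
    # Binary occurrence column per word: 1.0 if the word occurs in sentence k.
    cols = {w: [1.0 if w in s else 0.0 for s in sets] for w in vocab}
    # Words occurring in every sentence (or in none) have zero variance,
    # so their correlation is undefined; leave them out.
    vocab = [w for w in vocab if 0 < sum(cols[w]) < n]
    # Center each column and scale it to unit length so that the dot
    # product of two columns equals their Pearson correlation.
    z = {}
    for w in vocab:
        mean = sum(cols[w]) / n
        centered = [x - mean for x in cols[w]]
        norm = math.sqrt(sum(c * c for c in centered))
        z[w] = [c / norm for c in centered]
    corr = [[sum(a * b for a, b in zip(z[u], z[v])) for v in vocab] for u in vocab]
    # Power iteration: the correlation matrix is positive semidefinite, so,
    # assuming a unique largest eigenvalue, this converges to the first PC.
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in vocab]
    for _ in range(iterations):
        nxt = [sum(r * v for r, v in zip(row, vec)) for row in corr]
        length = math.sqrt(sum(x * x for x in nxt))
        if length == 0.0:
            break
        vec = [x / length for x in nxt]
    return dict(zip(vocab, vec))


def classify_sentences(sentences, polarity, eps=1e-9):
    """Label each sentence by the sign of the summed loadings of its words.
    "A"/"B" mark opposing sides only: the sign of a PC is arbitrary."""
    labels = []
    for s in sentences:
        score = sum(polarity.get(w, 0.0) for w in set(s))
        if score > eps:
            labels.append("A")
        elif score < -eps:
            labels.append("B")
        else:
            labels.append("neutral")
    return labels


# Hypothetical toy topic: two camps with disjoint vocabularies.
topic = [
    ["growth", "jobs"], ["growth", "jobs"], ["growth"],
    ["deficit", "inequality"], ["deficit", "inequality"], ["inequality"],
]
print(classify_sentences(topic, word_polarities(topic)))
```

Because the sign of a principal component is arbitrary, the labels "A" and "B" only indicate opposing sides, not which stance is which. On the toy topic, the three sentences that use "growth" or "jobs" receive one label and the three that use "deficit" or "inequality" receive the other, since words from the same camp are positively correlated and words from opposite camps are negatively correlated.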