Interpretation of Chinese Discourse Markers, Discourse Relation Recognition, and their Relationships with Sentiment Polarity
Date Issued
2014
Date
2014
Author(s)
Huang, Hen-Hsen
Abstract
A discourse relation is the rhetorical relation between two discourse units (i.e., clauses, sentences, or blocks of sentences). Well-known discourse relations include Temporal, Contingency, Comparison, and Expansion. A discourse relation indicates how its two discourse units cohere, and this information influences the meaning of the text. Discourse relations are an important clue for many applications, such as summarization, opinion mining, textual entailment, and event recognition.
Research on automatic English discourse relation recognition has grown rapidly in recent years, driven by the release of corpora such as the Rhetorical Structure Theory Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB). Chinese discourse relation recognition is more challenging than its English counterpart because of the lack of resources and issues specific to Chinese.
In this dissertation, we present an in-depth study of Chinese discourse relation analysis. We propose a statistical algorithm to recognize discourse relations at both the inter-sentential and intra-sentential levels. We also report preliminary results on Chinese discourse parsing at the sentence level. In Chinese, many long sentences contain more than two clauses and form complex discourse structures. Discourse parsing extracts the hierarchical structure of, and the relations among, the clauses in a given sentence.
Discourse markers are key clues for discourse processing, but the use of Chinese discourse markers is inherently ambiguous. To interpret ambiguous Chinese discourse markers, we propose a semi-supervised framework that estimates the distribution of each Chinese discourse marker from a large corpus, ClueWeb09. With the estimated distributions, this semi-supervised framework improves the performance of Chinese discourse relation recognition.
Discourse relations and sentiment polarities interact in text. We investigate their correlation using ClueWeb09. A moderate-sized dataset annotated by humans is analyzed and compared with a much larger dataset heuristically labeled by machine. The results validate the association between sentiment and discourse.
In this dissertation, we focus on four-way discourse relation classification. In the future, we will investigate finer-grained classification of discourse relations. In addition, we will further address Chinese discourse parsing at the paragraph and document levels.
Subjects
Natural Language Processing
Chinese Discourse Analysis
Discourse Relation Recognition
Discourse Markers
Sentiment Polarity
Type
thesis
File(s)
Name
ntu-103-D97922036-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):ce0f8cf00700a96e63ee0eed9ee5c5af