Detection, Disambiguation, and Argument Identification of Chinese Discourse Connectives
Date Issued
2015
Date
2015
Author(s)
Shih, Yong-Siang
Abstract
Discourse relations describe how textual units are logically connected to each other. Analyzing the discourse structure of a text can aid the understanding of the meaning behind its paragraphs, and has many potential applications, such as natural language interfaces and large-scale content analysis. Although popular English discourse corpora are available to researchers, large-scale Chinese discourse corpora have only recently become available. In addition, Chinese discourse analysis faces several unique issues, including the variety of discourse connectives, the frequent occurrence of parallel connectives, and complex sentence structures. Discourse connectives are important clues for identifying discourse relations in Chinese texts; however, their ambiguity makes it challenging to extract true connectives.

In this thesis, we investigate four tasks concerning explicit discourse relations, that is, relations signaled by discourse connectives. First, we extract explicit discourse connectives. Second, we resolve linking ambiguities among connective components. Third, we disambiguate the discourse relation type of each connective. Finally, we identify the arguments of each discourse connective.

We propose several features to train logistic regression classifiers that distinguish discourse from non-discourse usages and determine the relation types of connectives. To resolve linking ambiguities, we rank the connective candidates and develop a greedy algorithm. Argument identification is formulated as a sequence labeling problem, and conditional random fields (CRFs) are used to determine the argument boundaries. Beyond explicit discourse relations, recognizing implicit relations requires further investigation. Built upon these components, an end-to-end Chinese discourse parser may be constructed in future studies.
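As an illustration of the first task, a minimal sketch of a binary logistic regression classifier for discourse versus non-discourse usage is given below. The abstract does not list the thesis's features, so the three boolean cues (`starts_clause`, `after_comma`, `paired_component_found`), the toy training data, and the helper names are all hypothetical; the model is fit by plain batch gradient descent rather than any particular library.

```python
import math

# Hypothetical boolean cues for a connective candidate (not the thesis's feature set).
FEATURES = ("starts_clause", "after_comma", "paired_component_found")


def featurize(cues: dict[str, bool]) -> list[float]:
    """Turn a candidate's boolean cues into a fixed-order feature vector."""
    return [1.0 if cues.get(name) else 0.0 for name in FEATURES]


def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """P(discourse usage | x) = sigmoid(w . x + b)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    m, n = len(examples), len(FEATURES)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(examples, labels):
            err = predict_proba(weights, bias, x) - y  # d(log-loss)/dz
            grad_b += err
            for i in range(n):
                grad_w[i] += err * x[i]
        bias -= lr * grad_b / m
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy data invented for illustration: 1 = discourse usage, 0 = non-discourse usage.
examples = [
    featurize({"starts_clause": True, "paired_component_found": True}),
    featurize({"starts_clause": True, "after_comma": True}),
    featurize({}),
    featurize({"after_comma": True}),
]
labels = [1, 1, 0, 0]
weights, bias = train(examples, labels)
```

A candidate is then labeled a discourse connective when `predict_proba` exceeds 0.5. Relation-type disambiguation would need a multi-class variant (for example, one-vs-rest or softmax), which this sketch does not cover.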
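The linking step can be pictured as a greedy selection over ranked candidates. In the sketch below, which assumes details the abstract does not give, each candidate is a tuple of component token positions (for example, both halves of a parallel connective such as 雖然…但是 "although … but") with a score from some ranker. Candidates are accepted from highest to lowest score, and any candidate that reuses a token already claimed is skipped. The `Candidate` type, the 0.5 threshold, and the conflict rule are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical connective candidate: component token positions plus a ranker score."""
    positions: tuple[int, ...]
    score: float


def greedy_link(candidates: list[Candidate], threshold: float = 0.5) -> list[Candidate]:
    """Greedily accept the best-scoring candidates whose tokens do not overlap."""
    used: set[int] = set()
    accepted: list[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.score < threshold:
            break  # the list is sorted, so every remaining candidate scores lower
        if used.isdisjoint(cand.positions):
            accepted.append(cand)
            used.update(cand.positions)
    # Return the linked connectives in text order.
    return sorted(accepted, key=lambda c: c.positions)


# Invented scores: 雖然 at token 0, 但是 at tokens 5 and 9.
candidates = [
    Candidate((0, 5), 0.9),  # 雖然 ... 但是(5)
    Candidate((0, 9), 0.6),  # 雖然 ... 但是(9)
    Candidate((9,), 0.7),    # 但是(9) on its own
    Candidate((2,), 0.3),    # low-scoring single-token candidate
]
linked = greedy_link(candidates)
```

Here `(0, 5)` wins first, `(9,)` is still free and accepted, and `(0, 9)` is rejected because token 0 is already linked. A greedy pass keeps decoding linear after sorting, at the cost of not guaranteeing the globally best-scoring assignment.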
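Formulating argument identification as sequence labeling means converting argument spans into per-token tags that a CRF can learn to predict, then decoding predicted tags back into spans. The sketch below shows that conversion with a BIO scheme; the CRF training itself is omitted. The tag names (`B-Arg1`, `I-Arg2`, ...), the example tokenization, and the choice to tag connective tokens as `O` are assumptions made for illustration, not necessarily the thesis's annotation scheme.

```python
def spans_to_bio(n_tokens: int, spans: list[tuple[str, int, int]]) -> list[str]:
    """Encode non-empty, non-overlapping (label, start, end) spans (end exclusive) as BIO tags."""
    tags = ["O"] * n_tokens
    for label, start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Decode BIO tags (e.g. a CRF's predictions) back into (label, start, end) spans."""
    spans: list[tuple[str, int, int]] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            label, start = tag[2:], i  # an orphan I- tag starts a new span
    return spans


# Hypothetical example: 雖然 他 很 累 ， 但是 他 還是 去 了
tokens = ["雖然", "他", "很", "累", "，", "但是", "他", "還是", "去", "了"]
tags = spans_to_bio(len(tokens), [("Arg1", 1, 4), ("Arg2", 6, 10)])
```

Labeling each token lets a linear-chain CRF use neighboring tags when deciding where an argument begins and ends, which is harder to capture by classifying candidate boundaries independently.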
Subjects
Natural Language Processing
Chinese Discourse Analysis
Discourse Connective Recognition
Discourse Relation Disambiguation
Discourse Connective Argument Identification
Type
thesis
File(s)
Name
ntu-104-R02922036-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5): 3d8c5fbad6a8612082e76451b4111907