Improved Language Modeling Approaches for Mandarin Broadcast News Extractive Summarization

Liu, Shih-Hung

doi:10.6342/NTU201601686

Date Issued

2016

Date

2016

Author(s)

Liu, Shih-Hung

DOI

10.6342/NTU201601686

URI

http://ntur.lib.ntu.edu.tw//handle/246246/276539

Abstract

Extractive speech summarization, which has attracted much research over the years, aims to select an indicative set of sentences from a spoken document so as to succinctly cover its most important aspects. In this dissertation, we cast extractive speech summarization as an ad-hoc information retrieval (IR) problem and investigate various language modeling (LM) methods for important sentence selection. The main contributions of this dissertation are four-fold. First, we propose a novel clarity measure for use in important sentence selection, which helps quantify the thematic specificity of each individual sentence and is deemed a crucial indicator orthogonal to the relevance measure provided by the LM-based methods. Second, we explore a novel sentence modeling paradigm built on the notion of relevance, where the relationship between a candidate summary sentence and the spoken document to be summarized is unveiled through different granularities of context for relevance modeling. In addition, not only lexical but also topical cues inherent in the spoken document are exploited for sentence modeling. Third, we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to facilitate the use of sentence-level structural relationships for better summarization performance. Fourth, we explore several effective formulations of proximity cues and propose a position-aware language modeling framework that uses various granularities of position-specific information for sentence modeling.
Extensive experiments are conducted on a Mandarin broadcast news summarization dataset with Mandarin large vocabulary continuous speech recognition (LVCSR), and the empirical results demonstrate the performance merits of our methods when compared to several existing well-developed and/or state-of-the-art methods.
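
To illustrate what casting sentence selection as an ad-hoc IR problem can look like, the following is a minimal sketch of one common LM-based formulation: each sentence S is treated as a unigram language model and ranked by the log-likelihood of the whole document D under that model. It is not taken from the dissertation. The function names, the linear-interpolation smoothing weight `lam`, and the use of the document itself as the background model are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def rank_sentences(sentences, lam=0.5):
    """Rank tokenized sentences by log P(D | S), highest first.

    Assumption: the sentence model is linearly interpolated with a
    background model estimated from the document itself, so every
    document word has non-zero probability when lam < 1.
    Returns sentence indices; ties are broken by original position.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")

    doc_tokens = [w for sent in sentences for w in sent]
    doc_counts = Counter(doc_tokens)
    background = unigram(doc_tokens)

    scored = []
    for idx, sent in enumerate(sentences):
        sent_model = unigram(sent)
        log_lik = 0.0
        for w, c in doc_counts.items():
            p = lam * sent_model.get(w, 0.0) + (1.0 - lam) * background[w]
            log_lik += c * math.log(p)
        scored.append((log_lik, idx))

    return [idx for _, idx in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Under this sketch, a summary would be formed by taking the top-ranked indices until a length budget is reached. The clarity, relevance, clustering, and position-aware components described in the abstract would refine how the sentence model is estimated or how the scores are combined, and they are not modeled here.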

Subjects

extractive speech summarization

clarity measure

relevance language modeling

overlapped clustering

proximity-based LM

position-aware LM

Type

thesis

File(s)

Name

ntu-105-D98921032-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):3e07d01196cbc70b33d425edebb8722f
