Unsupervised domain adaptation for spoken document summarization with structured support vector machine
Journal
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
8347-8351
Date Issued
2013
Author(s)
Abstract
Supervised approaches can learn a spoken document summarizer that generates high-quality summaries from a set of training examples matched to the domain of the target documents. However, preparing a sufficient number of in-domain training examples is expensive. In this paper we propose an approach to unsupervised domain adaptation for spoken document summarization that requires no in-domain training examples. A summarizer is first learned from a set of out-of-domain training examples by a supervised summarization approach based on a structured support vector machine, and this summarizer is used to generate a set of initial summaries for the target spoken documents. The target documents and their initial machine-generated summaries then serve as extra training examples for learning a new summarizer, which in turn updates the summaries of the target spoken documents. This process is repeated iteratively to incrementally improve the summarizer for the target spoken documents. In addition, two further approaches are proposed: transforming the feature representations based on the data distribution in the target domain, and augmenting the representations with an extra set of domain-specific features. Encouraging results were obtained in summarizing Mandarin-English code-switching course lectures using training examples from Mandarin broadcast news. © 2013 IEEE.
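The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a plain perceptron over per-sentence feature vectors stands in for the structured support vector machine, summaries are taken to be the top-k scoring sentences, and all names (`train`, `summarize`, `self_train`, the parameters `k` and `iterations`) are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Sentence = Sequence[float]   # feature vector of one sentence (assumed representation)
Document = List[Sentence]
Labels = List[int]           # 1 = sentence belongs to the summary, 0 = it does not
Example = Tuple[Document, Labels]


def score(w: Sequence[float], x: Sentence) -> float:
    """Linear score of one sentence."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(examples: List[Example], dim: int, epochs: int = 20) -> List[float]:
    """Stand-in learner: a plain perceptron, NOT the structured SVM of the paper."""
    w = [0.0] * dim
    for _ in range(epochs):
        for doc, labels in examples:
            for x, y in zip(doc, labels):
                pred = 1 if score(w, x) > 0 else 0
                if pred != y:
                    sign = 1.0 if y == 1 else -1.0
                    w = [wi + sign * xi for wi, xi in zip(w, x)]
    return w


def summarize(w: Sequence[float], doc: Document, k: int) -> Labels:
    """Select the k highest-scoring sentences as the summary (assumed selection rule)."""
    ranked = sorted(range(len(doc)), key=lambda i: score(w, doc[i]), reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(doc))]


def self_train(source: List[Example], targets: List[Document], dim: int,
               k: int = 1, iterations: int = 3) -> Tuple[List[float], List[Labels]]:
    """Unsupervised adaptation loop: out-of-domain training, then repeated retraining
    on the target documents paired with their machine-generated summaries."""
    w = train(source, dim)                                  # out-of-domain summarizer
    summaries = [summarize(w, d, k) for d in targets]       # initial summaries
    for _ in range(iterations):
        pseudo = list(zip(targets, summaries))              # extra training examples
        w = train(source + pseudo, dim)                     # learn a new summarizer
        summaries = [summarize(w, d, k) for d in targets]   # update the summaries
    return w, summaries
```

In this sketch the out-of-domain examples are kept in the training set at every iteration. Whether and how the original work weights the two sets is not stated in the abstract, so that choice is an assumption.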
Subjects
Speech Summarization; Structured Support Vector Machine; Unsupervised Domain Adaptation
Other Subjects
Data distribution; Domain adaptation; Domain specific; Feature representation; Speech summarization; Spoken document; Structured supports; Training example; Metadata; Signal processing; Speech recognition; Support vector machines; Natural language processing systems
Type
conference paper