Chinese Relation Extraction by Semi-Supervised Learning for Knowledge Base Expansion

Chen, Yu-Ju

Date Issued

2015

Author(s)

Chen, Yu-Ju

URI

http://ntur.lib.ntu.edu.tw//handle/246246/275493

Abstract

This thesis investigates relation extraction, the task of identifying semantic relations between concept pairs in text, as an approach to mining commonsense knowledge. State-of-the-art supervised learning needs a large labeled training set to perform well, and such a set is often expensive to prepare. As an alternative, we adopted distant supervision, a semi-supervised learning method, to extract relations from unlabeled corpora. Given the concept pairs that a knowledge base lists for a relation, a training set containing a large number of sentences can be weakly labeled automatically. Labels generated by such heuristics can be quite noisy, and when the sources of the sentences in the training set are not correlated with the knowledge base, the automatic labeling is unreliable. Instead of assuming that every sentence in the training set is labeled correctly, multiple instance learning learns from bags of instances, under the assumption that each positive bag contains at least one positive instance while negative bags contain only negative instances. We conducted experiments on Chinese relation extraction, using concept pairs from ConceptNet, a commonsense knowledge base, as seeds for labeling a set of predefined relations. The training bags were generated from the Sinica Corpus. We compared the performance of multiple instance learning with that of single-instance learning and several other learning algorithms. Our experiments extracted new concept pairs for the relations “AtLocation”, “CapableOf”, “HasProperty”, and “IsA”. This study shows that the proposed approach can expand a knowledge base with relations extracted from a separate corpus.
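The distant-supervision labeling and the multiple-instance assumption described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the seed pairs in `SEEDS`, the example sentences, the helper names `build_bags` and `bag_is_positive`, and the keyword-based `toy_score` are all hypothetical. Sentences are assumed to be pre-segmented into word tokens, so that a concept such as 書 is matched as a whole word and not inside 圖書館.

```python
# Hypothetical seed pairs in the style of ConceptNet: relation -> {(head, tail)}.
SEEDS = {
    "AtLocation": {("書", "圖書館")},
    "IsA": {("狗", "動物")},
}


def build_bags(sentences, candidate_pairs, seeds):
    """Distant supervision (sketch): every sentence that mentions both concepts
    of a pair goes into that pair's bag, and the bag is weakly labeled with each
    relation whose seed set contains the pair. An empty label set marks a
    negative bag. Sentences are lists of word tokens (assumed pre-segmented)."""
    bags = {}
    for head, tail in candidate_pairs:
        instances = [toks for toks in sentences if head in toks and tail in toks]
        if not instances:  # the pair never co-occurs, so there is no bag
            continue
        labels = {rel for rel, pairs in seeds.items() if (head, tail) in pairs}
        bags[(head, tail)] = (instances, labels)
    return bags


def bag_is_positive(instances, instance_score, threshold=0.5):
    """Standard multiple-instance assumption: a bag is positive if at least one
    of its instances is scored as positive (max over instance scores)."""
    return max(instance_score(toks) for toks in instances) >= threshold


def toy_score(toks):
    """Hypothetical stand-in for a trained instance classifier: treats the
    copula 是 as a cue for IsA."""
    return 1.0 if "是" in toks else 0.0


# Usage with made-up, pre-segmented sentences.
sentences = [
    ["我", "在", "圖書館", "借", "了", "一", "本", "書"],
    ["圖書館", "今天", "休息"],
    ["狗", "是", "人類", "的", "好", "朋友"],
    ["狗", "是", "一", "種", "動物"],
    ["狗", "在", "動物", "醫院"],
]
candidate_pairs = [("書", "圖書館"), ("狗", "動物"), ("狗", "朋友"), ("貓", "動物")]
bags = build_bags(sentences, candidate_pairs, SEEDS)
```

The bag for (狗, 動物) is weakly labeled IsA because the pair is a seed, even though its second sentence ("the dog is at the animal hospital") does not express IsA. The max-over-instances rule still accepts the bag as long as one instance scores as positive, which is how multiple instance learning tolerates such noisy labels. In practice the instance scorer would be a trained classifier rather than a keyword rule.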

Subjects

Relation Extraction

Multiple Instance Learning

Knowledge Base

Type

thesis

File(s)

Name

ntu-104-R01922049-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5): c58e48f39cc1fed67c9b879f633dddad
