Retrieving and Identifying Interlanguage Signatures and Sentences
Date Issued
2009
Author(s)
Yang, Ming-Han
Abstract
This paper describes a statistical method for automatically retrieving and identifying interlanguage sentences. Interlanguage is the variety of a language produced by a second-language learner who is not yet fully proficient but is trying to approximate the target language. The framework requires no human annotation and is language-independent, so it can be applied to retrieve interlanguage between any two given languages. It has three stages: first, interlanguage is approximated with an order-preserving phrasal machine translator; second, a classifier is trained to identify interlanguage sentences; and third, the classifier is refined by training a new classifier on the interlanguage sentences identified by the second-stage classifier. For evaluation, the framework is applied to identify Chinese-English interlanguage sentences among normal English sentences in the English thesis abstracts written by graduate students in Taiwan, achieving 64.58% precision and 56.67% recall.
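The abstract does not say which translator, features, or classifier the thesis uses, so the following is only a minimal sketch of how the three stages could fit together, not the thesis's implementation. It assumes: a toy, hypothetical phrase table with greedy longest-match monotone translation for stage 1; add-one smoothed bigram language models compared by log-likelihood for stage 2 (the record lists "language model" as a subject, but this particular classifier is an assumption); and, for stage 3, retraining the interlanguage model on sentences the stage-2 classifier flagged in an unlabeled pool. All function names are hypothetical.

```python
import math
from collections import Counter


# Stage 1 (sketch): order-preserving phrase-by-phrase translation. Keeping the
# source word order is what makes the output resemble learner interlanguage.
def monotone_translate(tokens, phrase_table, max_len=3):
    out, i = [], 0
    while i < len(tokens):
        # Greedily use the longest source phrase starting at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in phrase_table:
                out.extend(phrase_table[src].split())
                i += n
                break
        else:
            out.append(tokens[i])  # no entry: copy the token through unchanged
            i += 1
    return out


class BigramLM:
    """Add-one smoothed bigram language model over token lists."""

    def __init__(self, sentences):
        self.context, self.bigrams, vocab = Counter(), Counter(), set()
        for s in sentences:
            padded = ["<s>"] + s + ["</s>"]
            self.context.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
            vocab.update(padded[1:])
        self.v = len(vocab) + 1  # +1 leaves probability mass for unseen words

    def logprob(self, sentence):
        padded = ["<s>"] + sentence + ["</s>"]
        return sum(math.log((self.bigrams[(a, b)] + 1) / (self.context[a] + self.v))
                   for a, b in zip(padded[:-1], padded[1:]))


# Stage 2 (sketch): flag a sentence when the interlanguage model scores it
# higher than the native-English model (a log-likelihood comparison).
def make_classifier(inter_sents, native_sents):
    inter_lm, native_lm = BigramLM(inter_sents), BigramLM(native_sents)
    return lambda s: inter_lm.logprob(s) > native_lm.logprob(s)


# Stage 3 (sketch): train a new classifier on the sentences that the stage-2
# classifier identified as interlanguage in an unlabeled pool of learner text.
def refine_classifier(classify, unlabeled, native_sents):
    flagged = [s for s in unlabeled if classify(s)]
    return make_classifier(flagged, native_sents) if flagged else classify


def precision_recall(predicted, gold):
    """predicted, gold: parallel lists of booleans (True = interlanguage)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy phrase table (Chinese phrase -> English phrase).
    table = {("我",): "I", ("很",): "very", ("喜歡",): "like",
             ("這個", "方法"): "this method"}
    print(monotone_translate("我 很 喜歡 這個 方法".split(), table))
    # ['I', 'very', 'like', 'this', 'method']
    print(precision_recall([True, True, False, False], [True, True, True, True]))
    # (1.0, 0.5)
```

The monotone translation of the toy input yields "I very like this method", the kind of word-order transfer typical of Chinese-English interlanguage. Precision and recall here follow their standard definitions, TP/(TP+FP) and TP/(TP+FN), which is how figures like the 64.58% and 56.67% above are conventionally computed.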
Subjects
interlanguage
auto editing
language model
Type
thesis
File(s)
Name
ntu-98-R94922134-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5): 15fef674496004e9a4155b3e8f3c0ae3
