Cross-language information access to multilingual collections on the Internet
Journal
JASIS
Journal Volume
51
Journal Issue
3
Pages
281-296
Date Issued
2000
Author(s)
Bian, Guo-Wei
Abstract
The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to handle proper-name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between translation speed and translation performance, and in what form the translated result is presented. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online, real-time Web page translation.
Other Subjects
Algorithms; Computer systems programming; Data acquisition; Glossaries; HTML; Internet; Natural language processing systems; Query languages; Response time (computer systems); Cross-language information access; Machine transliteration algorithms; Information retrieval systems
Type
journal article
