On transformations between text-tagging formats
Date Issued
2016
Author(s)
Tsao, Yu-Lin
Abstract
Tagging named entities in a text is often an essential step in preparing the text for digital humanities research. Although several text-tagging tools are available to researchers, each tool is designed for a specific purpose, and the tagging formats they use often differ. Consequently, text tagged with one tool cannot be reused by another researcher working with a different tool. In this thesis we propose an approach to integrating the tagging formats produced by different tools. We introduce the Simple Text-Annotation Markup Language (STAML), which serves as an intermediary representation between different tagging formats. Through STAML, texts tagged in one format can be used in another tagging tool without disrupting the existing annotations. STAML and accompanying web-based programs are implemented for several common tagging formats for Chinese-language texts, such as those used by MARKUS (a popular tagging tool), THDL, and TEI.
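The abstract does not give STAML's actual syntax, so the following is only a rough illustration of the intermediary-representation idea, not the thesis's method. It assumes a hypothetical standoff form (plain text plus character-offset spans) and two toy inline formats, `<type>…</type>` and `[[type|…]]`. The class `Intermediate`, the patterns, and the functions `parse_inline`, `emit_xml` and `emit_bracket` are all hypothetical and do not reproduce STAML, MARKUS, THDL or TEI markup. The sketch also handles only flat (non-nested) tags.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int  # character offset into the plain text (inclusive)
    end: int    # character offset into the plain text (exclusive)
    tag: str    # entity type, e.g. "person" (hypothetical labels)

@dataclass
class Intermediate:
    """Hypothetical standoff representation: untagged text plus spans."""
    text: str
    annotations: list = field(default_factory=list)

# Toy format A: <person>...</person>  (group 1 = type, group 2 = surface text)
XML_TAG = re.compile(r"<(\w+)>(.*?)</\1>")
# Toy format B: [[person|...]]        (group 1 = type, group 2 = surface text)
BRACKET_TAG = re.compile(r"\[\[(\w+)\|(.*?)\]\]")

def parse_inline(src: str, pattern: re.Pattern) -> Intermediate:
    """Strip flat inline tags matched by `pattern`, recording their offsets."""
    parts, anns = [], []
    pos = 0       # read position in the tagged source
    out_len = 0   # length of plain text produced so far
    for m in pattern.finditer(src):
        before = src[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        surface = m.group(2)
        anns.append(Annotation(out_len, out_len + len(surface), m.group(1)))
        parts.append(surface)
        out_len += len(surface)
        pos = m.end()
    parts.append(src[pos:])
    return Intermediate("".join(parts), anns)

def _emit(doc: Intermediate, wrap) -> str:
    """Re-insert spans into the plain text using the `wrap` formatter."""
    out, pos = [], 0
    for a in sorted(doc.annotations, key=lambda a: a.start):
        out.append(doc.text[pos:a.start])
        out.append(wrap(a.tag, doc.text[a.start:a.end]))
        pos = a.end
    out.append(doc.text[pos:])
    return "".join(out)

def emit_xml(doc: Intermediate) -> str:
    return _emit(doc, lambda t, s: f"<{t}>{s}</{t}>")

def emit_bracket(doc: Intermediate) -> str:
    return _emit(doc, lambda t, s: f"[[{t}|{s}]]")

if __name__ == "__main__":
    src = "<person>王安石</person>於<place>江寧</place>"
    doc = parse_inline(src, XML_TAG)
    print(doc.text)           # 王安石於江寧
    print(emit_bracket(doc))  # [[person|王安石]]於[[place|江寧]]
```

Because each format only converts to and from the intermediate form, supporting N formats needs one reader and one writer per format (2N converters) rather than a direct converter for every ordered pair of formats (N(N-1)). In this toy setting the annotations survive a round trip (format A to intermediate to format B and back) as long as both formats can express the same span types.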
Subjects
Digital Humanities
Text-Tagging
Data transformation
Intermediate Representation
Type
thesis
File(s)
Name
ntu-105-R03922065-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):0280ea10e42ed524eae09e67b8fffd86