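Title: 文本標記格式的轉換與應用 (On transformations between text-tagging formats)
Author: 曹又霖 (Tsao, Yu-Lin)
Advisor: 項潔
Affiliation: College of Electrical Engineering and Computer Science: Graduate Institute of Computer Science and Information Engineering
Type: thesis
Year: 2016
Record dates: 2017-03-03; 2018-07-05
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/275438
DOI: 10.6342/NTU201601410

Abstract (translated from the Chinese):
Many digital humanities studies rely on term-level tags in their texts, and many text-tagging tools are already available. Because each tool is good at tagging different kinds of terms, this thesis aims to use several tools together. The tools use different formats, however, so they cannot be combined directly; the formats must be converted into one another. To this end, the thesis analyzes what information text-tagging formats carry, classifies that information, and defines a new text-tagging format, STAML, to store it. STAML serves as the intermediary language for conversion between the different text-tagging formats, and the conversion program is implemented on a web platform. With the STAML format and its conversion program, the thesis achieves the goal of using these text-tagging tools together, in the hope of making digital humanities research proceed more smoothly.

Abstract (English):
Tagging named entities in a text is often an essential part of preparing the text for use in digital humanities research. Although several text-tagging tools are available to researchers, each tool is designed for a specific purpose, and the tagging formats they use often differ. Consequently, text tagged with one tool cannot be reused by another person working with a different tool. In this thesis we propose an approach to integrating the text-tagging formats produced by different tools. We introduce the Simple Text-Annotation Markup Language (STAML), which serves as an intermediary representation between different tagging formats. Through STAML, texts tagged in one format can be used in another tagging tool without disrupting the existing annotations. STAML and web-based conversion programs are implemented for several common tagging formats used with Chinese-language texts, such as those of MARKUS (a popular tagging tool), THDL, and TEI.

Keywords (translated from the Chinese): digital humanities, text tagging, format conversion, intermediate format
Keywords (English): Digital Humanities, Text-Tagging, Data transformation, Intermediate Representation

File: 5293320 bytes, application/pdf
Thesis release date: 2016/8/30
Usage rights: paid licensing authorized (royalties returned to the university)
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/275438/1/ntu-105-R03922065-1.pdf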
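
The abstract's central idea, converting between tagging formats through a single intermediary representation rather than writing a converter for every pair of formats, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not describe STAML's actual syntax, so the intermediary here is a plain stand-off model (text plus character-offset spans), and the two inline formats, `FORMAT_A` and `FORMAT_B`, are hypothetical stand-ins rather than the real MARKUS, THDL, or TEI formats. The sketch handles only flat, non-overlapping tags.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset into the plain text (inclusive)
    end: int    # offset into the plain text (exclusive)
    label: str  # e.g. "person", "place"


@dataclass
class Document:
    """Hypothetical intermediary form: plain text plus stand-off spans."""
    text: str
    annotations: list


# Two made-up inline formats (assumptions, not real tool formats).
# Each has a pattern for reading and a template for writing.
FORMAT_A = (re.compile(r"<(\w+)>(.*?)</\1>", re.S), "<{label}>{text}</{label}>")
FORMAT_B = (re.compile(r"\[\[(\w+)\|(.*?)\]\]", re.S), "[[{label}|{text}]]")


def parse_inline(source, fmt):
    """Strip the inline tags of `fmt` and record them as stand-off spans."""
    pattern, _ = fmt
    parts, annotations = [], []
    pos = 0      # position in the tagged source
    offset = 0   # position in the plain text built so far
    for m in pattern.finditer(source):
        plain = source[pos:m.start()]
        parts.append(plain)
        offset += len(plain)
        label, body = m.group(1), m.group(2)
        annotations.append(Annotation(offset, offset + len(body), label))
        parts.append(body)
        offset += len(body)
        pos = m.end()
    parts.append(source[pos:])
    return Document("".join(parts), annotations)


def render_inline(doc, fmt):
    """Write the intermediary document back out with the inline tags of `fmt`."""
    _, template = fmt
    out, pos = [], 0
    for a in sorted(doc.annotations, key=lambda a: a.start):
        out.append(doc.text[pos:a.start])
        out.append(template.format(label=a.label, text=doc.text[a.start:a.end]))
        pos = a.end
    out.append(doc.text[pos:])
    return "".join(out)


def convert(source, src_fmt, dst_fmt):
    """Convert between two formats by going through the intermediary form."""
    return render_inline(parse_inline(source, src_fmt), dst_fmt)


tagged_a = "<person>蘇軾</person>生於<place>眉山</place>。"
tagged_b = convert(tagged_a, FORMAT_A, FORMAT_B)
print(tagged_b)
```

With this design, supporting an additional format means writing one parser and one renderer against the intermediary form, instead of one converter per existing format.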