Automated Data-Enrichment Processing Technologies
Date Issued
2015
Date
2015
Author(s)
Sung, Hao
Abstract
Metadata, often described as "data about data", is an important means of describing and using digital objects in digital archives, digital libraries, and digital museums. Providing accurate, precise, and high-quality metadata is a critical task for digital databases, and it requires both substantial human resources and domain know-how. Because metadata construction is labor-intensive, large digital collections are often developed by building individual archives separately and then constructing a central portal (such as a union catalog) through which users browse, search, and explore the entire collection. Although this model is efficient in terms of time, manpower, and resources, it has drawbacks. The main problem is inconsistency in the resulting metadata, which may be caused by misinterpretation of metadata attributes, differing levels of detail in data entry, or a metadata format inadequate for describing specific data sets. In this thesis, we propose ADEPT (Automated Data Enrichment Processing Technology), a framework for pre-processing data. ADEPT contains three primary modules: data verification, data normalization, and named-entity recognition. ADEPT aims to ensure data consistency and correctness while increasing data usability. Furthermore, the enriched metadata is better suited to publication as linked open data. By connecting related data, we can explore and share information and knowledge through the Web.
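The three modules named in the abstract can be pictured as stages of a small pre-processing pipeline. The following is only an illustrative sketch and not ADEPT's actual implementation: the `Record` type, the `REQUIRED_FIELDS` list, the date pattern, and the `GAZETTEER` entries are all hypothetical, and named-entity recognition is reduced to a simple dictionary lookup for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical required fields; the abstract does not specify a schema.
REQUIRED_FIELDS = ("title", "date", "creator")

# Hypothetical gazetteer used for a dictionary-based entity lookup.
GAZETTEER = {"Taipei": "PLACE", "Tainan": "PLACE"}


@dataclass
class Record:
    fields: dict
    issues: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def verify(record):
    """Data verification: flag required fields that are missing or empty."""
    for name in REQUIRED_FIELDS:
        if not str(record.fields.get(name, "")).strip():
            record.issues.append(f"missing:{name}")
    return record


def normalize(record):
    """Data normalization: collapse whitespace, rewrite dates as YYYY-MM-DD."""
    for name, value in list(record.fields.items()):
        record.fields[name] = " ".join(str(value).split())
    # Assumed input style: YYYY/M/D or YYYY.M.D; other dates are left untouched.
    match = re.fullmatch(r"(\d{4})[/.](\d{1,2})[/.](\d{1,2})",
                         record.fields.get("date", ""))
    if match:
        year, month, day = match.groups()
        record.fields["date"] = f"{year}-{int(month):02d}-{int(day):02d}"
    return record


def recognize_entities(record):
    """Named-entity recognition, reduced here to a gazetteer lookup in the title."""
    title = record.fields.get("title", "")
    for term, label in GAZETTEER.items():
        if term in title:
            record.entities.append((term, label))
    return record


def enrich(fields):
    """Run the three stages in the order the abstract lists them."""
    record = Record(fields=dict(fields))
    for stage in (verify, normalize, recognize_entities):
        record = stage(record)
    return record
```

Calling `enrich({"title": "  Map of   Taipei ", "date": "1935/4/7"})` would, under these assumptions, report a missing `creator` field, clean up the title and date, and tag "Taipei" as a place. Keeping each stage as a function from record to record makes it easy to reorder stages or add new ones.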
Subjects
Digital Archives
Digital Humanities
Linked Data
Data Normalization
Terminology Extraction
Type
thesis
