The Efficient Analysis Platform of Medical Informatics Based on Hadoop MapReduce and HBase
Date Issued
2011
Date
2011
Author(s)
Huang, Yuan-Hung
Abstract
The analysis of large-scale medical databases has become a popular research topic in recent years. Growing computing power and massive collections of medical records make it possible to conduct population-based studies that identify relationships among diseases. In practice, such studies face a serious efficiency problem caused by the scale of the databases, which severely limits the productivity of scientists. This thesis addresses the problem by using HBase, instead of conventional relational database software, as the data storage framework. Building on the distinct data storage structure of HBase, it proposes a new database schema designed to support the MapReduce programming model, so that analyses can be carried out efficiently in a distributed and parallel fashion. Experimental results show that, with the proposed design, analyses that take hours or even days with the conventional database framework can be completed within minutes. Another major merit of the proposed design is that the framework works smoothly in a cloud computing environment and therefore scales well.
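The abstract does not describe the schema itself, so the following is only a minimal sketch of the kind of MapReduce-style comorbidity analysis it alludes to, not the thesis's actual design. It assumes a hypothetical layout in which each row is keyed by a patient ID and holds all of that patient's diagnoses, and it runs in plain Python without Hadoop or HBase. The names `ROWS`, `map_row`, `reduce_counts`, and `comorbidity_counts`, as well as the sample data, are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows keyed by patient ID, loosely imitating a wide-column
# layout in which one row holds all diagnoses of one patient.
# The data is invented for illustration only.
ROWS = {
    "patient-001": ["diabetes", "hypertension"],
    "patient-002": ["diabetes", "hypertension", "asthma"],
    "patient-003": ["asthma"],
}


def map_row(row_key, diagnoses):
    """Map step: emit ((disease_a, disease_b), 1) for each pair in one row."""
    # Sorting makes the pair key independent of the order of diagnoses.
    for pair in combinations(sorted(set(diagnoses)), 2):
        yield pair, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each disease pair."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def comorbidity_counts(rows):
    """Run the map step over every row, then reduce the emitted pairs."""
    emitted = (kv for key, diag in rows.items() for kv in map_row(key, diag))
    return reduce_counts(emitted)


if __name__ == "__main__":
    for pair, count in sorted(comorbidity_counts(ROWS).items()):
        print(pair, count)
```

Because each row can be mapped independently of every other row, work of this shape can in principle be split across many machines, which is the property the abstract attributes to a MapReduce-oriented schema.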
Subjects
Large-scale medical database
Database design
Comorbidity
Clinical trial
Distributed computing
Type
thesis
File(s)
Name
ntu-100-R98945041-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):2631ac6113b71624212b226db28cb2f4