Resource Partition and Placement Strategy for Distributed Database Systems
Date Issued
2014
Date
2014
Author(s)
Chang, Chia-Wei
Abstract
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. For exmaple, BigTable [10] is a distributed data storage system with high availability, performance, and scalability. BigTable stores data in a key-value pair, where the key consists of its column, row, and timestamp and the value is a string. A column family is a collection of columns that must be stored in a file, and the system must retrieve all columns in a column family if any column in that column family is needed.
The goal of this paper is to optimize a series of queries by partition the columns into several column families. Each query needs to access certain columns, and we would like to reduce the cost. We assume that that the cost of every column is given, and we consider only the cost in fetching column families while answering a query. Since retrieving any column within a column family means fetching the entire column family, this is an interesting yet challenging optimization problem.
The contributions of this paper are summarized as follows. We formulate a mathematical formulation of the column partition problem in which the object function and cost measurement are clearly defined. We also prove that minimizing the expense of a column partition is NP-complete, even when The number of partition is 2, or the cost of each column is the same. We also show maximizing the expense of a column partition is also NP-complete. However, a polynomial time algorithm exists for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint. Experiment results that verify the performance of our proposed algorithm for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint.
Subjects
資料庫
查詢優化
Type
thesis
File(s)![Thumbnail Image]()
Loading...
Name
ntu-103-R01922163-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):64c33ac8d4a4e9e726814de850c26ce9