Resource Partition and Placement Strategy for Distributed Database Systems
分散式資料庫集合了計算機網路中多個邏輯相關資料庫。以 BigTable 為例,其為一高可用性、高效率,且可擴展之分散式資料存儲系統。BigTable 將資料存儲為鍵值對,其中鍵由行、欄與時間戳記組成,值則為一字串。欄族 (column family) 為一群必須存儲於同一檔案的欄。若資料庫需要取得欄族中的某些欄,則必須將整個欄族的欄全部取出。
與花費度量。我們證明最小化花費之欄族分割問題為 NP-complete,即便是分割為 2 個欄族或欄位花費皆相同亦如此。我們也證明最大化花費之欄族分割問題為 NP-complete。相對地,對於在分割為 2 個欄族,欄位花費皆相同,且分割必須為完美分割 (perfect split) 的限制下,最小化花費問題存在多項式時間演算法。我們在實驗結果中驗證了對於此問題所提出的多項式時間演算法之效率。
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. For exmaple, BigTable  is a distributed data storage system with high availability, performance, and scalability. BigTable stores data in a key-value pair, where the key consists of its column, row, and timestamp and the value is a string. A column family is a collection of columns that must be stored in a file, and the system must retrieve all columns in a column family if any column in that column family is needed.
The goal of this paper is to optimize a series of queries by partition the columns into several column families. Each query needs to access certain columns, and we would like to reduce the cost. We assume that that the cost of every column is given, and we consider only the cost in fetching column families while answering a query. Since retrieving any column within a column family means fetching the entire column family, this is an interesting yet challenging optimization problem.
The contributions of this paper are summarized as follows. We formulate a mathematical formulation of the column partition problem in which the object function and cost measurement are clearly defined. We also prove that minimizing the expense of a column partition is NP-complete, even when The number of partition is 2, or the cost of each column is the same. We also show maximizing the expense of a column partition is also NP-complete. However, a polynomial time algorithm exists for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint. Experiment results that verify the performance of our proposed algorithm for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint.
|Appears in Collections:||資訊工程學系|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.