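Advisor: 郭斯彥
Institution: National Taiwan University: Graduate Institute of Electrical Engineering
Author: 吳俐瑩 (Wu, Li-Ying)
Dates: 2014-11-28; 2018-07-06; 2014
URI: http://ntur.lib.ntu.edu.tw//handle/246246/262935

Abstract (translated from the Chinese): Consistency requirements are difficult to analyze and understand at the database system level, so the consistency guaranteed when reading and writing data must be configured by application developers according to the tradeoff between consistency and latency. Different types of applications have different consistency requirements, and developers may sacrifice some consistency to obtain faster response times. This thesis takes an application-oriented view: it analyzes the consistency requirements of different services and gives application developers a basis for choosing a suitable consistency setting from the characteristics of their service type. Given the consistency requirements that must be met, it provides a consistency setting with lower latency. The experiments first observe the latency incurred at different consistency strengths and the difference in latency between reads and writes. They then simulate the workload ratios and data-transfer characteristics of different applications, together with the consistency they require and the latency that results, and show that the consistency setting obtained from the requirements analysis achieves lower latency.

Abstract: Whenever data is replicated, a tradeoff between consistency and latency arises. Because consistency requirements cannot be understood at the storage system level, choosing a specific consistency policy for reading and writing data is left to developers. Depending on application-specific consistency requirements, application developers can choose between stronger consistency with lower performance and relaxed consistency with higher performance. In this work, we propose an approach that helps application administrators decide which consistency policy satisfies their high-level consistency semantics with lower operation response latency. The Cassandra distributed storage system provides flexible, tunable consistency configuration, so application administrators can choose between strong consistency and eventual consistency for both reads and writes. The resulting choice of consistency policy can serve as a guideline for developers with varying application-specific consistency requirements.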
Experiments show that the storage system performs read and write operations differently, and that the selected consistency policy achieves lower latency while satisfying the quorum-based consistency requirement.

Table of contents:
  Oral Examination Committee Certification
  Acknowledgements
  Chinese Abstract
  ABSTRACT
  CONTENTS
  LIST OF FIGURES
  LIST OF TABLES
  Chapter 1 Introduction
  Chapter 2 Background
    2.1 Tradeoff between Consistency and latency
      2.1.1 The CAP theorem
      2.1.2 Misleading on the CAP theorem
      2.1.3 PACELEC instead of CAP
    2.2 Replication in distributed storage systems
      2.2.1 Typical Replication Mechanisms: Active and Passive Replication Techniques
      2.2.2 Consistency Models
    2.3 Replication in Cassandra
      2.3.1 Cassandra Architecture
      2.3.2 Replica, replication factor and consistency level
      2.3.3 Replication strategy
  Chapter 3 Related works
  Chapter 4 Methodology
    4.1 Application Scenarios
    4.2 Data-Centric and Client-Centric Consistency
    4.3 Decision Features
    4.4 Consistency Policies
      4.4.1 Quorum-based Protocols
    4.5 Decision of Consistency Policy
      4.5.1 Data-Centric Based Decision Process Flow
      4.5.2 Client-Centric Based Decision Process Flow
    4.6 Latency Formulation
  Chapter 5 Experiment and Results
    5.1 Workload Benchmark
    5.2 Experimental setup
    5.3 Results
      5.3.1 Comparison of different consistency policies
      5.3.2 Under update-heavy workload
      5.3.3 Under read-intensive workload
  Chapter 6 Conclusions
  Reference

File size: 1421463 bytes
Format: application/pdf
Thesis public release date: 2019/07/29
Thesis usage rights: paid licensing authorized (royalties returned to the university)
Keywords: distributed data storage system; data replication; consistency; latency
Title (translated from the Chinese): Evaluating Application-Oriented Tradeoffs between Consistency and Latency in the Cassandra System
Title: Application-specific Tradeoffs between Consistency and Latency in Cassandra Storage Systems
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/262935/1/ntu-103-R01921037-1.pdf
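
Illustration: the abstract describes selecting a consistency policy that meets a quorum-based consistency requirement at lower latency. The sketch below is one way that selection might look, not the thesis's actual method. It assumes the usual quorum overlap condition R + W > N over the Cassandra levels ONE, QUORUM and ALL. The function names and the linear latency model are hypothetical stand-ins; the thesis's own latency formulation (Section 4.6) is not reproduced in this record.

```python
from itertools import product

LEVELS = ("ONE", "QUORUM", "ALL")


def replicas_required(level: str, n: int) -> int:
    """Replica acknowledgements a level waits for, given replication factor n."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]


def pick_policy(n, read_fraction, read_cost, write_cost):
    """Return the (read_level, write_level) pair with the lowest expected
    latency among pairs whose replica counts satisfy R + W > N.

    read_cost and write_cost are hypothetical callables mapping a replica
    count to a latency estimate; they stand in for measured values.
    """
    best = None
    for r_level, w_level in product(LEVELS, repeat=2):
        r = replicas_required(r_level, n)
        w = replicas_required(w_level, n)
        if r + w <= n:
            continue  # read and write replica sets may not overlap
        latency = read_fraction * read_cost(r) + (1 - read_fraction) * write_cost(w)
        if best is None or latency < best[0]:
            best = (latency, r_level, w_level)
    return best[1], best[2]


def toy_cost(replica_count):
    """Assumed toy model: latency grows linearly with replicas waited on."""
    return float(replica_count)


if __name__ == "__main__":
    print(pick_policy(3, 0.95, toy_cost, toy_cost))  # read-intensive mix
    print(pick_policy(3, 0.05, toy_cost, toy_cost))  # update-heavy mix
```

Under this toy model, a read-intensive mix favors cheap reads paired with expensive writes, and an update-heavy mix favors the reverse. With measured per-level latencies substituted for toy_cost, the same loop would rank the policies by observed cost.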