High Performance Hardware Enhanced Frameworks on Data Mining

Contributor: 陳銘憲
Affiliation: National Taiwan University, Graduate Institute of Electrical Engineering (臺灣大學：電機工程學研究所)
Author: 劉韋權 (Liu, Wei-Chuan)
Record dates: 2007-11-26; 2018-07-06
Year: 2005
URI: http://ntur.lib.ntu.edu.tw//handle/246246/53196

Abstract (translated from Chinese):
Using hardware to accelerate data mining algorithms is an emerging topic. In this thesis, we propose hardware architectures for frequent temporal pattern mining and for data clustering algorithms, respectively, to improve performance. We use hardware parallelism to accelerate the most time-consuming procedures in these data mining algorithms and thereby raise the overall data processing speed. For frequent temporal pattern mining, we propose a circuit implementing an Apriori-like algorithm to handle frequent 2-itemsets, whose number grows exponentially as the number of data items increases. With this circuit, the data need to be scanned only once, and both frequent 1-itemsets and 2-itemsets are found within a constant time bound. For the clustering algorithm, we integrate a hardware centroid updating mechanism into the execution flow of the clustering algorithm, which greatly reduces the centroid updating time and improves performance. The experimental results show that hardware acceleration yields considerable performance improvements over traditional data mining algorithms executed entirely in software.

Abstract (English):
Hardware enhanced mining is an emerging issue. In this thesis, we propose two frameworks to speed up two mining problems: temporal pattern mining in data streams and the K-means clustering algorithm. By exploiting the parallelism in hardware, many primitive data mining subtasks can be executed with high throughput, which increases the performance of the overall data mining task. Specifically, for temporal pattern mining we realize an Apriori-like algorithm within our proposed hardware enhanced mining framework. Even though the number of 2-itemsets grows quadratically, the counts of frequent 1-itemsets and 2-itemsets are obtained after one pass of the dataset through our hardware implementation, so the throughput is maintained at a constant level. Moreover, we propose the KACU (K-means with hArdware Centroid Updating) framework, which integrates a hardware centroid updating mechanism into the procedure of the continuous K-means algorithm. The proposed hardware frameworks are implemented in commercial Field Programmable Gate Array (FPGA) devices in order to measure their performance. The experimental results show that the hardware enhancements achieve considerably higher performance than traditional mining algorithm architectures implemented purely in software.

Table of contents:
Chapter 1. Introduction
Chapter 2. Preliminaries
Chapter 3. Mining Frequent Temporal Patterns over Data Streams
  3.1 Introduction
    3.1.1 Background
    3.1.2 Motivation
  3.2 Hardware Design
    3.2.1 Environment
    3.2.2 Architecture of Hardware Stream Processor
    3.2.3 The 2-itemset Generator
    3.2.4 Frequent Decision
  3.3 Performance Analysis
  3.4 Results and Discussion
Chapter 4. KACU: K-means with hArdware Centroid-Updating
  4.1 Introduction
    4.1.1 Standard K-means Algorithm
    4.1.2 Motivation
    4.1.3 Systolic Process Array
    4.1.4 Continuous K-means Algorithm
  4.2 Hardware Design
  4.3 Performance Analysis
  4.4 Results and Discussion
Chapter 5. Conclusions
References

File size: 723011 bytes
Format: application/pdf
Language: en-US
Keywords: hardware design; hardware enhanced; data mining; frequent temporal pattern; clustering algorithm; k-means; clustering; FPGA
Title: 應用於資料探勘上之高效能硬體架構設計 (High Performance Hardware Enhanced Frameworks on Data Mining)
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/53196/1/ntu-94-R92921032-1.pdf
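
Illustrative note on the one-pass counting described in the abstract: the thesis obtains the counts of frequent 1-itemsets and 2-itemsets in a single scan using a hardware circuit. The record gives no circuit details, so the following is only a minimal software sketch of the same counting idea, not the author's design. The function name `count_itemsets`, the `min_support` threshold, and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, min_support):
    """Count 1- and 2-itemsets in a single pass and keep the frequent ones.

    Software illustration only; the thesis performs this step in hardware.
    """
    single_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each item counts once per transaction
        # and each unordered pair has one canonical form.
        items = sorted(set(transaction))
        single_counts.update(items)
        pair_counts.update(combinations(items, 2))
    frequent_1 = {item: n for item, n in single_counts.items() if n >= min_support}
    frequent_2 = {pair: n for pair, n in pair_counts.items() if n >= min_support}
    return frequent_1, frequent_2


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
frequent_1, frequent_2 = count_itemsets(transactions, min_support=2)
```

With n distinct items there are n(n-1)/2 candidate pairs, which is the quadratic growth the English abstract refers to. A software loop pays for that growth per transaction, whereas the abstract states that the hardware keeps throughput constant.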
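
Illustrative note on centroid updating: the abstract says KACU moves the centroid update of the continuous K-means algorithm into hardware. The record does not define the update rule, so the sketch below assumes the common reading of "continuous" K-means, in which a centroid is adjusted immediately after each point is assigned to it, using an incremental mean. The function names and the choice to count each initial centroid as one member are assumptions for illustration, not taken from the thesis.

```python
def nearest(point, centroids):
    """Return the index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def continuous_kmeans_pass(points, centroids):
    """Run one pass in which each centroid is updated right after every assignment.

    Assumption: each initial centroid is treated as one already-assigned member.
    """
    centroids = [list(c) for c in centroids]
    counts = [1] * len(centroids)
    for point in points:
        k = nearest(point, centroids)
        counts[k] += 1
        # Incremental mean: move the centroid toward the new point by 1/count.
        centroids[k] = [c + (p - c) / counts[k] for p, c in zip(point, centroids[k])]
    return centroids


# Hypothetical sample data.
updated = continuous_kmeans_pass(
    points=[(0.0, 2.0), (10.0, 12.0)],
    centroids=[(0.0, 0.0), (10.0, 10.0)],
)
```

The per-point update inside the loop is the step the abstract describes as being offloaded to a hardware mechanism. In this sketch it is plain sequential software.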