指導教授:王勝德臺灣大學:電機工程學研究所陳建吉Chen, Chien-ChiChien-ChiChen2014-11-282018-07-062014-11-282018-07-062014http://ntur.lib.ntu.edu.tw//handle/246246/262859由 Aho 和 Corasick 所提出的演算法 (簡稱 AC 演算法) 可以很有效率地在一段文字中搜尋多個關鍵字所在的位置,因而被廣泛地使用於完全字串比對。然而,AC 演算法在運作時,只能一次處理一個字元,因而其效能受限於運作時脈。本論文依據 AC 演算法,提出新穎的具有多字元狀態轉移之字串比對架構,可以平行檢視多個字元,藉以加倍提升字串比對的效能。首先,說明實作 AC 演算法的三種有限自動機(finite automaton,簡稱 FA)的作法,包括確定性有限自動機 (Deterministic Finite Automaton,簡稱 DFA)、非確定有限自動機 (Nondeterministic Finite Automaton,簡稱 NFA)、以及複合有限自動機 (Hybrid Finite Automaton,簡稱 hybrid FA) 的作法。接著,提出一種推導演算法,藉以將前述 FA 推導為多字元的 FA,使其能夠平行檢視多個字元。同時,因為平行檢視多個字元所引起的對齊問題 (alignment problem),也藉由輔助轉移函式 (assistant transition) 來解決。所推導的多字元 FA 使用可程式元件來評估,所得到的評估結果顯示其能有效地加倍提升字串比對的效能。其中所得到的最佳結果為 16字元 AC-NFA 的實作,其可運作於 167.36MHz 的時脈,換算得到的字串比對效能為 21.4Gbps。The algorithm of Aho and Corasick (AC-algorithm) can locate multiple keywords in a text efficiently; and thus it is widely used in the exact string matching. However, the AC-algorithm processes the text character by character with a performance limited by the processing clock. This thesis proposes novel string-matching architectures with multiple-character state transitions based on the AC-algorithm, which can inspect multiple characters in parallel to multiply the throughput. At first, three finite automaton (FA) approaches including Deterministic Finite Automaton (DFA), Nondeterministic Finite Automaton (NFA), and hybrid FA approaches are presented to implement the AC-algorithm. Subsequently, a derivation algorithm is proposed to derive these FAs to multi-character FAs to allow for the inspection of multiple characters in parallel. Additionally, the alignment problem, which occurs while multiple characters are being inspected in parallel, is solved by using assistant transitions. The derived multi-character FAs are evaluated on programmable devices and the evaluation results indicate that the throughput can be multiplied effectively. The achievable throughput of the best result is 21.4Gbps obtained by a 16-character AC-NFA implementation operating at a 167.36MHz clock.致謝 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i 中文摘要 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . xi List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Exact String Matching and Motivations . . . . . . . . . . . . . . 1 1.2 Approach Overview . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . 4 2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Optimization in Hardware . . . . . . . . . . . . . . . . . . . . 5 2.2 Multi-Character Hardware Approaches . . . . . . . . . . . . . . . 6 2.3 Software Approaches . . . . . . . . . . . . . . . . . . . . . . . 7 3 Aho-Corasick Algorithm and Implementations. . . . . . . . . . . . . 9 3.1 AC-trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 AC-DFA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3 AC-NFA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.4 Hybrid AC-FA . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.5 Priority Multiplexer . . . . . . . . . . . . . . . . . . . . . . 21 4 Multi-Character State Transitions . . . . . . . . . . . . . . . . . 23 4.1 Alignment Problem and Previous Work . . . . . . . . . . . . . . . 23 4.2 Derivation of Multi-Character Transitions . . . . . . . . . . . . 25 4.3 Derivation Algorithm . . . . . . . . . . . . . . . . . . . . . . 29 4.4 Comparison of Derivation Approaches . . . . . . . . . . . . . . . 31 4.4.1 The proposed approach . . . . . . . . . . . . . . . . . . . . . 31 4.4.2 Yamagaki et al. approach . . . . . . . . . . . . . . . . . . . 31 4.4.3 Yang et al. approach . . . . . . . . . . . . . . . . . . . . . 33 4.4.4 Summary of comparison . . . . . . . . . . . . . . . . . . . . . 34 5 Multi-Character String-Matching Approaches . . . . . . . . . . . . 37 5.1 Multi-Character AC-DFA . . . . . . . . . . . . . . . . . . . . . 38 5.1.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.2 Matching example . . . . . . . . . . . . . . . . . . . . . . . 44 5.1.3 Experiment and evaluation . . . . . . . . . . . . . . . . . . . 44 5.2 Multi-Character AC-NFA . . . . . . . . . . . . . . . . . . . . . 48 5.2.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 49 5.2.2 Matching example . . . . . . . . . . . . . . . . . . . . . . . 51 5.2.3 Experiment and evaluation . . . . . . . . . . . . . . . . . . . 52 5.3 Multi-Character Hybrid AC-FA . . . . . . . . . . . . . . . . . . 56 5.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 56 5.3.2 Matching example . . . . . . . . . . . . . . . . . . . . . . . 59 5.3.3 Estimation of required stages . . . . . . . . . . . . . . . . . 60 5.3.4 Experiment and evaluation . . . . . . . . . . . . . . . . . . . 62 6 Configurable Architectures . . . . . . . . . . . . . . . . . . . . 63 6.1 Configurable stage architecture . . . . . . . . . . . . . . . . . 63 6.1.1 Block diagram . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.1.2 Experiment and evaluation . . . . . . . . . . . . . . . . . . . 65 6.2 Configurable data-width architecture . . . . . . . . . . . . . . 68 6.2.1 Basic rule unit . . . . . . . . . . . . . . . . . . . . . . . . 68 6.2.2 Example of three-unit group . . . . . . . . . . . . . . . . . . 71 6.2.3 Waveform of three-unit group . . . . . . . . . . . . . . . . . 72 6.2.4 Experiment and evaluation . . . . . . . . . . . . . . . . . . . 73 7 Comparison of approaches . . . . . . . . . . . . . . . . . . . . . 75 7.1 Comparing Results . . . . . . . . . . . . . . . . . . . . . . . . 75 7.2 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812083842 bytesapplication/pdf論文公開時間:2015/08/08論文使用權限:同意無償授權字串比對確定性及非確定有限自動機入侵偵測系統具多字元狀態轉移之高效字串比對引擎Efficient String Matching Engines with Multiple-Character State Transitionsthesishttp://ntur.lib.ntu.edu.tw/bitstream/246246/262859/1/ntu-103-D93921012-1.pdf