https://scholars.lib.ntu.edu.tw/handle/123456789/105197
Title: | 找尋序列間關聯法則之研究 |
Other Titles: | A study on mining inter-sequence association rules |
Authors: | 李瑞庭 |
Keywords: | Data mining; Association rules; Intra-sequence association rules; Inter-sequence association rules |
Issue Date: | 2004 |
Publisher: | Taipei: Department and Graduate Institute of Information Management, National Taiwan University |
Abstract: | Many algorithms have been proposed to find sequential patterns in sequence databases, where each transaction contains one sequence. These algorithms treat every sequence as independent of the others. This kind of mining is intra-sequence pattern mining, because the patterns it finds describe characteristics within a single sequence. We go further and investigate the relationships between sequential patterns in different sequences, which we call inter-sequence association rule mining. To the best of our knowledge, no existing data mining technique is specifically designed to find inter-sequence association rules. Such rules can be applied in many areas, for example to analyze web page traversal paths, telecommunication data, disease symptoms, weather changes, stock movements, and DNA sequences. Therefore, in this project, we propose an algorithm to mine inter-sequence association rules.
First, we use the PrefixSpan algorithm to find all sequential patterns, and then we use a level-wise method to check whether each sequence-set is large (that is, whether it is a large sequence-set). Meanwhile, we record all the time points at which each sequential pattern occurs in a time point list. We then divide the time point lists into several groups and store them in buckets called L-buckets. Because the proposed algorithm uses time point lists and L-buckets to accelerate support counting, it is more efficient than an Apriori-like algorithm. |
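The abstract does not define the data model or the L-bucket layout, so the Python sketch below rests on loudly stated assumptions and is not the report's algorithm. It assumes each time point carries exactly one sequence of single items, and it models a sequence-set (hypothetically) as a set of `(pattern, offset)` pairs that occurs at anchor time `t` when every pattern occurs at `t + offset`, in the style of inter-transaction rules. A simplified PrefixSpan collects each frequent pattern's time point list; a level-wise search then counts a candidate's support by intersecting shifted time point lists instead of rescanning the database. The names `prefixspan`, `mine_sequence_sets`, `min_sup`, and `max_span` are hypothetical, and L-buckets are not modeled.

```python
def prefixspan(db, min_sup):
    """Simplified PrefixSpan for sequences whose elements are single items.

    Assumption: db is a list of (time_point, sequence) pairs, one sequence
    per time point. Returns {pattern: time_point_list}, where the time point
    list is the set of time points whose sequence contains the pattern.
    """
    results = {}

    def mine(prefix, projected):
        # For each item, collect the time points whose projected suffix contains it.
        counts = {}
        for t, suffix in projected:
            for item in set(suffix):
                counts.setdefault(item, set()).add(t)
        for item, tps in counts.items():
            if len(tps) < min_sup:
                continue
            pattern = prefix + (item,)
            results[pattern] = tps
            # Project each suffix past the first occurrence of the item.
            next_proj = [(t, s[s.index(item) + 1:]) for t, s in projected if item in s]
            mine(pattern, next_proj)

    mine((), [(t, list(seq)) for t, seq in db])
    return results


def mine_sequence_sets(tpl, min_sup, max_span):
    """Level-wise search for large sequence-sets (hypothetical model).

    A sequence-set is a frozenset of (pattern, offset) pairs with
    0 <= offset <= max_span and at least one offset equal to 0. It occurs at
    anchor t if every pattern occurs at t + offset. Support is the number of
    anchors, computed by intersecting shifted time point lists.
    """
    members = [(p, d) for p in tpl for d in range(max_span + 1)]
    # Level 1: each frequent pattern on its own, at offset 0.
    level = {frozenset([(p, 0)]): set(tps) for p, tps in tpl.items() if len(tps) >= min_sup}
    large = dict(level)
    while level:
        next_level = {}
        for sset, anchors in level.items():
            for p, d in members:
                cand = sset | {(p, d)}
                if (p, d) in sset or cand in next_level:
                    continue
                # Keep only anchors t at which pattern p also occurs at t + d.
                new_anchors = anchors & {t - d for t in tpl[p]}
                if len(new_anchors) >= min_sup:
                    next_level[cand] = new_anchors
        large.update(next_level)
        level = next_level
    return large


if __name__ == "__main__":
    db = [(1, "abc"), (2, "bd"), (3, "abc"), (4, "bd"), (5, "ac")]
    tpl = prefixspan(db, min_sup=2)
    large = mine_sequence_sets(tpl, min_sup=2, max_span=1)
    rule = frozenset({(("a", "b", "c"), 0), (("b", "d"), 1)})
    print(sorted(large[rule]))  # [1, 3]
```

In this toy run, pattern `abc` at time `t` followed by pattern `bd` at time `t + 1` forms a large sequence-set with support 2 (anchors 1 and 3), which is the kind of relationship between different sequences that intra-sequence mining cannot express. Each extension reuses the stored anchor set of its parent, so support counting never rereads `db`; grouping the time point lists into buckets would be a further optimization on top of this.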
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/18841 |
Other Identifiers: | 922213E002066 |
Rights: | Department and Graduate Institute of Information Management, National Taiwan University |
Appears in Collections: | Department of Information Management |
File | Size | Format |
---|---|---|
922213E002066.pdf | 89.15 kB | Adobe PDF |