A Study on Mining Inter-Sequence Association Rules (找尋序列間關聯法則之研究)

李瑞庭

Title:	找尋序列間關聯法則之研究
Other Titles:	A study on mining inter-sequence association rules
Authors:	李瑞庭
Keywords:	Data mining; Association rules; Intra-sequence association rules; Inter-sequence association rules
Issue Date:	2004
Publisher:	Taipei: Department and Graduate Institute of Information Management, National Taiwan University
Abstract:	Many algorithms have been proposed to find sequential patterns in sequence databases, where each transaction contains a single sequence. These algorithms treat every sequence as independent of the others. We call this kind of mining intra-sequence association rule mining, because the patterns found describe characteristics within a single sequence only. We go further and investigate relationships between sequential patterns in different sequences, which we call inter-sequence association rule mining. To the best of our knowledge, no data mining technique has been designed specifically to find inter-sequence association rules. Such rules can be applied to data from many areas, such as web page traversal paths, telecommunication records, disease symptoms, weather changes, stock movements, and DNA sequences. In this project, we therefore propose an algorithm for mining inter-sequence association rules. First, we use the PrefixSpan algorithm to find all sequential patterns, and then we use a level-wise method to check whether each sequence-set is a large sequence-set. At the same time, we record the time points at which each sequential pattern occurs in a time point list. We then divide the time point lists into several groups and store them in buckets, called L-buckets. Because the proposed algorithm uses time point lists and L-buckets to accelerate support counting, it is more efficient than the Apriori-like algorithm.
URI:	http://ntur.lib.ntu.edu.tw//handle/246246/18841
Other Identifiers:	922213E002066
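Rights:	Department and Graduate Institute of Information Management, National Taiwan University
Appears in Collections:	Department of Information Management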
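
The support-counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sequential patterns have already been found (for example by PrefixSpan), that each pattern carries the set of time points at which it occurs, and that the support of a sequence-set is the number of time points shared by all of its patterns. The function name `large_sequence_sets`, the pattern labels, and the sample data are hypothetical. The abstract does not say how time point lists are grouped into L-buckets, so that step is left out.

```python
from itertools import combinations


def large_sequence_sets(time_points, min_support):
    """Level-wise search for large sequence-sets.

    time_points maps a pattern label to the set of time points at which
    that pattern occurs (an assumed input format). The support of a
    sequence-set is taken to be the size of the intersection of its
    members' time point sets.
    """
    # Level 1: single patterns that occur often enough.
    current = {
        frozenset([p]): set(tps)
        for p, tps in time_points.items()
        if len(tps) >= min_support
    }
    result = dict(current)

    while current:
        next_level = {}
        for a, b in combinations(list(current), 2):
            cand = a | b
            # Join only sets that differ in exactly one pattern.
            if len(cand) != len(a) + 1 or cand in next_level:
                continue
            # Apriori-style pruning: every subset one pattern smaller
            # must itself be large.
            if any(cand - {x} not in current for x in cand):
                continue
            # Support counting by intersecting time point sets,
            # with no rescan of the sequence database.
            shared = current[a] & current[b]
            if len(shared) >= min_support:
                next_level[cand] = shared
        result.update(next_level)
        current = next_level
    return result


# Hypothetical time point lists for four sequential patterns.
time_points = {
    "A": {1, 2, 3, 5},
    "B": {2, 3, 5, 7},
    "C": {3, 5, 8},
    "D": {9},
}
result = large_sequence_sets(time_points, min_support=2)
for seq_set, shared in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(seq_set), sorted(shared))
```

Because each candidate's support comes from intersecting sets that are already in memory, the search does not scan the database again at every level, which is the kind of saving the abstract attributes to time point lists.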

Files in This Item:

File	Description	Size	Format
922213E002066.pdf		89.15 kB	Adobe PDF
