Data Mining on Market-Basket Data (購物資料之資料採礦)

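Contributors: 陳銘憲; National Taiwan University, Graduate Institute of Electrical Engineering (臺灣大學：電機工程學研究所)
Author: 雲晴煌 (Yun, Ching-Huang)
Record dates: 2007-11-26; 2018-07-06
Year: 2005
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/53515

Abstract (Chinese version, translated): With the spread of the World Wide Web and mobile devices, customers can make transactions anywhere and at any time. All of these transaction data are digitized and collected in various market-basket databases. Data mining techniques have received wide attention in database research because they can be applied broadly to improve marketing strategies.

This dissertation addresses three main research topics:
- First, for market-basket data from physical retail stores, we study how the item taxonomy affects transaction clusters.
- Second, for the same kind of retail data, we consider item association and item taxonomy together when forming item clusters.
- Third, for market-basket data from mobile commerce, we design algorithms to mine mobile sequential patterns.

The three topics are summarized as follows.

To mine transaction clusters from retail market-basket data, we design a new measurement, called category-based adherence, to measure similarity in transaction clustering. We also design a new algorithm that mines transaction clusters efficiently. The distance between an item and a cluster is defined as the number of links between the item and its nearest cluster representative. The category-based adherence of a transaction to a cluster is defined as the average distance between the items in the transaction and that cluster. We also propose an information gain mechanism to validate the quality of the resulting transaction clusters.

To mine item clusters from retail market-basket data, we design a new measurement, called association-taxonomy similarity, to measure similarity in item clustering. We also design a new algorithm that mines item clusters efficiently. To validate the quality of the resulting item clusters, we propose two new mechanisms: an association index and a taxonomy index.

To mine mobile sequential patterns from mobile commerce market-basket data, we design three algorithms. They are based respectively on:
(1) an extension of association mining;
(2) considering both association and paths in the data, using a path-trimming mechanism;
(3) the observed pattern family phenomenon.

In the experiments, we simulate mobile commerce data to analyze the proposed algorithms.

Abstract (English version): With the popularity of mobile devices, customers are able to make transactions from anywhere at any time. These data have been digitized and collected in various market-basket databases. Data mining has attracted growing attention in the database community because of its wide applicability to improving marketing strategies. In this dissertation, we first study the impact of item taxonomy on mining transaction clusters from retail market-basket databases. Then, we take both association and taxonomy relationships into consideration for mining item clusters from retail market-basket databases. Finally, we investigate the problem of mining mobile sequential patterns, which capture both the moving patterns and the purchase patterns of customers, from mobile commerce market-basket databases. Specifically, for mining transaction clusters, we devise a novel measurement, called category-based adherence, and use it to perform the clustering.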
With this measurement, we develop algorithm k-todes for market-basket data, with the objective of minimizing the category-based adherence. The distance of an item to a given cluster is defined as the number of links between the item and its nearest tode (the cluster representative). The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in the transaction to that cluster. Our experimental results show that, by exploiting taxonomy information, algorithm k-todes significantly outperforms prior works in both execution efficiency and clustering quality. For mining item clusters, we devise the association-taxonomy similarity and use it to perform the clustering. With this measurement, we develop algorithm AT for efficiently mining item clusters. We also devise two validation indexes, based on association and taxonomy properties respectively, to assess the quality of clusterings of item data. Our experimental results show that algorithm AT significantly outperforms prior works in clustering quality as measured by these indexes, indicating the usefulness of association-taxonomy similarity for clustering item data. For mining mobile sequential patterns, we devise three algorithms: TJLS, TJPT, and TJPF. Algorithm TJLS is based on the concept of association rules. Algorithm TJPT takes both association rules and path traversal patterns into account and improves performance through path trimming. Algorithm TJPF uses the pattern family technique, which is developed to exploit the relationship between moving and purchase behaviors. We develop a simulation model of the mobile commerce environment and generate a synthetic workload for the performance studies.
Our experimental results show that algorithm TJPF significantly outperforms the other two algorithms in both execution efficiency and memory savings, indicating the usefulness of the pattern family technique devised in this dissertation.

Contents
1 Introduction
  1.1 Motivation and Overview of the Dissertation
  1.2 Organization of the Dissertation
2 Adherence Clustering: An Efficient Method for Mining Transaction Clusters
  2.1 Introduction
  2.2 Preliminaries
    2.2.1 Problem Description
    2.2.2 Information Gain Validation Model
  2.3 Algorithm k-todes
    2.3.1 Similarity Measurement: Category-Based Adherence
    2.3.2 Procedure of Algorithm k-todes
    2.3.3 An Illustrative Example
    2.3.4 Complexity Analysis
  2.4 Experimental Results
    2.4.1 Data Generation
    2.4.2 Performance Study
  2.5 Summary
3 Integrating Association and Taxonomy Similarities for Mining Item Clusters
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 Problem Description
    3.2.2 Validation Indexes
  3.3 Algorithm AT (Association Taxonomy)
    3.3.1 Similarity Measurement
    3.3.2 Procedure of Algorithm AT
    3.3.3 An Illustrative Example
    3.3.4 Complexity Analysis
  3.4 Experimental Studies
    3.4.1 Data Generation
    3.4.2 Performance Study
  3.5 Summary
4 Mining Mobile Sequential Patterns in a Mobile Commerce Environment
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Problem Formulation
    4.2.2 Related Works
    4.2.3 The Procedure for Mining Mobile Sequential Patterns
  4.3 Algorithms for Mining Mobile Sequential Patterns
    4.3.1 Algorithm TJLS
    4.3.2 Algorithm TJPT
    4.3.3 Algorithm TJPF
  4.4 Experimental Results
    4.4.1 Generation of Synthetic Mobile Transaction Sequences
    4.4.2 Performance Comparison
  4.5 Summary
5 Conclusion

File size: 935197 bytes (application/pdf)
Language: en-US
Keywords: data mining (資料採礦); market-basket data (購物資料)
Title: 購物資料之資料採礦 (Data Mining on Market-Basket Data)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53515/1/ntu-94-F86921114-1.pdf
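The category-based adherence described in the abstract can be illustrated with a short sketch. It is not the dissertation's implementation and rests on several hypothetical assumptions:
- The item taxonomy is a tree, given as a child-to-parent map.
- A tode is a set of taxonomy nodes that acts as the cluster representative.
- A "link" is one edge of the taxonomy tree.

The example taxonomy and the function names `tree_distance`, `item_distance`, `adherence`, and `assign` are illustrative only. The sketch shows just the assignment step of a k-todes-style iteration, in which each transaction joins the cluster with the smallest adherence. It does not cover how the dissertation updates the todes.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Set

# Hypothetical example taxonomy: child -> parent, with the root mapped to None.
TAXONOMY: Dict[Hashable, Optional[Hashable]] = {
    "root": None,
    "food": "root", "drink": "root",
    "bread": "food", "cheese": "food",
    "milk": "drink", "juice": "drink",
}


def ancestors(node: Hashable, parent: Dict[Hashable, Optional[Hashable]]) -> List[Hashable]:
    """Return the chain [node, parent(node), ..., root]."""
    chain = [node]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_distance(a: Hashable, b: Hashable, parent) -> int:
    """Number of links (tree edges) on the taxonomy path between nodes a and b."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for j, n in enumerate(ancestors(b, parent)):
        if n in steps_from_a:  # first shared ancestor is the lowest common ancestor
            return steps_from_a[n] + j
    raise ValueError(f"{a!r} and {b!r} are not in the same taxonomy tree")


def item_distance(item: Hashable, tode: Set[Hashable], parent) -> int:
    """Distance of an item to a cluster: links to the nearest node of the (assumed) tode."""
    return min(tree_distance(item, t, parent) for t in tode)


def adherence(transaction: Sequence[Hashable], tode: Set[Hashable], parent) -> float:
    """Category-based adherence: average distance of the transaction's items to the cluster."""
    if not transaction:
        raise ValueError("empty transaction")
    return sum(item_distance(i, tode, parent) for i in transaction) / len(transaction)


def assign(transactions: Sequence[Sequence[Hashable]], todes: List[Set[Hashable]], parent) -> List[int]:
    """One assignment pass: each transaction joins the cluster of minimal adherence (ties go to the lowest index)."""
    return [
        min(range(len(todes)), key=lambda k: adherence(t, todes[k], parent))
        for t in transactions
    ]


if __name__ == "__main__":
    todes = [{"food"}, {"drink"}]
    print(adherence(["bread", "cheese"], todes[0], TAXONOMY))  # 1.0
    print(assign([["bread", "cheese"], ["milk", "juice"]], todes, TAXONOMY))  # [0, 1]
```

In this sketch the distance counts taxonomy links rather than exact item matches. Two transactions that share no items can therefore still adhere closely to the same cluster through a common category.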