Repository logo
  • English
  • 中文
Log In
Have you forgotten your password?
  1. Home
  2. College of Electrical Engineering and Computer Science / 電機資訊學院
  3. Electrical Engineering / 電機工程學系
  4. Mining of Frequent Temporal Patterns on Data Streams
 
  • Details

Mining of Frequent Temporal Patterns on Data Streams

Date Issued
2004
Date
2004
Author(s)
Teng, Wei-Guang
DOI
en-US
URI
http://ntur.lib.ntu.edu.tw//handle/246246/52960
Abstract
In recent years, several query problems and mining capabilities have been explored for a data stream environment. Among various data mining capabilities, the one receiving a significant amount of research attention is on mining frequent patterns over market basket data. In this dissertation, we first explore the model of frequent itemsets from static transaction databases and generalize relevant concepts to discovering of temporal relationship from online transaction flows. Then, we investigate the resource utilization issues in a data stream environment. Finally, we study the problem of quality guarantees when tracking online data streams. For the problem of mining frequent itemsets to derive association rules, a new mining capability, called mining of substitution rules, is first developed by extending the concepts of mining of association rules. Substitution refers to the choice made by a customer to replace the purchase of some items with that of others. The discovery of substitution rules, same as that of association rules, will lead to very valuable knowledge in various aspects, including market prediction, user behavior analysis and decision support. Specifically, we first derive theoretical properties for the model of substitution rule mining and devise a technique on the induction of positive itemset supports to improve the efficiency of support counting for negative itemsets. Then, in light of these properties, algorithm SRM (standing for substitution rule mining) is designed and implemented to discover the substitution rules efficiently while attaining good statistical significance. To mine frequent temporal patterns on data streams, a regression-based algorithm, called algorithm FTP-DS (Frequent Temporal Patterns of Data Streams) is devised. While providing a general framework of pattern frequency counting, algorithm FTP-DS has two major features, namely one data scan for online statistics collection and regression-based compact pattern representation. To attain the feature of one data scan, the data segmentation and the pattern growth scenarios are explored for the frequency counting purpose. Algorithm FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. The second important feature of algorithm FTP-DS is on the regression-based compact pattern representation. In addition, we develop the techniques of the segmentation tuning and segment relaxation to enhance the functions of FTP-DS. With these features, algorithm FTP-DS is able to not only conduct mining with variable time intervals but also perform trend detection effectively. The fundamental problem that how the limited resources, e.g., memory space and computation power, can be well utilized to produce accurate estimates in a data stream environment is also addressed. Two important features for tracking mined patterns with properly utilized resources are examined. The first issue is temporal granularity which refers to the phenomenon that as time advances, people are more interested in recent events, meaning that more resources can be utilized to explore more recent data with finer granularities. Second, with the mining task of discovering frequent temporal patterns, more resources are expected to be allocated to the processing of those borderline patterns whose statistics, e.g., occurrence frequencies, are close to the specified threshold so as to have proper frequent itemset identification. This feature is called mining with support count granularity. Consequently, a wavelet-based algorithm, called algorithm RAM-DS (Resource-Aware Mining for Data Streams) is devised to perform general pattern mining tasks for data streams by exploring both temporal and support count granularities. Algorithm RAM-DS is designed to not only reduce the memory required for data storage but also retain good approximation of target time series. In addition, algorithm RAM-DS can support a varying number of data streams by allocating memory space adaptively when tracking patterns generated from online transactions. For tracking online time series data which is directly collected from sensors or is generated by stream mining algorithms, we explore the energy preservation property of wavelet-based transform. The commonly used L1- and L2-error metrics are theoretically guaranteed when insignificant coefficients are discarded for saving precious resources in our framework. In addition, to handle infinite online data flows, an enhanced data structure RAID-tree which is based on the error tree is proposed for dynamic synopses maintenance over data streams. Specifically, an algorithm RAID with the resolution adaptability for incremental decomposition is developed. Experimental results have shown that the memory required for storing significant features of time series data is very small and the quality of approximation is stable when performing incremental data updates.
Subjects
資料串流
小波轉換
頻繁時間樣式
資料探勘
Frequent Temporal Pattern
Data Stream
Data Mining
Wavelet Transform
Type
thesis

臺大位居世界頂尖大學之列,為永久珍藏及向國際展現本校豐碩的研究成果及學術能量,圖書館整合機構典藏(NTUR)與學術庫(AH)不同功能平台,成為臺大學術典藏NTU scholars。期能整合研究能量、促進交流合作、保存學術產出、推廣研究成果。

To permanently archive and promote researcher profiles and scholarly works, Library integrates the services of “NTU Repository” with “Academic Hub” to form NTU Scholars.

總館學科館員 (Main Library)
醫學圖書館學科館員 (Medical Library)
社會科學院辜振甫紀念圖書館學科館員 (Social Sciences Library)

開放取用是從使用者角度提升資訊取用性的社會運動,應用在學術研究上是透過將研究著作公開供使用者自由取閱,以促進學術傳播及因應期刊訂購費用逐年攀升。同時可加速研究發展、提升研究影響力,NTU Scholars即為本校的開放取用典藏(OA Archive)平台。(點選深入了解OA)

  • 請確認所上傳的全文是原創的內容,若該文件包含部分內容的版權非匯入者所有,或由第三方贊助與合作完成,請確認該版權所有者及第三方同意提供此授權。
    Please represent that the submission is your original work, and that you have the right to grant the rights to upload.
  • 若欲上傳已出版的全文電子檔,可使用Open policy finder網站查詢,以確認出版單位之版權政策。
    Please use Open policy finder to find a summary of permissions that are normally given as part of each publisher's copyright transfer agreement.
  • 網站簡介 (Quickstart Guide)
  • 使用手冊 (Instruction Manual)
  • 線上預約服務 (Booking Service)
  • 方案一:臺灣大學計算機中心帳號登入
    (With C&INC Email Account)
  • 方案二:ORCID帳號登入 (With ORCID)
  • 方案一:定期更新ORCID者,以ID匯入 (Search for identifier (ORCID))
  • 方案二:自行建檔 (Default mode Submission)
  • 方案三:學科館員協助匯入 (Email worklist to subject librarians)

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science