College of Electrical Engineering and Computer Science / 電機資訊學院
Electrical Engineering / 電機工程學系

Graph-based Data Mining for Transactional, Spatial and Social-networking Data

Date Issued
2011
Author(s)
Tai, Chih-Hua
URI
http://ntur.lib.ntu.edu.tw//handle/246246/253975
Abstract
Data mining is a data- and application-dependent technique that has received significant attention over the last decade. Various techniques have been developed to handle set or sequence data in business marketing, computer networks, and bioinformatics, to name a few. Many real applications, however, call for new techniques to tackle data with structural information, i.e., graphs. Graph-based data mining, which discovers novel knowledge in graph-represented data, is thus becoming increasingly important. Motivated by the fact that graph-based data mining is still in its infancy relative to its wide range of applications, this dissertation addresses the use of graph-based data mining in realistic problems involving three kinds of data complexity.

First, with the rise of cloud computing, people who lack expertise in data mining and/or computational resources can now benefit from data mining by outsourcing their mining tasks. For any outsourcing service, however, privacy is a major concern. In Chapter 2, we study the problem of privacy protection in outsourced frequent itemset mining. This problem poses two challenges. One is how to protect sensitive information, including the raw data and the frequent itemsets, with reasonable overhead while preserving precise mining results. The other is how to protect against an attacker with related background knowledge, such as item support information. To overcome these challenges, we propose k-support anonymity and develop a novel encryption approach that constructs a pseudo taxonomy tree to hide sensitive items. By leveraging the property that only the items at the leaf level of the taxonomy need to appear in the transactions, the storage overhead is limited while privacy protection is preserved.

Second, data collected by sensors can include not only geographic attributes but also informative attributes. Because spatial-alone clustering approaches consider only the geographic attributes and identify spatial clusters in data-dense regions, they cannot obtain spatial clusters of informatively similar data points from such data. We therefore address the informative spatial data clustering (ISDC) problem in Chapter 3. One of the main challenges is that geographic and informative attributes represent different concepts and should not be handled in the same way during clustering. To overcome this challenge, we propose Algorithm BiAgree, which introduces a graph structure named NeiGraph that integrates informative attributes and geographic attributes in vertices and edges, respectively. Algorithm BiAgree can then identify informatively similar regions regardless of data density by partitioning NeiGraph into informative-consistent connected components. In addition, by maintaining NeiGraph, Algorithm BiAgree provides an online computing capability that yields high-quality solutions with shorter computation time.

Finally, as a rapidly growing number of services and applications leverage social network data, there is increasing concern about privacy in published social networks. Several recent studies have addressed privacy issues concerning vertex/edge attributes, vertex identity, link disclosure, and so on. However, given the rich information inherent in graph data, the privacy issues in publishing social networks have not been fully solved. In Chapter 4, we address a new privacy issue, referred to as community identification. The community identity of an individual is a kind of structural information that indicates the neighborhood or connections of the individual. It can also reveal personal information that is sensitive if made public, such as membership in an online political activity group, an online disease support group, or a friend group in a social network. To protect such information, we propose a new privacy model, named k-structural diversity, and develop an Integer Programming formulation to find optimal solutions to the k-structural diversity anonymization (k-SDA) problem. Moreover, we devise three scalable heuristics that solve large instances of k-SDA from different perspectives.
Subjects
Graph-based Data Mining
Privacy-preserving
Anonymity
Grouping
Type
thesis
File(s)
Name
ntu-100-F92921029-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
MD5: 3e06e55cca76df136ccc149b7548101e
