Discriminative Pattern Mining in Microbiomic Data

Huang, Nancy

doi:10.6342/NTU201603476

Date Issued

2016

Author(s)

Huang, Nancy

DOI

10.6342/NTU201603476

URI

http://ntur.lib.ntu.edu.tw//handle/246246/275567

Abstract

Machine learning classifiers have long been used to solve biological problems by predicting the target class (e.g., disease state or bacterial taxonomy) of unseen samples. A valuable byproduct of certain classifiers is “interpretability” (also known as “comprehensibility”), which can be used to explain why and how a sample is assigned to its predicted class. Interpretable classifiers produce “discriminative patterns” that lead to different prediction results, and these patterns provide insight into critical properties of the biological problem by capturing more of the underlying semantics than single features do. Discriminative patterns can be used directly by pattern-based classifiers to predict unseen samples through a majority voting or aggregation mechanism. In this case, we are concerned not only with finding useful individual patterns, but also with the effectiveness of the pattern set as a whole. Thus, it is imperative to ensure the relevancy and non-redundancy of the discriminative patterns. Few studies have evaluated pattern redundancy by examining the samples covered by the patterns, and those that do have focused mostly on the proportion of overlapping samples, so a great deal of information about non-overlapping samples has been overlooked. In addition, traditional pattern mining approaches often require the generation of a complete set of initial patterns and a global discretization of continuous attributes, both of which are impractical for complex, high-dimensional biological datasets. We address these issues by presenting a novel pattern selection algorithm that estimates pattern redundancy using not only the proportion of overlapping samples, but also the resemblance of non-overlapping samples. The proposed method was applied to three real microbiomic datasets, with the aim of providing new insights into the interactions between microbial factors and their effects on the host. When compared with other robust classifiers and feature selection heuristics, our pattern selection algorithm produced diverse and compact sets of final patterns that demonstrated comparable or even superior predictive capability.
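
The abstract does not give the exact redundancy measure, so the following is only an illustrative sketch of the general idea of scoring two patterns by both the overlap of the samples they cover and the resemblance of the samples they do not share. Every specific choice here is an assumption made for illustration and is not taken from the thesis: the Jaccard index for overlap, cosine similarity between mean abundance profiles for resemblance, the way the two are combined, and all function names.

```python
from math import sqrt


def overlap_proportion(cov_a, cov_b):
    """Jaccard index of the sample sets covered by two patterns (assumed measure)."""
    a, b = set(cov_a), set(cov_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _mean_profile(X, idx):
    """Mean feature vector of the samples in idx (idx must be non-empty)."""
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(len(X[0]))]


def _cosine(u, v):
    """Cosine similarity; lies in [0, 1] for non-negative abundance vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nonoverlap_resemblance(X, cov_a, cov_b):
    """Resemblance of the samples covered by exactly one of the two patterns."""
    only_a = set(cov_a) - set(cov_b)
    only_b = set(cov_b) - set(cov_a)
    if not only_a and not only_b:
        return 1.0  # identical coverage: nothing distinguishes the patterns
    if not only_a or not only_b:
        return 0.0  # one pattern nests in the other: no pair of groups to compare
    return _cosine(_mean_profile(X, only_a), _mean_profile(X, only_b))


def redundancy(X, cov_a, cov_b):
    """Hypothetical combined score in [0, 1], assuming non-negative features."""
    o = overlap_proportion(cov_a, cov_b)
    return o + (1.0 - o) * nonoverlap_resemblance(X, cov_a, cov_b)


# Toy abundance table: rows are samples, columns are two microbial features.
X = [[1, 0], [1, 0], [0, 1], [1, 0]]
pattern_a = [0, 1]  # indices of samples covered by pattern A
pattern_b = [1, 3]  # shares sample 1 with A; its other sample resembles A's
pattern_c = [1, 2]  # shares sample 1 with A; its other sample differs from A's

print(redundancy(X, pattern_a, pattern_b))
print(redundancy(X, pattern_a, pattern_c))
```

In this toy case, patterns B and C overlap with A by the same proportion, yet B scores as more redundant because the samples it does not share with A look like A's own samples. An overlap-only measure cannot make that distinction.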

Subjects

discriminative patterns

pattern mining

pattern relevancy

pattern redundancy

pattern selection

microbiomic data

Type

thesis
