Discriminative Pattern Mining in Microbiomic Data

黃安婷; Huang, Nancy

Title:	Discriminative Pattern Mining in Microbiomic Data (微生物源資料之辨識模型探勘)
Author:	黃安婷 Huang, Nancy
Keywords:	discriminative patterns; pattern mining; pattern relevancy; pattern redundancy; pattern selection; microbiomic data
Issue Date:	2016
Abstract:	Machine learning classifiers have long been used to solve biological problems by predicting the target class (e.g., disease state or bacterial taxonomy) of unseen samples. A valuable byproduct of certain classifiers is “interpretability” (also known as “comprehensibility”), which can be used to explain why and how a sample is assigned to its predicted class. Interpretable classifiers produce “discriminative patterns” that lead to different prediction results and provide insight into critical properties of the biological problem by capturing more of the underlying semantics than single features do. Discriminative patterns can be used directly by pattern-based classifiers to predict unseen samples through a majority voting or aggregation mechanism. In this case, we are concerned not only with finding useful individual patterns but also with the effectiveness of the pattern set as a whole. It is therefore imperative to ensure the relevancy and non-redundancy of the discriminative patterns. Few studies have evaluated pattern redundancy by examining the samples covered by the patterns, and those that do have focused mostly on the proportion of overlapping samples, so a great deal of information on non-overlapping samples has been overlooked. In addition, traditional pattern mining approaches often require generating a complete set of initial patterns and a global discretization of continuous attributes, both of which are impractical for high-dimensional, complex biological datasets. We address these issues by presenting a novel pattern selection algorithm that estimates pattern redundancy not only by the proportion of overlapping samples but also by the resemblance of non-overlapping samples. The proposed method was applied to three real microbiomic datasets, with the aim of providing new insights into the interactions between microbial factors and their effects on the host. When compared with other robust classifiers and feature selection heuristics, our pattern selection algorithm produced diverse and compact sets of final patterns with comparable or even superior predictive capabilities.
URI:	http://ntur.lib.ntu.edu.tw//handle/246246/275567
DOI:	10.6342/NTU201603476
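Rights:	Thesis usage permission: authorization not granted
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)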
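
The abstract's central idea, scoring redundancy between two patterns by both the proportion of samples they share and the resemblance of the samples they do not share, can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details are not given in this record. The function names, the Jaccard overlap term, the centroid-distance resemblance term, and the weight `alpha` are all assumptions chosen only for illustration.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def redundancy(cover_a, cover_b, X, alpha=0.5):
    """Hypothetical redundancy score in [0, 1] for two patterns.

    cover_a, cover_b: sets of sample indices covered by each pattern.
    X: list of numeric feature vectors, indexed by sample.
    alpha: assumed weight between the overlap and resemblance terms.
    """
    union = cover_a | cover_b
    if not union:
        return 0.0  # neither pattern covers anything

    # Term 1 (assumption): proportion of overlapping samples, as Jaccard.
    overlap = len(cover_a & cover_b) / len(union)

    # Term 2 (assumption): resemblance of the non-overlapping samples,
    # mapped from the distance between their centroids into (0, 1].
    only_a = [X[i] for i in cover_a - cover_b]
    only_b = [X[i] for i in cover_b - cover_a]
    if only_a and only_b:
        resemblance = 1.0 / (1.0 + dist(centroid(only_a), centroid(only_b)))
    elif not only_a and not only_b:
        resemblance = 1.0  # identical coverage
    else:
        resemblance = 0.0  # arbitrary choice: nothing to compare on one side

    return alpha * overlap + (1 - alpha) * resemblance


# Toy data: four samples with two features each.
X = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
score = redundancy({0, 1, 2}, {1, 2, 3}, X)
```

Under these assumptions, two patterns with the same overlap proportion can still receive different scores: if their non-overlapping samples look alike, the pair is treated as more redundant than a pair whose non-overlapping samples differ widely.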
