https://scholars.lib.ntu.edu.tw/handle/123456789/119398
Title: | 微生物源資料之辨識模型探勘 Discriminative Pattern Mining in Microbiomic Data |
Author: | 黃安婷 Huang, Nancy |
Keywords: | discriminative patterns; pattern mining; pattern relevancy; pattern redundancy; pattern selection; microbiomic data |
Issue Date: | 2016 |
Abstract: | Machine learning classifiers have long been used to solve biological problems by predicting the target class (e.g., disease state or bacterial taxonomy) of unseen samples. A favorable and important byproduct of a special type of classifier is “interpretability” (also known as “comprehensibility”), which can be used to explain why and how a sample is assigned to the predicted class. Interpretable classifiers produce “discriminative patterns” that lead to different prediction results, and they provide insights into critical properties of the biological problem by capturing more of the underlying semantics than single features do. Discriminative patterns can be used directly by pattern-based classifiers to predict unseen samples through a majority voting or aggregation mechanism. In this case, we are concerned not only with finding useful individual patterns, but also with the effectiveness of the pattern set as a whole. Thus, it is imperative to ensure the relevancy and non-redundancy of the discriminative patterns. Few studies have evaluated pattern redundancy by examining the samples covered by the patterns; those that do have focused mostly on the proportion of overlapping samples, which suggests that a great deal of information in the non-overlapping samples has been overlooked. In addition, traditional pattern mining approaches often require the generation of a complete set of initial patterns and a global discretization of continuous attributes, both of which are impractical for high-dimensional, complex biological datasets. We address these issues by presenting a novel pattern selection algorithm that estimates pattern redundancy not only by the proportion of overlapping samples, but also by the resemblance of non-overlapping samples.
The proposed method was applied to three real microbiomic datasets, with the aim of providing new insights into the interactions between microbial factors and their effects on the host. When compared with other robust classifiers and feature selection heuristics, our pattern selection algorithm led to diverse and compact sets of final patterns that demonstrated comparable or even superior predictive capabilities. |
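The redundancy idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, whose formulas are not given in this record. It assumes, purely for illustration, that overlap is measured as a Jaccard proportion of covered samples, that "resemblance" of non-overlapping samples is the cosine similarity between the centroids of the samples each pattern covers exclusively, and that the two are blended with a hypothetical `weight` parameter. All function names here are hypothetical.

```python
from math import sqrt


def overlap_proportion(cov_a, cov_b):
    """Jaccard proportion of samples covered by both patterns (assumed measure)."""
    union = cov_a | cov_b
    return len(cov_a & cov_b) / len(union) if union else 0.0


def centroid(samples, ids):
    """Mean feature vector of the samples with the given ids."""
    rows = [samples[i] for i in ids]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def redundancy(samples, cov_a, cov_b, weight=0.5):
    """Blend overlap with the resemblance of exclusively covered samples.

    `weight` is a hypothetical mixing parameter, not taken from the thesis.
    When either pattern has no exclusively covered samples, resemblance is
    undefined and the score falls back to the overlap proportion alone.
    """
    overlap = overlap_proportion(cov_a, cov_b)
    only_a, only_b = cov_a - cov_b, cov_b - cov_a
    if not (only_a and only_b):
        return overlap
    resemblance = max(
        0.0, cosine(centroid(samples, only_a), centroid(samples, only_b))
    )
    return weight * overlap + (1.0 - weight) * resemblance


# Toy data: sample id -> feature vector (made up for illustration).
samples = {
    0: [1.0, 0.0],
    1: [0.9, 0.1],
    2: [1.0, 0.0],
    3: [0.0, 1.0],
    4: [0.9, 0.1],
}
pattern_a = {0, 1, 2}
pattern_b = {1, 2, 4}  # exclusively covers a sample similar to pattern_a's
pattern_c = {1, 2, 3}  # exclusively covers a dissimilar sample

print(redundancy(samples, pattern_a, pattern_b))
print(redundancy(samples, pattern_a, pattern_c))
```

In this toy example, `pattern_b` and `pattern_c` overlap with `pattern_a` by the same proportion, so an overlap-only measure cannot tell them apart. Under the assumed resemblance term, `pattern_b` scores as more redundant because the samples it covers exclusively look like those covered exclusively by `pattern_a`.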
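URI: | http://ntur.lib.ntu.edu.tw//handle/246246/275567 |
DOI: | 10.6342/NTU201603476 |
Rights: | Thesis usage permission: authorization not granted |
Appears in Collections: | Department of Computer Science and Information Engineering |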