Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus
Journal
BioData Mining
Journal Volume
14
Journal Issue
1
Date Issued
2021
Author(s)
Chung C.-W.
Hsiao T.-H.
Huang C.-J.
Chen Y.-J.
Chen H.-H.
Lin C.-H.
Chen T.-S.
Chung Y.-F.
Yang H.-I.
Chen Y.-M.
Abstract
Background: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. This study’s purpose was to construct machine learning (ML) models for the genomic prediction of RA and SLE. Methods: A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging package. Results: Compared with LR (area under the curve [AUC] = 0.8247), the RF approach (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. We found that imputed HLA-DQA1*05:01, DQB1*0201, and DRB1*0301 were associated with SLE, whereas HLA-DQA1*03:03, DQB1*0401, and DRB1*0405 were more frequently observed in patients with RA. Conclusions: We established ML methods for genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic explanation. © 2021, The Author(s).
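Illustrative note: the abstract compares models by area under the ROC curve (AUC). The following is a minimal sketch of how an AUC can be computed from predicted scores, using the pairwise-ranking definition. It is not the authors' code. The function name `roc_auc`, the toy labels and scores, and the choice of SLE as the positive class are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
# Assumed encoding: 1 = SLE, 0 = RA.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # one SLE case ranked below an RA case
model_b = [0.9, 0.8, 0.7, 0.5, 0.3, 0.1]  # every SLE case ranked above every RA case

print(f"model A AUC = {roc_auc(labels, model_a):.4f}")
print(f"model B AUC = {roc_auc(labels, model_b):.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The double loop is quadratic in the number of cases, which is fine for a sketch; a cohort-sized evaluation would normally use a rank-based routine from an established library.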
Subjects
Genome-wide association studies
Genomic prediction
Human leukocyte antigen imputation
Machine learning
Rheumatoid arthritis
Single nucleotide polymorphism
Systemic lupus erythematosus
HLA antigen
HLA antigen class 2
HLA DQA1 antigen
HLA DQB1 antigen
HLA DRB1 antigen
allele
Article
controlled study
feature selection
genetic variation
genotype
human
machine learning
major clinical study
measurement accuracy
prediction
random forest
rheumatoid arthritis
single nucleotide polymorphism
support vector machine
systemic lupus erythematosus
Taiwan
Type
journal article
