|Cohort selection for clinical trials using multiple instance learning
|Clinical natural language processing; Cohort selection; Electronic health records; Multiple instance learning
|Journal of Biomedical Informatics
Identifying patients eligible for clinical trials using electronic health records (EHRs) is a challenging task that usually requires a comprehensive analysis of information stored across a patient's multiple EHRs. The goal of this study is to investigate different methods and their effectiveness in identifying patients who meet specific eligibility criteria based on patients’ longitudinal records. We used the unstructured dataset released by the n2c2 cohort selection for clinical trials track, in which each patient had 2–5 records manually annotated against thirteen pre-defined selection criteria. Unlike other studies, we formulated the problem as a multiple instance learning (MIL) task and compared its performance with that of rule-based and single instance-based classifiers. Our official best run achieved an average micro-F score of 0.8765, which ranked among the top ten results in the track. Further experiments demonstrated that the MIL-based classifiers consistently outperform their single-instance counterparts on criteria that require overall comprehension of information distributed across all of a patient's EHRs. Rule-based and single instance learning approaches performed better on criteria that do not require considering several factors across records. This study demonstrated that cohort selection using longitudinal patient records can be formulated as a MIL problem. Our results show that the MIL-based classifiers supplement the rule-based methods and provide better results than the single instance learning approaches. © 2020 Elsevier Inc.
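To illustrate the MIL formulation, the following minimal Python sketch treats each patient as a "bag" of records and assigns the patient-level label from instance-level scores. It assumes the standard MIL rule (a bag is positive if at least one instance is positive) with max pooling. The names `keyword_score` and `bag_meets_criterion`, the aspirin criterion, and the example records are hypothetical illustrations and are not the classifiers or data used in the study.

```python
from typing import Callable, List

# A "bag" is one patient: a list of free-text records (the instances).
Bag = List[str]


def keyword_score(record: str) -> float:
    """Hypothetical instance-level scorer (assumption, not the paper's model):
    returns 1.0 if the record mentions aspirin, else 0.0."""
    text = record.lower()
    return 1.0 if ("aspirin" in text or "acetylsalicylic acid" in text) else 0.0


def bag_meets_criterion(
    bag: Bag, score: Callable[[str], float], threshold: float = 0.5
) -> bool:
    """Standard MIL assumption: the bag is positive if at least one
    instance scores at or above the threshold (max pooling)."""
    return max((score(record) for record in bag), default=0.0) >= threshold


# Made-up example patients, each with two records.
patient_a = ["Admitted with chest pain.", "Discharged on aspirin 81 mg daily."]
patient_b = ["Routine follow-up.", "No medication changes."]

print(bag_meets_criterion(patient_a, keyword_score))  # True
print(bag_meets_criterion(patient_b, keyword_score))  # False
```

In a learned setting the keyword scorer would be replaced by a trained instance-level or bag-level model; the sketch only shows how labels attach to the patient rather than to individual records.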
|Classification (of information); Comprehensive analysis; Electronic health record (EHRs); Instance-based classifiers; Learning approach; Longitudinal records; Multiple-instance learning; Rule-based method; Selection criteria; Learning systems; acetylsalicylic acid; creatinine; hemoglobin A1c; abdominal surgery; alcohol abuse; Article; cardiovascular disease; cohort analysis; creatinine blood level; diet supplementation; electronic health record; heart infarction; human; intestine resection; ketoacidosis; learning algorithm; patient decision making; patient identification; patient selection; priority journal; small intestine obstruction; study design; support vector machine; electronic health record; machine learning; motivation; patient selection; Cohort Studies; Electronic Health Records; Humans; Machine Learning; Motivation; Patient Selection