Automated Evaluation of Reflection and Feedback Quality in Workplace-Based Assessments by Using Natural Language Processing: Cross-Sectional Competency-Based Medical Education Study.
Journal
JMIR medical education
Journal Volume
11
Start Page
Article number e81718
ISSN
2369-3762
Date Issued
2025-10-22
Author(s)
Abstract
Background: Competency-based medical education relies heavily on high-quality narrative reflections and feedback within workplace-based assessments. However, evaluating these narratives at scale remains a significant challenge.

Objective: This study aimed to develop and apply natural language processing (NLP) models to evaluate the quality of resident reflections and faculty feedback documented in entrustable professional activities (EPAs) on Taiwan's nationwide Emyway platform for otolaryngology residency training.

Methods: This 4-year cross-sectional study analyzed 300 randomly sampled EPA assessments from 2021 to 2025, covering a pilot year and 3 full implementation years. Two medical education experts independently rated the narratives on relevance, specificity, and the presence of reflective or improvement-focused language. Narratives were categorized into 4 quality levels (effective, moderate, ineffective, or irrelevant) and then dichotomized into high quality and low quality. We compared the performance of logistic regression, support vector machine, and bidirectional encoder representations from transformers (BERT) models in classifying narrative quality, and then applied the best-performing model to track quality trends over time.

Results: The BERT model, a multilingual pretrained language model, outperformed the other approaches, achieving 85% and 92% accuracy in binary classification of resident reflections and faculty feedback, respectively. Accuracy for the 4-level classification was 67% for both. Longitudinal analysis revealed significant increases in high-quality reflections (from 70.3% to 99.5%) and feedback (from 50.6% to 88.9%) over the study period.

Conclusions: BERT-based NLP demonstrated moderate-to-high accuracy in evaluating narrative quality in EPA assessments, especially in binary classification. While not a replacement for expert review, NLP models offer a valuable tool for monitoring narrative trends and enhancing formative feedback in competency-based medical education.
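The baseline comparison described in the Methods can be sketched in outline. This is an illustrative example only: the study's actual data, labels, features, and tuning are not given here, and the example texts, label scheme, and model settings below are invented for demonstration. It shows the general shape of fitting logistic regression and linear SVM baselines on TF-IDF features for binary narrative-quality classification; a fine-tuned BERT classifier, as used in the study, would replace this feature pipeline with a pretrained transformer.

```python
# Illustrative sketch, not the study's pipeline. Hypothetical toy narratives
# labeled 1 = high quality (specific, improvement-focused) and 0 = low quality
# (vague praise), classified with two classical baselines on TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "Resident clearly identified the diagnostic error and proposed a concrete plan to improve history-taking.",
    "Good job.",
    "Feedback specified which suturing steps to practice and set a follow-up observation date.",
    "Keep it up.",
    "Reflection linked the missed airway finding to a gap in endoscopy technique and a reading plan.",
    "Nice work today.",
]
labels = [1, 0, 1, 0, 1, 0]

# Each baseline is a TF-IDF vectorizer feeding a linear classifier; a real
# study would cross-validate and compare against a fine-tuned BERT model.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

fitted = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe

# Classify a new (also hypothetical) narrative with the logistic baseline.
pred = fitted["logistic_regression"].predict(
    ["The reflection named a specific gap and a concrete improvement plan."]
)
print(pred[0])
```

In practice the dichotomized expert ratings play the role of `labels`, and model accuracy is estimated on held-out assessments rather than the training set.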
Subjects
Emyway platform
competency-based medical education
entrustable professional activities
feedback
otolaryngology
reflection
residency
workplace-based assessment
Type
journal article
