Title: Large Language Models Perform Diagnostic Reasoning
Authors: Wu, Cheng-Kuang; Chen, Wei-Lin; Chen, Hsin-Hsi
Type: conference paper
Date issued: 2023-05
Date accessioned: 2026-03-11
Date available: 2026-03-11
Scopus ID: 2-s2.0-105026754353
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105026754353&origin=resultslist
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/736222
Abstract: We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on a general text corpus with two DR-CoT exemplars, diagnostic accuracy improves by 15% compared to standard prompting. Moreover, the gap reaches a pronounced 18% in out-of-domain settings. Our findings suggest that expert-knowledge reasoning in large language models can be elicited through proper prompting.
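The abstract describes few-shot prompting with two diagnostic-reasoning exemplars. The sketch below illustrates how such a prompt might be assembled; the exemplar cases, field names, and `build_dr_cot_prompt` helper are illustrative assumptions, not the paper's actual DR-CoT exemplars or implementation.

```python
# Hypothetical sketch of two-shot DR-CoT-style prompt construction.
# Exemplar content below is invented for illustration, not taken from the paper.

def build_dr_cot_prompt(exemplars, patient_report):
    """Concatenate few-shot diagnostic-reasoning exemplars with a new case,
    leaving the prompt open at "Reasoning:" for the model to continue."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Patient: {ex['report']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Diagnosis: {ex['diagnosis']}\n"
        )
    parts.append(f"Patient: {patient_report}\nReasoning:")
    return "\n".join(parts)

# Two invented exemplars showing the report -> reasoning -> diagnosis pattern.
EXEMPLARS = [
    {
        "report": "55-year-old with chest pain radiating to the left arm and diaphoresis.",
        "reasoning": "Chest pain with radiation and sweating suggests cardiac "
                     "ischemia; rule out myocardial infarction first.",
        "diagnosis": "Suspected acute myocardial infarction",
    },
    {
        "report": "8-year-old with fever, sore throat, tonsillar exudate, no cough.",
        "reasoning": "Fever and exudate without cough fit the clinical picture of "
                     "streptococcal pharyngitis.",
        "diagnosis": "Streptococcal pharyngitis",
    },
]

prompt = build_dr_cot_prompt(EXEMPLARS, "30-year-old with sudden severe headache.")
print(prompt)
```

The resulting string would be sent to the language model, which continues the reasoning for the unsolved case before emitting a diagnosis.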