Title: Grammatical Error Correction and Explanation for Learners of Chinese Using Large Language Models
Author: Gao, Zhao-Ming
Date: 2025-08-14
Type: book part
ISBN: 9789819759293, 9789819759309
DOI: 10.1007/978-981-97-5930-9_13
URL: https://scholars.lib.ntu.edu.tw/handle/123456789/731307
SDGs: SDG4

Abstract: This study presents both quantitative and qualitative assessments of automatic grammatical error identification, correction, and explanation for learners of Chinese using four large language models (LLMs): BART CGEC, GPT 4.0, Bard, and Claude 2, examined from linguistic and educational viewpoints. It was found that general-purpose chat LLMs such as GPT 4.0, Bard, and Claude 2 outperformed models specifically designed for Chinese grammatical error correction, such as BART CGEC. In particular, Claude 2 excelled in precision and recall for error correction, achieving nearly 95% accuracy with a modified prompt, while GPT 4.0 and Bard lagged behind with around 87.5% precision and 80% recall, and 68.97% precision and 60.6% recall, respectively. Although Claude 2 achieved only approximately 66% accuracy in error identification and error explanation, its high precision and recall in error correction make it a strong candidate for an intelligent Chinese grammar checker. Our study underscores the importance of prompt engineering in using LLMs effectively: modified prompts led to an 8% improvement in error correction precision for both GPT 4.0 and Claude 2, and an improvement of over 15% in recall for GPT 4.0. Prompt engineering thus plays a crucial role in optimizing the performance of AI tools, paving the way for their integration into language learning. It is anticipated that LLMs will dramatically reshape language learning in the near future.