Title: Chiller expert system development by using commercial and open-source large language models
Authors: Chen, Yen-Tang; Lee, Da-sheng; Kuo, Chia-Chi; Chao, Shih-Lung; PING-HEI CHEN; Amani, Mohammad
Dates: 2026-03-24; 2026-03
Type: journal article
DOI: 10.1016/j.rineng.2026.109461
Scopus EID: 2-s2.0-105029778330
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105029778330&origin=resultslist
Repository handle: https://scholars.lib.ntu.edu.tw/handle/123456789/736621
Keywords: Hallucination mitigation; HVAC expert systems; Large language models (LLMs); Re2G framework; Retrieval-augmented generation (RAG); User evaluation methods

Abstract:
Large language models (LLMs) have shown promise in engineering knowledge transfer, but their use in industrial expert systems is limited by hallucinations and unreliable factual grounding. This study investigates the application of a retrieval-refined framework (Re2G) to improve the reliability of LLM responses in industrial chiller maintenance. Four system configurations were tested: RAG and Re2G pipelines, each with Gemma 2 9B and GPT-4o. Evaluation employed benchmark datasets, automatic metrics, LLM-as-judge scoring, and human ratings from 83 participants (74.7 % with '2 years HVAC experience). Results showed that Re2G improved relevancy scores by 23.5 % and reduced hallucination frequency by 31.2 % compared to baseline RAG. Cronbach's α = 0.87 confirmed high reliability of human evaluations. Subgroup analysis revealed that students rated helpfulness higher, while professionals emphasized accuracy. A trade-off was observed, as Re2G increased response latency by approximately 28 %. These findings demonstrate that retrieval refinement enhances factual reliability in industrial expert systems, while highlighting efficiency limitations. The study provides a foundation for integrating LLMs into technical training and industrial decision-support, with future extensions to operational data analysis and automatic evaluation frameworks. The study demonstrates that integrating Re2G with prompt constraints enhances factual grounding and reduces hallucinations in an industrial chiller expert system. While the case study centers on HVAC/chiller maintenance, the framework provides a foundation that could be adapted to other technical domains where knowledge transfer and workforce training are critical.