Multi-LLM-based augmentation and synthetic data generation of construction schedules and task descriptions with SLM-as-a-judge assessment

Singh, Akarsth Kumar; SHANG-HSIEN HSIEH

doi:10.1016/j.aei.2025.103825

Journal

Advanced Engineering Informatics

Journal Volume

69

Start Page

103825

ISSN

1474-0346

Date Issued

2026-01

Author(s)

Singh, Akarsth Kumar

SHANG-HSIEN HSIEH

DOI

10.1016/j.aei.2025.103825

URI

https://www.scopus.com/record/display.uri?eid=2-s2.0-105015140692&origin=resultslist

https://scholars.lib.ntu.edu.tw/handle/123456789/732679

Abstract

The fragmented structure, semantic inconsistency, and limited availability of construction schedule data significantly hinder the development of intelligent planning tools in the architecture, engineering, and construction (AEC) domain. In particular, the absence of high-quality, hierarchically structured Work Breakdown Structure with Task Dependency (WBS-TD) datasets restricts the training and evaluation of AI-based models for automated construction workflows. This study investigates whether Large Language Models (LLMs) can be systematically applied to enhance and generate construction schedule and task description data, and whether lightweight, locally deployed Small Language Models (SLMs) can effectively evaluate these outputs using domain-specific rubrics in a scalable and privacy-preserving manner. To address these questions, an integrated methodology is proposed, consisting of three components: (1) Role-Guided Modular Prompt Chaining (RGPC), which transforms inconsistent WBS-TD inputs into logically ordered and semantically enriched outputs; (2) synthetic data generation via a multi-LLM pipeline that uses structured prompt strategies to produce diverse, realistic construction schedules and descriptions; and (3) SLM-as-a-Judge, a rubric-based evaluation approach that uses lightweight, locally deployed SLMs to assess output quality across structural, logical, and domain-specific dimensions without requiring sensitive data to leave secure environments. Experimental results show that Claude-3.5-Sonnet achieved 77 % quality in augmented schedule generation, Gemini-2.0-Flash reached 92 % in synthetic schedule generation, and DeepSeek-R1 provided the best balance of quality and diversity in synthetic construction task description generation, demonstrating strong domain alignment across tasks. The framework generates reusable, machine-readable knowledge graph datasets that support downstream applications such as AI-assisted planning, progress monitoring, and risk analysis.
This study delivers a scalable, model-agnostic pipeline that advances automation and evaluation in construction informatics.
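The abstract describes assessing outputs along structural, logical, and domain-specific dimensions. The Python sketch below illustrates two generic building blocks such a pipeline might use, under stated assumptions: a hypothetical WBS-TD task record with a dependency-order check (Kahn's topological sort), and a weighted rubric that turns a judge model's JSON scores into a percentage. The names `Task`, `dependency_order`, `rubric_percentage`, and `RUBRIC_WEIGHTS`, the field names, the 1 to 5 scale, the weights, and the JSON reply format are all illustrative assumptions, not the authors' schema or scoring formula. The call to the SLM itself is deliberately omitted.

```python
import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical WBS-TD record; field names are illustrative, not the paper's schema."""
    task_id: str
    wbs_code: str                      # hierarchical WBS code, e.g. "1.2.3"
    name: str
    predecessors: list[str] = field(default_factory=list)


def dependency_order(tasks: list[Task]) -> list[str]:
    """Order task IDs so every predecessor comes first (Kahn's topological sort).

    Raises ValueError for an unknown predecessor or a dependency cycle,
    two defects a 'logical' rubric dimension could penalize.
    """
    indegree = {t.task_id: 0 for t in tasks}
    successors: dict[str, list[str]] = {t.task_id: [] for t in tasks}
    for t in tasks:
        for p in t.predecessors:
            if p not in indegree:
                raise ValueError(f"{t.task_id} depends on unknown task {p}")
            successors[p].append(t.task_id)
            indegree[t.task_id] += 1
    ready = deque(tid for tid, deg in indegree.items() if deg == 0)
    order: list[str] = []
    while ready:
        tid = ready.popleft()
        order.append(tid)
        for s in successors[tid]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(indegree):
        raise ValueError("dependency cycle detected")
    return order


# Assumed rubric: dimension names follow the abstract; the weights and the
# 1-5 scale are illustrative choices, not the authors' values.
RUBRIC_WEIGHTS = {"structural": 0.3, "logical": 0.4, "domain": 0.3}
MAX_SCORE = 5


def rubric_percentage(judge_reply: str) -> float:
    """Turn a judge model's JSON scores (assumed format) into a weighted 0-100 score."""
    scores = json.loads(judge_reply)
    total = 0.0
    for dim, weight in RUBRIC_WEIGHTS.items():
        score = scores[dim]
        if not 1 <= score <= MAX_SCORE:
            raise ValueError(f"{dim} score out of range: {score}")
        total += weight * score / MAX_SCORE
    return round(100 * total, 1)


if __name__ == "__main__":
    schedule = [
        Task("T1", "1.1", "Site clearing"),
        Task("T2", "1.2", "Excavation", ["T1"]),
        Task("T3", "2.1", "Foundation pour", ["T2"]),
    ]
    print(dependency_order(schedule))  # ['T1', 'T2', 'T3']
    # In practice this string would come from a locally deployed SLM (call not shown).
    print(rubric_percentage('{"structural": 4, "logical": 4, "domain": 3}'))  # 74.0
```

Kahn's algorithm is used here only because it detects cycles and yields a valid execution order in a single pass; it is a standard technique chosen for illustration, and the checks actually performed by RGPC or the SLM judge may differ. Keeping deterministic checks like these alongside a model-based judge is one way to make structural failures explicit rather than relying on the judge to notice them.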

Subjects

Automation in planning and scheduling

Construction data augmentation

Construction informatics

Large language models (LLMs)

SLM-as-a-judge

Small language models (SLMs)

Synthetic data generation

Publisher

Elsevier Ltd

Type

journal article
