Title: Multi-granular crew activity recognition for construction monitoring
Type: journal article
Authors: Tsai, Cheng Yun; Khosiin, Mik Wanul; Lin, Jacob Je-Chian; Chen, Chuin-Shan
Date accessioned: 2025-08-28
Date available: 2025-08-28
Date issued: 2025-11
ISSN: 0926-5805
DOI: 10.1016/j.autcon.2025.106428
Scopus EID: 2-s2.0-105012592943
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105012592943&origin=resultslist
Repository record: https://scholars.lib.ntu.edu.tw/handle/123456789/731703
Keywords: Construction monitoring; Deep learning; Image understanding; Multi-level activity recognition
SDGs: SDG9; SDG17

Abstract:
The labor force is vital to construction projects, but traditional manual methods for productivity analysis are time-consuming and error-prone. Recent advances in computer vision and deep learning offer automated solutions, yet most studies focus on low-level pose recognition and neglect the collaborative dynamics of construction sites. This paper introduces a multi-granular crew activity recognition framework that identifies individual actions, groups collaborating workers, and links each group to a specific task. Using graph-based representations and self-attention mechanisms, the model integrates spatial and contextual information for accurate recognition. Experiments on a dataset covering rebar, formwork, and concrete operations yield an overall F1 score of 70.31%. The results highlight the importance of balancing visual features against spatial proximity for optimal performance. The framework offers an efficient solution for construction site monitoring and lays the groundwork for future research on temporal modeling and human-object interaction analysis.
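The abstract says the model uses graph-based representations and self-attention to combine spatial and contextual cues. It also reports that balancing visual features against spatial proximity matters. The record does not give the model's equations, so what follows is only a minimal sketch under stated assumptions, not the authors' method.

The sketch assumes the following:
- Each worker is a graph node with a visual feature vector and a 2-D image position.
- The attention score between two workers is their scaled feature dot product minus `alpha` times their Euclidean distance.
- Candidate crews are the connected components of pairs whose attention in both directions exceeds a `threshold`.

The names `proximity_biased_attention`, `group_workers`, `alpha`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Numerically stable softmax over one row of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def proximity_biased_attention(
    features: List[Vector],
    positions: List[Tuple[float, float]],
    alpha: float = 1.0,
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical single-head self-attention over worker nodes.

    score(i, j) = <f_i, f_j> / sqrt(d) - alpha * ||p_i - p_j||
    The alpha weighting of visual similarity vs. spatial proximity is an
    illustrative assumption, not the paper's formulation.
    """
    n, d = len(features), len(features[0])
    weights: List[Vector] = []
    for i in range(n):
        scores = []
        for j in range(n):
            visual = sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
            spatial = math.dist(positions[i], positions[j])
            scores.append(visual - alpha * spatial)
        weights.append(softmax(scores))
    # Each node's updated feature is the attention-weighted sum of all node features.
    updated = [
        [sum(weights[i][j] * features[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return weights, updated


def group_workers(weights: List[Vector], threshold: float) -> List[List[int]]:
    # Hypothetical grouping rule: link i and j when attention in both directions
    # exceeds the threshold, then return connected components as candidate crews.
    n = len(weights)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] > threshold and weights[j][i] > threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    feats = [[1.0, 0.0]] * 4
    pos = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    w, _ = proximity_biased_attention(feats, pos, alpha=1.0)
    print(group_workers(w, 0.1))
```

In the usage example, all four workers have identical feature vectors and form two spatially separated pairs. The distance penalty therefore dominates, and `group_workers(w, 0.1)` returns `[[0, 1], [2, 3]]`. Raising `alpha` favors spatial proximity, while lowering it lets visual similarity dominate. This is one simple way to express the trade-off the abstract describes.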