Title: Spatiotemporal feature disentanglement for quality surveillance of left ventricular echocardiographic video using ST-R(2 + 1)D-ConvNeXt
Authors: Hsu, Chin-Chieh; Wang, You-Wei; Lin, Lung-Chun; Chang, Ruey-Feng
Type: journal article
ISSN: 1746-8094
Dates: 2025-03-04; 2025-07-01
Scopus: https://www.scopus.com/record/display.uri?eid=2-s2.0-85217650524&origin=recordpage (EID 2-s2.0-85217650524)
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/725434
Keywords: Deep learning; Left ventricular echocardiography; Quality surveillance; Spatiotemporal feature disentanglement

Abstract: The left ventricle (LV), as the primary chamber responsible for systemic circulation, plays a crucial role in cardiac function assessment. Echocardiography, which focuses in particular on the LV, is vital for diagnosing cardiac disease, but its diagnostic accuracy depends heavily on image quality, which therefore requires systematic assessment. In this study, we propose a two-stage deep learning approach for echocardiographic quality surveillance using a dataset of 514 annotated videos. The first stage employs EchoNet to extract LV volumes of interest. The second stage introduces ST-R(2 + 1)D-ConvNeXt, a novel ConvNeXt-based model designed to disentangle spatiotemporal features and leverage echocardiographic hallmarks within apical-four-chamber (A4C) dynamic echocardiogram data. The proposed approach achieves an accuracy of 82.63%, an area under the curve (AUC) of 0.89, a sensitivity of 84.10%, and a specificity of 81.08% in classifying echocardiographic videos as high or low quality. Furthermore, through explainable AI techniques, the model identifies specific quality issues such as missing cardiac walls and distorted or poorly positioned chambers, providing interpretable feedback for clinical applications.
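Note: The record does not include the implementation of ST-R(2 + 1)D-ConvNeXt. The sketch below only illustrates the R(2 + 1)D-style factorization that the model name refers to, in which a 3D spatiotemporal convolution is split into a 2D spatial convolution followed by a 1D temporal convolution over video frames; the class name, channel widths, and clip dimensions are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Illustrative R(2+1)D-style block: a (1 x k x k) spatial conv followed by
    a (t x 1 x 1) temporal conv. Hypothetical sketch, not the paper's model."""
    def __init__(self, in_ch, out_ch, k=3, t=3):
        super().__init__()
        # Intermediate width chosen so the parameter count roughly matches a full 3D conv
        mid_ch = (t * k * k * in_ch * out_ch) // (k * k * in_ch + t * out_ch)
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2), bias=False)
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(t, 1, 1),
                                  padding=(t // 2, 0, 0), bias=False)
        self.norm = nn.BatchNorm3d(out_ch)
        self.act = nn.GELU()

    def forward(self, x):            # x: (batch, channels, frames, height, width)
        x = self.spatial(x)          # per-frame spatial features
        x = self.temporal(x)         # temporal mixing across frames
        return self.act(self.norm(x))

# Example: a clip of 16 frames at 112 x 112 with 3 channels (sizes are assumptions)
clip = torch.randn(1, 3, 16, 112, 112)
out = R2Plus1DBlock(3, 64)(clip)
print(out.shape)  # torch.Size([1, 64, 16, 112, 112])
```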