HARDER: 3D human avatar reconstruction with distillation and explicit representation

Authors: Yu, Chun-Hau; Chen, Yu-Hsiang; Yu, Cheng-Yen; Fu, Li-Chen
Dates: 2026-01-27; 2026-02
ISSN: 0097-8493
Type: journal article
DOI: 10.1016/j.cag.2025.104512
Scopus ID: 2-s2.0-105025728132
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105025728132&origin=resultslist
Repository record: https://scholars.lib.ntu.edu.tw/handle/123456789/735621

Abstract

3D human avatar reconstruction has become a popular research field in recent years. Although many studies have shown remarkable results, most existing methods either impose overly strict data requirements, such as depth information or multi-view images, or suffer from significant performance drops in specific areas. To address these challenges, we propose HARDER. We combine the Score Distillation Sampling (SDS) technique with two modules of our own design, Feature-Specific Image Captioning (FSIC) and Region-Aware Differentiable Rendering (RADR), allowing the Latent Diffusion Model (LDM) to guide the reconstruction process, especially in unseen regions. Furthermore, we have developed several training strategies, including personalized LDM, delayed SDS, focused SDS, and multi-pose SDS, to make the training process more efficient. Our avatars use an explicit representation that is compatible with modern computer graphics pipelines. In addition, the entire reconstruction and real-time animation process can be completed on a single consumer-grade GPU, making the application more accessible.

Keywords: 3D human reconstruction; Avatar; Latent diffusion models; Score distillation sampling