Authors: Chao, Rong; Nasretdinov, Rauf; Wang, Yu-Chiang; Jukic, Ante; Fu, Szu-Wei; Tsao, Yu
Date issued: 2025-08-17
Date available: 2025-11-26
ISSN: 2308-457X
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105020063136&origin=resultslist
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/734156
Abstract: The Interspeech 2025 URGENT Challenge aimed to advance universal, robust, and generalizable speech enhancement by unifying speech enhancement tasks across a wide variety of conditions, including seven different distortion types and five languages. We present Universal Speech Enhancement Mamba (USEMamba), a state-space speech enhancement model designed to handle long-range sequence modeling, time-frequency structured processing, and sampling-frequency-independent feature extraction. Our approach primarily relies on regression-based modeling, which performs well across most distortions. However, for packet loss and bandwidth extension, where missing content must be inferred, a generative variant of the proposed USEMamba proves more effective. Despite being trained on only a subset of the full training data, USEMamba achieved 2nd place in Track 1 during the blind test phase, demonstrating strong generalization across diverse conditions.
Keywords: speech restoration; state-space models; universal speech enhancement; URGENT 2025 Challenge
SDGs: SDG5; SDG10
Title: Universal Speech Enhancement with Regression and Generative Mamba
Type: conference paper
DOI: 10.21437/interspeech.2025-900
Scopus EID: 2-s2.0-105020063136
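The abstract describes USEMamba as a state-space model for long-range sequence modeling. As a rough illustration of the recurrence underlying such models (not the paper's actual architecture, which uses input-dependent, selective parameters and time-frequency processing), the sketch below runs a plain linear state-space scan x_t = A x_{t-1} + B u_t, y_t = C x_t over a 1-D input; all names and shapes here are hypothetical.

```python
def ssm_scan(u, A, B, C):
    """Minimal linear state-space scan over a scalar input sequence u.

    State update:  x_t = A @ x_{t-1} + B * u_t
    Readout:       y_t = C @ x_t

    A is an n x n list of lists; B and C are length-n lists. This is a
    toy sketch of the recurrence behind Mamba-style models, not USEMamba.
    """
    n = len(B)
    x = [0.0] * n           # hidden state, initialized to zero
    y = []
    for u_t in u:
        # state update: x = A x + B u_t
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_t
             for i in range(n)]
        # readout: y_t = C x
        y.append(sum(C[i] * x[i] for i in range(n)))
    return y
```

With a decaying diagonal A, each output mixes an exponentially weighted history of the input, which is what lets state-space layers capture long-range context in linear time.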