DYNAMIC-SUPERB PHASE-2: A COLLABORATIVELY EXPANDING BENCHMARK FOR MEASURING THE CAPABILITIES OF SPOKEN LANGUAGE MODELS WITH 180 TASKS

Huang, Chien-Yu; Lee, Hung-Yi; Chen, Yu-Hua; et al.

Journal

13th International Conference on Learning Representations, ICLR 2025

Part Of

13th International Conference on Learning Representations, ICLR 2025

ISBN

9798331320850

Date Issued

2025-01-01

Author(s)

Huang, Chien-Yu

Lee, Hung-Yi

Chen, Yu-Hua

et al.

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/732836

https://www.scopus.com/record/display.uri?eid=2-s2.0-105010275248&origin=resultslist

Abstract

Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interaction by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and enabling more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building on the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks and making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results show that no model performed well across all tasks. SALMONN-13B excelled in English ASR, and Qwen2-Audio-7B-Instruct showed high accuracy in emotion recognition, but current models still require further innovation to handle a broader range of tasks. We open-source all task data and the evaluation pipeline at https://github.com/dynamic-superb/dynamic-superb.

Event(s)

13th International Conference on Learning Representations, ICLR 2025

Type

conference paper
