Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Chen, Xuanjun; Wu, Haibin; Meng, Helen; HUNG-YI LEE; JYH-SHING JANG

doi:10.1109/SLT54892.2023.10022646

Journal

2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

ISBN

9798350396904

Date Issued

2023-01-01

Author(s)

Chen, Xuanjun

Wu, Haibin

Meng, Helen

HUNG-YI LEE

JYH-SHING JANG

DOI

10.1109/SLT54892.2023.10022646

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/634354

URL

https://api.elsevier.com/content/abstract/scopus_id/85147797264

Abstract

Audio-visual active speaker detection (AVASD) is well developed and is now an indispensable front-end for several multi-modal applications. However, to the best of our knowledge, the adversarial robustness of AVASD models has not been investigated, let alone effective defenses against such attacks. In this paper, we are the first to reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks through extensive experiments. We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples under an allocated attack budget. The loss pushes the inter-class embeddings, namely the non-speech and speech clusters, apart so that they are sufficiently disentangled, and pulls the intra-class embeddings as close together as possible to keep each cluster compact. Experimental results show that AVIL outperforms adversarial training by 33.14 mAP (%) under multi-modal attacks.
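The push-pull idea described in the abstract can be illustrated with a small sketch. The abstract does not give the AVIL formula, so everything below is an assumption made for illustration only: the function names `centroid` and `push_pull_loss`, the centroid-based pull term, the hinge-style push term, and the `margin` parameter are hypothetical and are not taken from the paper.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def push_pull_loss(speech, non_speech, margin=1.0):
    """Illustrative push-pull objective (an assumption, not the paper's AVIL).

    pull: mean squared distance of each embedding to its own class centroid,
          which is small when each cluster is compact.
    push: hinge penalty that is positive while the speech and non-speech
          centroids are closer together than `margin`.
    """
    c_speech = centroid(speech)
    c_non = centroid(non_speech)
    pull = (
        sum(dist(x, c_speech) ** 2 for x in speech)
        + sum(dist(x, c_non) ** 2 for x in non_speech)
    ) / (len(speech) + len(non_speech))
    push = max(0.0, margin - dist(c_speech, c_non))
    return pull + push


# Toy 2-D embeddings: compact, well-separated clusters versus overlapping ones.
tight = push_pull_loss([[0.0, 0.0], [0.0, 0.2]], [[3.0, 0.0], [3.0, 0.2]])
mixed = push_pull_loss([[0.0, 0.0], [1.0, 0.0]], [[0.5, 0.0], [1.5, 0.0]])
print(tight < mixed)
```

In this toy setting the compact, well-separated clusters receive the lower loss. Intuitively, wider separation between the two classes means a perturbation of fixed size is less likely to move an embedding across the decision boundary.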

Subjects

adversarial robustness | audio-visual active speaker detection | multi-modal adversarial attack

SDGs

SDG16

Type

conference paper
