Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
Journal
2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Start Page
525
End Page
530
ISBN (of the container)
979-8-3315-7206-8
Date Issued
2025-11-28
Author(s)
Abstract
In recent years, the impact of self-supervised speech Transformers has extended to speaker-related applications. However, little research has explored how these models encode speaker information. In this work, we address this gap by identifying neurons in the feed-forward layers that are correlated with speaker information. Specifically, we analyze neurons associated with k-means clusters of self-supervised features and i-vectors. Our analysis reveals that these clusters correspond to broad phonetic and gender classes, making them suitable for identifying neurons that represent speakers. By protecting these neurons during pruning, we can largely preserve performance on speaker-related tasks, demonstrating their crucial role in encoding speaker information.
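The abstract outlines a three-step pipeline: cluster self-supervised features with k-means, score feed-forward neurons by their correlation with the clusters, and protect the highest-scoring neurons during pruning. The following is a minimal Python sketch of that idea on synthetic data; the array shapes, the point-biserial correlation against one-hot cluster indicators, and the protection/pruning ratios are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical shapes: T frames, D-dim SSL features, H feed-forward neurons.
T, D, H, K = 2000, 64, 256, 8
features = rng.normal(size=(T, D))        # self-supervised frame features
activations = rng.normal(size=(T, H))     # feed-forward neuron activations

# Step 1: k-means clusters of the self-supervised features.
clusters = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(features)

# Step 2: score each neuron by its strongest correlation with any cluster,
# using point-biserial correlation against one-hot cluster indicators
# (an assumed metric; the paper's exact criterion may differ).
acts = activations - activations.mean(axis=0)
act_norms = np.linalg.norm(acts, axis=0) + 1e-8
scores = np.zeros(H)
for k in range(K):
    indicator = (clusters == k).astype(float)
    indicator -= indicator.mean()
    corr = (indicator @ acts) / (np.linalg.norm(indicator) * act_norms)
    scores = np.maximum(scores, np.abs(corr))

# Step 3: protect the top-scoring neurons, then magnitude-prune the rest.
protect_ratio, prune_ratio = 0.1, 0.5     # assumed ratios, for illustration
protected = set(np.argsort(scores)[-int(protect_ratio * H):].tolist())
magnitudes = np.abs(rng.normal(size=H))   # toy per-neuron weight magnitudes
mask = np.ones(H, dtype=bool)
n_pruned, target = 0, int(prune_ratio * H)
for idx in np.argsort(magnitudes):        # prune smallest magnitudes first
    if idx in protected:
        continue                          # never prune cluster-correlated neurons
    mask[idx] = False
    n_pruned += 1
    if n_pruned >= target:
        break

print(f"pruned {n_pruned}/{H} neurons; {len(protected)} protected")
```

In this sketch the protected set is excluded from the pruning order entirely, so speaker-correlated neurons survive regardless of their weight magnitude, mirroring the protection strategy the abstract describes at a high level.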
Event(s)
17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Publisher
IEEE
Type
conference paper
