ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality
Journal
Interspeech 2025
Series/Report No.
Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech
Start Page
4008
End Page
4012
ISSN
2308-457X
Date Issued
2025-08-17
Author(s)
Luo, Yu-Xiang
Lin, Yi-Cheng
Chuang, Ming-To
Chen, Jia-Hung
Tsai, I-Ning
Kiew, Pei Xing
Huang, Yueh-Hsuan
Liu, Chien-Feng
Chen, Yu-Chen
Feng, Bo-Han
Ren, Wenze
Abstract
Despite extensive research on toxic speech detection in text, a critical gap remains in handling spoken Mandarin audio. The lack of annotated datasets that capture the unique prosodic cues and culturally specific expressions in Mandarin leaves spoken toxicity underexplored. To address this, we introduce ToxicTone, the largest public dataset of its kind, featuring detailed annotations that distinguish both forms of toxicity (e.g., profanity, bullying) and sources of toxicity (e.g., anger, sarcasm, dismissiveness). Our data, sourced from diverse real-world audio and organized into 13 topical categories, mirrors authentic communication scenarios. We also propose a multimodal detection framework that integrates acoustic, linguistic, and emotional features using state-of-the-art speech and emotion encoders. Extensive experiments show our approach outperforms text-only and baseline models, underscoring the essential role of speech-specific cues in revealing hidden toxic expressions.
Event(s)
26th Interspeech Conference 2025
Subjects
Annotation
Ensemble
Mandarin Chinese
Toxicity detection
Publisher
ISCA
Type
conference paper
