Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice

Lin, Yi Heng; Tseng, Wen Hsuan; Chen, Li Chin; Tan, Ching Ting; Tsao, Yu

doi:10.1109/ICCE59016.2024.10444177

Journal

Digest of Technical Papers - IEEE International Conference on Consumer Electronics

ISBN

9798350324136

Date Issued

2024-01-01

Author(s)

Lin, Yi Heng

Tseng, Wen Hsuan

Chen, Li Chin

Tan, Ching Ting

Tsao, Yu

DOI

10.1109/ICCE59016.2024.10444177

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/641272

URL

https://api.elsevier.com/content/abstract/scopus_id/85187012334

Abstract

The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is significant for streamlining communication among clinical professionals and benchmarking for the determination of further treatment. Currently, because the assessment relies on experienced clinicians, it tends to be inconsistent and thus difficult to standardize. To address this problem, we propose to leverage lightly weighted automatic audio parameter extraction to increase the clinical relevance, reduce the complexity, and enhance the interpretability of voice quality assessment. The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing. A classical machine learning approach is employed. The results reveal that our approach performs similarly to state-of-the-art (SOTA) methods and outperforms the latent representations obtained by using popular audio pre-trained models. This approach provides insights into the feasibility of different feature extraction approaches for voice evaluation. Audio parameters such as jitter and the HNR are proven to be suitable for characterizing voice quality attributes, such as roughness and strain. Conversely, pre-trained models exhibit limitations in effectively addressing noise-related scorings. This study contributes toward more comprehensive and precise voice quality evaluations, achieved by comprehensively exploring diverse assessment methodologies.
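The abstract does not describe how the audio parameters are computed, so the following is only a minimal sketch of the common textbook definitions of three of them (absolute jitter, relative jitter, shimmer, and a zero-crossing rate), not the authors' pipeline. It assumes that consecutive pitch periods and per-cycle peak amplitudes have already been extracted by some pitch tracker; the function names are hypothetical. HNR is omitted because it needs a periodicity estimate (for example, autocorrelation based) that goes beyond a short example.

```python
import numpy as np


def jitter(periods):
    """Return (absolute jitter, relative jitter) from consecutive pitch periods.

    Assumption: `periods` holds cycle lengths in seconds, already extracted.
    Absolute jitter is the mean absolute difference between neighbouring
    periods; relative jitter divides that by the mean period.
    """
    periods = np.asarray(periods, dtype=float)
    absolute = np.mean(np.abs(np.diff(periods)))
    return absolute, absolute / np.mean(periods)


def shimmer(amplitudes):
    """Return relative shimmer from consecutive per-cycle peak amplitudes.

    Assumption: `amplitudes` holds one positive peak amplitude per cycle.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


def zero_crossing_rate(signal):
    """Return the fraction of adjacent sample pairs whose sign differs."""
    negative = np.signbit(np.asarray(signal, dtype=float))
    return np.mean(negative[1:] != negative[:-1])


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    abs_jit, rel_jit = jitter([0.010, 0.011, 0.010, 0.011])
    print(abs_jit, rel_jit)
    print(shimmer([1.0, 0.9, 1.0, 0.9]))
    print(zero_crossing_rate([1.0, 1.0, -1.0, -1.0]))
```

Values like these, together with age and sex, could form the small feature vector that the abstract says is passed to a classical machine learning model; which model and which exact parameter variants the authors used is not stated here.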

Subjects

audio feature extraction | Consensus auditory-perceptual evaluation of voice | pre-trained model | voice evaluation | voice quality

Type

conference paper
