https://scholars.lib.ntu.edu.tw/handle/123456789/122558
Title: | 基於子空間觀念及頻譜消去法的進一步語音強化技術 Improved Speech Enhancement Approaches Based on Subspace Concept and Spectral Subtraction |
Authors: | 朱國華 Ju, Gwo-Hwa |
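Keywords: | Speech Enhancement; Signal Subspace; Noise Subspace; Spectral Subtraction | Issue Date: | 2006 | Abstract: | From wireless communication and hearing aids to speech recognition, noise reduction techniques are widely applied in many fields to improve the performance of speech systems. For additive noise sources of various types and characteristics, this dissertation proposes several improved speech enhancement techniques based on the subspace concept and spectral subtraction, aiming to improve the quality and intelligibility of noisy speech and the robustness of recognition systems. The best-known speech enhancement algorithm is spectral subtraction (SS), which is simple and easy to implement. In Chapter 3 we propose replacing the flooring process in SS with histogram equalization, to reduce the added signal distortion caused by over-subtraction of the spectrum. We also address the weakness that SS performance degrades when the additive noise is not white: we combine SS with sub-band coding, so that the additive noise within each sub-band approximates white noise after decimation. SS can then effectively reduce the noise in each sub-band, which improves its performance in non-white noise environments. In Chapter 4 we apply the currently popular subspace concept to speech enhancement. We use the generalized singular value decomposition (GSVD) to partition the vector space of the Hankel data matrix constructed from each noisy speech frame into disjoint signal and noise subspaces. The signal subspace contains both clean speech and noise components, whereas the noise subspace consists entirely of noise, so the clean speech signal can be estimated from the signal subspace. This GSVD subspace method extends the previously published truncated quotient SVD method. That algorithm estimates the signal subspace dimension of each noisy speech frame by rule of thumb according to the signal-to-noise ratio (SNR), whereas the GSVD method proposed here computes the dimensions of the signal and noise subspaces automatically and precisely. Although the subspace method effectively improves system performance, at low SNR many unnatural-sounding residual noise components remain in the enhanced speech. To remove them, in Chapter 5 we apply the auditory masking properties of the human ear to the GSVD subspace method (PCGSVD), so that the spectral energy of the residual noise falls below the auditory masking thresholds and is not perceived, further improving sound quality, intelligibility and recognition accuracy. Because the PCGSVD subspace method is computationally very expensive, in Chapter 6 we borrow its concept and apply the discrete Fourier transform and its time-shift property to the Hankel data matrix, proposing a frequency-domain speech enhancement framework that combines auditory masking with the subspace method. The new algorithm requires computation comparable to conventional SS, yet its performance is close to that of the PCGSVD subspace method. Finally, Chapter 7 concludes the dissertation and proposes several possible directions for future research. Noise suppression to enhance speech quality or intelligibility is necessary in a wide range of applications, including mobile communication, hearing aids and speech recognition. In this dissertation, we propose several improved single-channel speech enhancement approaches based on the subspace concept and spectral subtraction, to improve sound quality and intelligibility and to increase speech recognition robustness for speech corrupted by additive noise. Spectral subtraction (SS) is the most popular of the various speech enhancement approaches, being simple, effective and easy to implement. In Chapter 3, we propose two improved versions of the SS approach for speech enhancement.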
First, a silence-fractional histogram equalization process is performed as an additional flooring stage for SS to improve speech quality. Furthermore, the SS algorithm offers significant performance improvements for slowly varying, broad-band additive noise, but it becomes less helpful when the noise is narrow-band and/or non-stationary. We therefore propose to integrate sub-band coding (SBC) with SS in Chapter 3: SBC splits the spectrum of the input speech signal into several overlapping frequency bands, and decimation extends each band to fill the full frequency scale. The spectrum of the additive noise in each band obtained in this way is better approximated as white if the number of bands is large enough, so SS becomes more effective. In Chapter 4, we introduce a subspace-based approach for speech enhancement. We propose a generalized singular value decomposition (GSVD)-based approach, an extension of the previously proposed truncated quotient SVD (QSVD)-based approach, in which well-defined procedures allow a more flexible and precise determination of the dimensions of the signal and noise subspaces for each frame of the noisy signal. In this approach, we partition the vector space of every input speech frame into signal and noise subspaces. The speech is assumed to be present only in the signal subspace, whereas the corrupting noise spans both the signal and noise subspaces. We can thus discard the noise subspace components and reconstruct the speech from the signal subspace components alone. This approach is effective whether or not the additive noise is white. Although the GSVD-based approach has been shown to be effective, some unnatural-sounding artifacts, usually due to perceivable residual noise, still occur in the estimated speech under adverse conditions.
To solve this problem, in Chapter 5 we integrate the auditory masking thresholds (AMTs) of human hearing into the GSVD-based approach to establish an improved framework for speech enhancement. We propose to restrict the spectral energy of every residual noise component to below the corresponding AMT, so that the noise is masked and not perceivable. Experiments show that the subspace-based approaches proposed in Chapters 4 and 5 perform well regardless of whether the additive noise is stationary, and especially when it is non-white. However, the high computational complexity of such subspace-based approaches makes them hard to apply in real-world environments. In Chapter 6, we therefore develop a new speech enhancement framework in which the time-shift property of the discrete Fourier transform (DFT) is applied to the special structure of Hankel-form matrices (constructed from the noisy speech frames and the estimated noise signal) to replace the time-consuming GSVD algorithm used in the previous two chapters. The computational load of this new approach can be as low as that of the conventional SS algorithm, while its performance is comparable to that of the subspace-based approaches proposed earlier in this dissertation. Finally, Chapter 7 concludes the dissertation and addresses several issues for future research. |
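The abstract describes spectral subtraction and its flooring stage only in prose. As a rough illustration, conventional power spectral subtraction on a single frame might be sketched as below. This is not the dissertation's method: the histogram equalization and sub-band variants are not reproduced, and the names `spectral_subtract`, `alpha` and `beta`, along with their default values, are assumptions made for this example. A naive DFT keeps the sketch dependency-free; a practical implementation would use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction on one frame (illustrative only).

    noise_power[k] is an estimate of the noise power in DFT bin k.
    alpha (over-subtraction factor) and beta (spectral floor) are
    assumed values, not parameters taken from the dissertation.
    """
    enhanced = []
    for y, n_pow in zip(dft(noisy_frame), noise_power):
        y_pow = abs(y) ** 2
        # Subtract the noise estimate, but never drop below a fraction
        # of the noisy power (the flooring stage).
        s_pow = max(y_pow - alpha * n_pow, beta * y_pow)
        # Scale the magnitude only; the noisy phase is reused unchanged.
        gain = math.sqrt(s_pow / y_pow) if y_pow > 0 else 0.0
        enhanced.append(gain * y)
    return idft(enhanced)
```

With a zero noise estimate the frame passes through unchanged, while a very large estimate drives every bin to the floor, scaling the frame by `sqrt(beta)`. The floor is what keeps over-subtraction from producing negative power values.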
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/58922 | Other Identifiers: | en-US | SDG/Keyword: | [SDGs]SDG3 |
Appears in Collections: | Graduate Institute of Communication Engineering (電信工程學研究所) |
File | Description | Size | Format | |
---|---|---|---|---|
ntu-95-D89942008-1.pdf | | 23.31 kB | Adobe PDF | View/Open |