Improved Speech Enhancement Approaches Based on Subspace Concept and Spectral Subtraction
Date Issued
2006
Author(s)
Ju, Gwo-Hwa
Abstract
Noise suppression that improves speech quality or intelligibility is needed in a wide range of applications, including mobile communication, hearing aids, and speech recognition. In this dissertation, we propose several improved single-channel speech enhancement approaches, based on the subspace concept and on spectral subtraction, that improve sound quality and intelligibility and make speech recognition more robust when speech is corrupted by additive noise.
Spectral subtraction (SS) is the most popular of the various speech enhancement approaches because it is simple, effective, and easy to implement. In Chapter 3, we propose two improved versions of the SS approach. First, a silence-fractional histogram equalization process is applied as an additional flooring stage for SS to improve speech quality. Second, SS clearly offers significant performance improvements for slowly varying, broadband additive noise, but becomes less helpful when the noise is narrowband and/or non-stationary. We therefore propose to integrate sub-band coding (SBC) with SS: SBC splits the frequency range of the input speech signal into several overlapping frequency bands, and decimation extends each band to fill the full frequency scale. The spectrum of the additive noise in each band obtained this way can be better approximated as white if the number of bands is large enough, which makes SS more effective.
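For readers unfamiliar with the baseline, conventional power spectral subtraction with a simple spectral floor can be sketched as follows. This is a minimal single-frame illustration, not the improved variants proposed in Chapter 3: the function name, the over-subtraction factor `alpha`, and the floor factor `beta` are assumptions introduced here, and the noise power spectrum is assumed to have been estimated elsewhere (for example, from silent frames).

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction on one frame (illustrative sketch).

    noise_power: estimated noise power per rfft bin (assumed given).
    alpha: over-subtraction factor; beta: spectral floor factor.
    Both parameter names and defaults are hypothetical.
    """
    noisy_frame = np.asarray(noisy_frame, dtype=float)
    spectrum = np.fft.rfft(noisy_frame)
    noisy_power = np.abs(spectrum) ** 2
    # Subtract the noise estimate, but never go below a fraction of the
    # noisy power (the flooring step).
    clean_power = np.maximum(noisy_power - alpha * noise_power,
                             beta * noisy_power)
    # Real-valued gain per bin; the noisy phase is kept unchanged.
    gain = np.sqrt(clean_power / np.maximum(noisy_power, 1e-12))
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))
```

With a zero noise estimate the frame is returned unchanged; when the noise estimate equals the noisy power in every bin, only the floor remains and the output is the input scaled by `sqrt(beta)`.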
In Chapter 4, we introduce a subspace-based approach for speech enhancement. We propose an approach based on the generalized singular value decomposition (GSVD), an extension of the previously proposed truncated quotient SVD (QSVD)-based approach, in which well-defined procedures allow a more flexible and precise determination of the dimensions of the signal and noise subspaces for each frame of the noisy signal. In this approach, we partition the vector space of every input speech frame into a signal subspace and a noise subspace. The approach assumes that the speech is present only in the signal subspace, whereas the corrupting noise spans both subspaces. We can thus discard the noise subspace components and reconstruct the speech from the signal subspace components only. This approach is very effective whether the additive noise is white or not.
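The idea of discarding the noise subspace can be illustrated with a simpler stand-in: a plain truncated SVD of a Hankel matrix built from one frame, which is appropriate only for white noise. This is not the GSVD-based method of Chapter 4; the function names, the fixed `rank` argument, and the choice of a Hankel data matrix with anti-diagonal averaging are assumptions made for this sketch.

```python
import numpy as np

def hankel_indices(n_samples, n_rows):
    """Index grid such that frame[idx][i, j] == frame[i + j]."""
    n_cols = n_samples - n_rows + 1
    return np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]

def subspace_denoise(frame, rank, n_rows=None):
    """Truncated-SVD subspace filtering of one frame (illustrative sketch).

    rank: assumed dimension of the signal subspace (hypothetical input;
    the dissertation determines it per frame by its own procedures).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if n_rows is None:
        n_rows = n // 2
    idx = hankel_indices(n, n_rows)
    hankel = frame[idx]
    u, s, vt = np.linalg.svd(hankel, full_matrices=False)
    # Keep only the leading `rank` components (the signal subspace).
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Average along anti-diagonals to map the matrix back to a frame.
    sums = np.bincount(idx.ravel(), weights=low_rank.ravel(), minlength=n)
    counts = np.bincount(idx.ravel(), minlength=n)
    return sums / counts
```

A noiseless sinusoid yields a rank-2 Hankel matrix, so `subspace_denoise(clean, rank=2)` reproduces it up to rounding error. For colored noise, a plain SVD no longer separates the subspaces cleanly, which is the case the GSVD-based formulation is designed to handle.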
Although the GSVD-based approach has been shown to be effective, some unnatural-sounding characteristics, usually due to perceivable residual noise, still occur in the estimated speech under adverse conditions. To solve this problem, in Chapter 5 we integrate the auditory masking thresholds (AMTs) of the human auditory system into the GSVD-based approach to establish an improved framework for speech enhancement. We propose to restrict the spectral energy of every residual noise component to below the corresponding AMT, so that the noise is masked and not perceivable.
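One simple way to read the masking constraint is as a cap on the per-component gain: if a noise component with power N is scaled by a gain g, its residual power is g²N, and keeping that at or below a threshold T requires g ≤ sqrt(T/N). The sketch below assumes that the noise powers and the masking thresholds are already available as arrays; how the dissertation computes the AMTs and applies them within the GSVD framework is not described in this abstract, so the function name and inputs are hypothetical.

```python
import numpy as np

def masked_residual_gain(noise_power, masking_threshold):
    """Largest gain (at most 1) keeping residual noise under the threshold.

    Both inputs are assumed to be per-component powers supplied by the
    caller; this is an illustrative bound, not the dissertation's estimator.
    """
    noise_power = np.asarray(noise_power, dtype=float)
    masking_threshold = np.asarray(masking_threshold, dtype=float)
    ratio = masking_threshold / np.maximum(noise_power, 1e-12)
    # Components already below their threshold pass through unchanged.
    return np.minimum(1.0, np.sqrt(ratio))
```

Components whose noise is already below the threshold need no attenuation, which avoids unnecessary speech distortion in those components.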
Experiments show that the subspace-based approaches proposed in Chapters 4 and 5 perform well regardless of whether the additive noise is stationary, especially when it is non-white. However, the high computational complexity of such subspace-based approaches makes them hard to apply in real-world environments. In Chapter 6, we therefore develop a new speech enhancement framework in which the time-shift property of the discrete Fourier transform (DFT) is applied to the special structure of Hankel-form matrices (constructed from the noisy speech frames and the estimated noise signal) to replace the time-consuming GSVD algorithm used in the previous two chapters. The computational load of this new approach can be as low as that of the conventional SS algorithm, while its performance is comparable to that of the subspace-based approaches proposed earlier in this dissertation.
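The connection between Hankel structure and the DFT can be seen in an idealized case. Each column of a Hankel matrix is a shifted copy of the same signal, and for a circular shift the DFT of the shifted column is the DFT of the original multiplied by a phase ramp. The sketch below uses a square Hankel matrix with wrap-around indexing, which is an assumption made here so that the property holds exactly; how the dissertation treats the frame edges is not stated in this abstract, and the function names are hypothetical.

```python
import numpy as np

def circular_hankel(x):
    """Square Hankel matrix with wrap-around: H[i, j] = x[(i + j) mod n]."""
    x = np.asarray(x)
    n = len(x)
    idx = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    return x[idx]

def shifted_spectrum(x, m):
    """DFT of x advanced circularly by m samples, via the time-shift property."""
    n = len(x)
    k = np.arange(n)
    return np.fft.fft(x) * np.exp(2j * np.pi * k * m / n)

# Example: column m of the matrix has the phase-shifted spectrum of x.
x = np.random.default_rng(0).standard_normal(16)
H = circular_hankel(x)
column_matches = np.allclose(np.fft.fft(H[:, 3]), shifted_spectrum(x, 3))
```

In this idealized setting the singular values of `H` equal the DFT magnitudes `|X[k]|`, so a single FFT yields what would otherwise require a full decomposition. This suggests why a DFT-based formulation can bring the cost down to roughly that of SS.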
Finally, we conclude this dissertation in Chapter 7 and discuss several directions for future research.
Subjects
Speech Enhancement
Signal Subspace
Noise Subspace
Spectral Subtraction
Type
thesis
File(s)
Name
ntu-95-D89942008-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):de881b16ed8fd993150095255ae433c4