Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro-temporal Modeling
September 16, 2017 · Declared Dead · IEEE/ACM Transactions on Audio Speech and Language Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Nasser Mohammadiha, Simon Doclo
arXiv ID
1709.05557
Category
cs.SD: Sound
Cross-listed
cs.LG
Citations
33
Venue
IEEE/ACM Transactions on Audio Speech and Language Processing
Last Checked
2 months ago
Abstract
This paper presents two single-channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a nonnegative approximation of the convolutive transfer function (NCTF), and to additionally exploit the spectral properties of the speech signal, such as the low-rank nature of the speech spectrogram, the speech spectrogram is modeled using nonnegative matrix factorization (NMF). Two methods are described to combine the NCTF and NMF models. In the first method, referred to as the integrated method, a cost function is constructed by directly integrating the speech NMF model into the NCTF model, while in the second method, referred to as the weighted method, the NCTF-based and NMF-based cost functions are weighted and summed. Efficient update rules are derived to solve both optimization problems. In addition, an extension of the integrated method is presented, which exploits the temporal dependencies of the speech signal. Several experiments are performed on reverberant speech signals with and without background noise, where the integrated method yields a considerably higher speech quality than the baseline NCTF method and a state-of-the-art spectral enhancement method. Moreover, the experimental results indicate that the weighted method can lead to even better performance in terms of instrumental quality measures, but that the optimal weighting parameter depends on the room acoustics and the utilized NMF model. Modeling the temporal dependencies in the integrated method was found to be useful only for highly reverberant conditions.
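As a rough illustration of how the two cost functions described in the abstract differ, the sketch below evaluates an "integrated" and a "weighted" objective with NumPy. Everything specific here is an assumption made for illustration: the function names, the array shapes (H as frequency by filter taps, S as frequency by frames, W and X as NMF basis and activations), the weighting parameter `lam`, and the choice of the generalized Kullback-Leibler divergence, which the abstract does not specify. It is not the authors' implementation and omits the update rules entirely.

```python
import numpy as np


def nctf_forward(H, S):
    """Hypothetical NCTF model: Y_hat[k, t] = sum_tau H[k, tau] * S[k, t - tau].

    H: (K, L) nonnegative filter taps per frequency bin.
    S: (K, T) nonnegative clean-speech spectrogram.
    """
    K, L = H.shape
    T = S.shape[1]
    Y_hat = np.zeros((K, T))
    for tau in range(min(L, T)):
        # Delay S by tau frames and scale each frequency bin by its tap.
        Y_hat[:, tau:] += H[:, [tau]] * S[:, :T - tau]
    return Y_hat


def generalized_kl(V, V_hat, eps=1e-12):
    """Generalized KL divergence between nonnegative arrays (assumed cost)."""
    V = V + eps
    V_hat = V_hat + eps
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))


def integrated_cost(Y, H, W, X):
    """Integrated variant: the NMF model W @ X replaces S inside the NCTF model."""
    return generalized_kl(Y, nctf_forward(H, W @ X))


def weighted_cost(Y, H, S, W, X, lam):
    """Weighted variant: NCTF cost plus lam times the NMF cost on S."""
    return generalized_kl(Y, nctf_forward(H, S)) + lam * generalized_kl(S, W @ X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, T, L, R = 4, 10, 3, 2  # bins, frames, filter taps, NMF rank
    W, X, H = rng.random((K, R)), rng.random((R, T)), rng.random((K, L))
    S = W @ X
    Y = nctf_forward(H, S)
    # With data generated exactly by the model, both costs are zero up to rounding.
    print(integrated_cost(Y, H, W, X))
    print(weighted_cost(Y, H, S, W, X, lam=0.5))
```

In this sketch the integrated cost has no free speech spectrogram: S is tied to W @ X. The weighted cost keeps S as its own variable and lets `lam` control how strongly it is pulled toward the NMF model, which matches the abstract's remark that the best weighting depends on the room acoustics and the NMF model used.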
Similar Papers
In the same crypt: Sound
R.I.P. 👻 Ghosted · CNN Architectures for Large-Scale Audio Classification
R.I.P. 👻 Ghosted · Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
R.I.P. 👻 Ghosted · Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
R.I.P. 👻 Ghosted · WaveGlow: A Flow-based Generative Network for Speech Synthesis
R.I.P. 👻 Ghosted · Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted · Language Models are Few-Shot Learners
R.I.P. 👻 Ghosted · PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P. 👻 Ghosted · XGBoost: A Scalable Tree Boosting System