Full-info Training for Deep Speaker Feature Learning

October 31, 2017 ยท Declared Dead ยท ๐Ÿ› IEEE International Conference on Acoustics, Speech, and Signal Processing

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng arXiv ID 1711.00366 Category cs.SD: Sound Cross-listed cs.LG, eess.AS Citations 13 Venue IEEE International Conference on Acoustics, Speech, and Signal Processing Last Checked 3 months ago
Abstract
In recent studies, it has shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model. By enforcing the model to discriminate the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. In spite of its good performance, a potential problem of the present model is that it involves a parametric classifier, i.e., the last affine layer, which may consume some discriminative knowledge, thus leading to `information leak' for the feature learning. This paper presents a full-info training approach that discards the parametric classifier and enforces all the discriminative knowledge learned by the feature net. Our experiments on the Fisher database demonstrate that this new training scheme can produce more coherent features, leading to consistent and notable performance improvement on the speaker verification task.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Sound

Died the same way โ€” ๐Ÿ‘ป Ghosted