Spoof detection using time-delay shallow neural network and feature switching

April 16, 2019 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Authors: Mari Ganesh Kumar, Suvidha Rupesh Kumar, Saranya M, B. Bharathi, Hema A. Murthy
arXiv ID: 1904.07453
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CR, cs.LG, cs.SD
Citations: 2
Venue: arXiv.org
Last Checked: 3 months ago

Abstract

Detecting spoofed utterances is a fundamental problem in voice-based biometrics. Spoofing can be performed either through logical access, such as speech synthesis or voice conversion, or through physical access, such as replaying a pre-recorded utterance. Inspired by the state-of-the-art x-vector based speaker verification approach, this paper proposes a time-delay shallow neural network (TD-SNN) for spoof detection for both logical and physical access. The novelty of the proposed TD-SNN system vis-a-vis conventional DNN systems is that it can handle variable-length utterances during testing. The performance of the proposed TD-SNN systems and the baseline Gaussian mixture models (GMMs) is analyzed on the ASVspoof 2019 dataset and is measured in terms of the minimum normalized tandem detection cost function (min-t-DCF). When studied with individual features, the TD-SNN system consistently outperforms the GMM system for physical access. For logical access, the GMM surpasses the TD-SNN systems for certain individual features. When combined with the decision-level feature switching (DLFS) paradigm, the best TD-SNN system outperforms the best baseline GMM system on evaluation data, with relative improvements of 48.03% and 49.47% for logical and physical access, respectively.
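
Two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not describe the TD-SNN architecture, so the pooling step below is an assumption borrowed from the x-vector approach it cites as inspiration: frame-level vectors are summarized by their per-dimension mean and standard deviation, which gives a fixed-size vector whatever the utterance length. The second function shows the usual definition of relative improvement for a cost where lower is better, such as min-t-DCF. The function names `stats_pool` and `relative_improvement` and all numbers are hypothetical and are not taken from the paper.

```python
import math


def stats_pool(frames):
    """Mean + standard-deviation pooling over time (x-vector style).

    `frames` is a list of equal-length feature vectors, one per frame.
    The output length is 2 * dim, independent of the number of frames.
    Assumption: the paper's TD-SNN may pool differently.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation per feature dimension.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def relative_improvement(baseline_cost, proposed_cost):
    """Relative improvement in percent for a lower-is-better cost."""
    return (baseline_cost - proposed_cost) / baseline_cost * 100.0


# Hypothetical utterances of different lengths (2 and 5 frames, dim 2).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]

# Both pooled vectors have 4 entries despite the different frame counts.
print(len(stats_pool(short_utt)), len(stats_pool(long_utt)))

# Hypothetical costs: halving the baseline cost is a 50% relative improvement.
print(relative_improvement(0.5, 0.25))
```

Under this definition, the reported 48.03% and 49.47% would mean that the best TD-SNN system roughly halves the min-t-DCF of the best GMM baseline in each condition.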