Supervector Compression Strategies to Speed up I-Vector System Development

May 03, 2018 · Declared Dead · 🏛 The Speaker and Language Recognition Workshop

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ville Vestman, Tomi Kinnunen
arXiv ID: 1805.01156
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.LG, cs.SD, stat.ML
Citations: 4
Venue: The Speaker and Language Recognition Workshop
Last Checked: 3 months ago

Abstract

The front-end factor analysis (FEFA), an extension of probabilistic principal component analysis (PPCA) tailored to be used with Gaussian mixture models (GMMs), is currently the prevalent approach for extracting compact utterance-level features (i-vectors) for automatic speaker verification (ASV) systems. Little research has been conducted comparing FEFA to the conventional PPCA applied to maximum a posteriori (MAP) adapted GMM supervectors. We study several alternative methods for compressing MAP-adapted GMM supervectors: PPCA, factor analysis (FA), and two supervised approaches, supervised PPCA (SPPCA) and the recently proposed probabilistic partial least squares (PPLS). The resulting i-vectors are used in ASV tasks with a probabilistic linear discriminant analysis (PLDA) back-end. We experiment on two datasets: the telephone condition of NIST SRE 2010 and the recent VoxCeleb corpus, collected from YouTube videos containing celebrity interviews recorded in various acoustical and technical conditions. The results suggest that, in terms of ASV accuracy, the supervector compression approaches are on a par with FEFA. The supervised approaches did not result in improved performance. In comparison to FEFA, we obtained more than a hundred-fold (100x) speedup in total variability model (TVM) training using the PPCA and FA supervector compression approaches.
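To make the PPCA supervector compression route more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It uses the standard closed-form maximum-likelihood PPCA solution and assumes the MAP-adapted supervectors are already stacked as rows of a matrix. The function names `fit_ppca` and `extract_ivectors`, the dimensions, and the random data standing in for real supervectors are all hypothetical.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit.

    X : (n, d) array, one supervector per row (assumed input layout).
    q : latent (i-vector) dimension, with q < d.
    Returns the mean, the d x q loading matrix W, and the noise variance.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Thin SVD of the centered data avoids forming the d x d covariance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / n  # nonzero eigenvalues of the sample covariance
    # Noise variance: average of the d - q discarded eigenvalues
    # (eigenvalues beyond the rank of Xc are zero).
    sigma2 = (eigvals.sum() - eigvals[:q].sum()) / (d - q)
    # Loading matrix: top-q eigenvectors scaled by sqrt(lambda_i - sigma2).
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = Vt[:q].T * scale
    return mu, W, sigma2


def extract_ivectors(X, mu, W, sigma2):
    """Posterior mean of the latent variable for each row of X."""
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)
    return np.linalg.solve(M, W.T @ (X - mu).T).T


# Synthetic stand-in for supervectors: 50 utterances, dimension 200.
rng = np.random.default_rng(0)
supervectors = rng.standard_normal((50, 200))
mu, W, sigma2 = fit_ppca(supervectors, q=10)
ivectors = extract_ivectors(supervectors, mu, W, sigma2)
print(ivectors.shape)
```

In this sketch the fit needs a single SVD rather than an iterative procedure. That is only an illustration of why compressing fixed-length supervectors can be cheap. It does not reproduce the training setup or the timings reported in the abstract.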