Vectors of Locally Aggregated Centers for Compact Video Representation

September 13, 2015 · Declared Dead · 🏛 IEEE International Conference on Multimedia and Expo

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Alhabib Abbas, Nikos Deligiannis, Yiannis Andreopoulos
arXiv ID: 1509.03844
Category: cs.MM (Multimedia)
Cross-listed: cs.CV, cs.IR
Citations: 9
Venue: IEEE International Conference on Multimedia and Expo
Last Checked: 3 months ago

Abstract

We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jégou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation using a video dataset, comprising more than 1000 minutes of content from the Open Video Project, shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
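The two-level encoding described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the function names (`kmeans`, `aggregate_residuals`, `vlac_encode`), the plain Lloyd's k-means, the cluster counts, the final L2 normalization, and the random 16-dimensional vectors standing in for real SIFT descriptors.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (k, d) centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distances from every point to every center, shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def aggregate_residuals(vectors, centers):
    """Sum, per center, the differences (vector - nearest center), then flatten."""
    d2 = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    sums = np.zeros_like(centers, dtype=float)
    for i, j in enumerate(nearest):
        sums[j] += vectors[i] - centers[j]
    flat = sums.ravel()
    norm = np.linalg.norm(flat)
    # L2 normalization is an assumption; the abstract does not specify it.
    return flat / norm if norm > 0 else flat


def vlac_encode(descriptors, clfcs, n_lfc=8, seed=0):
    """Cluster a segment's descriptors into LFCs, then encode the LFCs against CLFCs."""
    lfcs = kmeans(descriptors, n_lfc, seed=seed)
    return aggregate_residuals(lfcs, clfcs)


# Synthetic stand-ins for per-segment SIFT descriptors (hypothetical data).
rng = np.random.default_rng(0)
train_segments = [rng.normal(size=(200, 16)) for _ in range(5)]

# Training: gather the LFCs of each training segment, then cluster them into CLFCs.
train_lfcs = np.vstack([kmeans(seg, 8) for seg in train_segments])
clfcs = kmeans(train_lfcs, 4)

# Encoding a new segment: 4 CLFCs x 16 dimensions gives a 64-dimensional code.
query = rng.normal(size=(200, 16))
code = vlac_encode(query, clfcs)
print(code.shape)  # (64,)
```

Calling `aggregate_residuals` directly on a segment's raw descriptors, with centers learned from descriptors, would give a VLAD-style code. The sketch differs only in inserting the per-segment clustering step first, which is the coarser level of representation the abstract refers to.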