Accidental Turntables: Learning 3D Pose by Watching Objects Turn

December 13, 2022 · Declared Dead · 🏛 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zezhou Cheng, Matheus Gadelha, Subhransu Maji
arXiv ID: 2212.06300
Category: cs.CV (Computer Vision)
Citations: 2
Venue: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Last Checked: 4 months ago

Abstract

We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data: in-the-wild videos where objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves performance competitive with the existing state of the art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute an Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
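The distinction the abstract draws between relative pose within a video and a canonical pose across videos can be illustrated with a small sketch. This is not the authors' code. It assumes each frame's pose is a 3x3 world-to-camera rotation matrix, such as a structure-from-motion tool might output, and the helper names `rot`, `relative_rotation`, and `angle_deg` are hypothetical. It shows that the relative rotation between two frames does not depend on the arbitrary world frame a reconstruction happens to use, which is why a separate alignment step is needed before poses from different videos become comparable.

```python
import numpy as np

def rot(axis, deg):
    """Rotation matrix about `axis` by `deg` degrees (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    t = np.deg2rad(deg)
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)

def relative_rotation(R_i, R_j):
    """Rotation taking camera frame i to camera frame j.

    Assumes R_i, R_j are world-to-camera rotations (an assumption of this
    sketch, not something stated in the abstract).
    """
    return R_j @ R_i.T

def angle_deg(R):
    """Rotation angle of R in degrees, from its trace."""
    cos = (np.trace(R) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Two frames of a hypothetical video of an object turning about a vertical axis.
R_i = rot([0, 1, 0], 30)
R_j = rot([0, 1, 0], 75)

# A hypothetical change of world frame: every pose becomes R @ A.
A = rot([1, 2, 3], 110)

rel_before = relative_rotation(R_i, R_j)
rel_after = relative_rotation(R_i @ A, R_j @ A)

print("relative angle (deg):", angle_deg(rel_before))
print("unchanged by world-frame choice:", np.allclose(rel_before, rel_after))
```

The absolute rotations `R_i @ A` and `R_i` differ, so absolute pose is only defined up to a per-video rotation; under the assumptions above, that per-video rotation is what a canonical-pose stage would have to resolve.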