Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

December 24, 2024 · Entered Twilight

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .dockerignore, .gitignore, CODE_OF_CONDUCT.md, CONTRIBUTING.md, Dockerfile, INSTALLATION.md, LICENSE, PanoIR, README.md, SoundSpaces2.md, configs, examples, res, scripts, setup.py, soundspaces, ss_baselines

Authors: Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
arXiv ID: 2412.18386
Category: cs.CV (Computer Vision)
Citations: 0
Repository: https://github.com/facebookresearch/sound-spaces (⭐ 447)
Last checked: 2 months ago
Abstract
We introduce SWITCH-A-VIEW, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled -- but human-edited -- video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between the visual and spoken content in a how-to video on the one hand and its view-switch moments on the other hand. Armed with this predictor, our model can be applied to new multi-view video settings for orchestrating which viewpoint should be displayed when, even when such settings come with limited labels. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D, and rigorously validate its advantages. Project: https://vision.cs.utexas.edu/projects/switch_a_view/.
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt: Computer Vision

🌅 Old Age

Fast R-CNN

Ross Girshick

cs.CV · 🏛 ICCV · 📚 27.7K cites · 11 years ago