Learning to Separate Object Sounds by Watching Unlabeled Video
April 05, 2018 · Entered Twilight · European Conference on Computer Vision
"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, data, models, options, test.py, train.py, util
Authors
Ruohan Gao, Rogerio Feris, Kristen Grauman
arXiv ID
1804.01665
Category
cs.CV: Computer Vision
Cross-listed
cs.MM, cs.SD, eess.AS
Citations
295
Venue
European Conference on Computer Vision
Repository
https://github.com/rhgao/separating-object-sounds
⭐ 50
Last Checked
1 month ago
Abstract
Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to learn audio-visual object models from unlabeled video, then exploit the visual context to perform audio source separation in novel videos. Our approach relies on a deep multi-instance multi-label learning framework to disentangle the audio frequency bases that map to individual visual objects, even without observing/hearing those objects in isolation. We show how the recovered disentangled bases can be used to guide audio source separation to obtain better-separated, object-level sounds. Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video. We obtain state-of-the-art results on visually-aided audio source separation and audio denoising. Our video results: http://vision.cs.utexas.edu/projects/separating_object_sounds/
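The abstract says the recovered per-object frequency bases are used to guide separation, but it does not spell out the mechanics. The sketch below is one plausible reading, not the authors' implementation: it assumes the mixture magnitude spectrogram has already been factored into non-negative bases and activations (for example by NMF), and that each basis already carries an object label. The function name `separate_by_bases`, the `basis_labels` argument, and the soft-mask reconstruction are all illustrative assumptions.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def separate_by_bases(
    mixture: Matrix,          # magnitude spectrogram, shape F x T
    bases: Matrix,            # non-negative frequency bases, shape F x K
    activations: Matrix,      # non-negative activations, shape K x T
    basis_labels: List[str],  # hypothetical: one object label per basis
    eps: float = 1e-12,
) -> Dict[str, Matrix]:
    """Split `mixture` into per-object spectrograms using soft masks."""
    n_freq, n_time = len(mixture), len(mixture[0])

    # Reconstruct each object's contribution from its own bases only.
    partial: Dict[str, Matrix] = {}
    for label in sorted(set(basis_labels)):
        idx = [k for k, name in enumerate(basis_labels) if name == label]
        w = [[row[k] for k in idx] for row in bases]
        h = [activations[k] for k in idx]
        partial[label] = matmul(w, h)

    # Sum of all contributions; eps avoids division by zero in silent bins.
    total = [
        [sum(p[f][t] for p in partial.values()) + eps for t in range(n_time)]
        for f in range(n_freq)
    ]

    # Soft mask: each object's share of every time-frequency bin.
    return {
        label: [
            [mixture[f][t] * p[f][t] / total[f][t] for t in range(n_time)]
            for f in range(n_freq)
        ]
        for label, p in partial.items()
    }


# Toy usage: two frequency bins, two frames, one basis per (made-up) object.
toy_bases = [[1.0, 0.0], [0.0, 1.0]]
toy_activations = [[2.0, 0.0], [1.0, 3.0]]
toy_mixture = matmul(toy_bases, toy_activations)
toy_sources = separate_by_bases(
    toy_mixture, toy_bases, toy_activations, ["violin", "piano"]
)
```

Because the masks sum to one in every bin, the separated spectrograms add back up to the mixture. Converting them to waveforms would additionally require the mixture phase and an inverse STFT, which this sketch omits.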
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted