AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

February 04, 2023 · Declared Dead · 🏛 Neural Information Processing Systems

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu arXiv ID 2302.02088 Category cs.CV: Computer Vision Cross-listed cs.GR, cs.SD, eess.AS Citations 61 Venue Neural Information Processing Systems Last Checked 3 months ago

Abstract

Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions? We answer it by studying a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning. Concretely, given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audios along arbitrary novel camera trajectories in that scene. We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF, in which we implicitly associate audio generation with the 3D geometry and material properties of a visual environment. Furthermore, we present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields. To facilitate the study of this new task, we collect a high-quality Real-World Audio-Visual Scene (RWAVS) dataset. We demonstrate the advantages of our method on this real-world dataset and the simulation-based SoundSpaces dataset.