Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

December 20, 2019 · Declared Dead · 🏛 ViGIL@NeurIPS

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Shachi H Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
arXiv ID: 1912.10131
Category: cs.MM (Multimedia)
Cross-listed: cs.CL, cs.SD, eess.AS
Citations: 7
Venue: ViGIL@NeurIPS
Last Checked: 3 months ago

Abstract

With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVA) such as Alexa, Google Home, etc., have become a ubiquitous part of many homes. Currently, such IVAs are mostly audio-based, but going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling the IVAs to learn audio-visual groundings of utterances. This will enable agents to have conversations with users about the objects, activities and events surrounding them. In this work, we present three main architectural explorations for the Audio Visual Scene-Aware Dialog (AVSD) task: 1) investigating 'topics' of the dialog as an important contextual feature for the conversation, 2) exploring several multimodal attention mechanisms during response generation, 3) incorporating an end-to-end audio classification ConvNet, AclNet, into our architecture. We present a detailed analysis of the experimental results and show that our model variations outperform the baseline system presented for the AVSD task.


📜 Similar Papers

In the same crypt — Multimedia

🌅 Old Age

Quality Assessment of In-the-Wild Videos

Dingquan Li, Tingting Jiang, Ming Jiang

cs.MM 🏛 ACM MM 📚 375 cites 6 years ago

R.I.P. 👻 Ghosted

Viewport-Adaptive Navigable 360-Degree Video Delivery

Xavier Corbillon, Gwendal Simon, ... (+2 more)

cs.MM 🏛 ICC 📚 328 cites 9 years ago

📚 The Cartographer

A Comprehensive Survey on Cross-modal Retrieval

Kaiye Wang, Qiyue Yin, ... (+3 more)

cs.MM 🏛 arXiv 📚 322 cites 9 years ago

📚 The Cartographer

An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges

Yuxin Peng, Xin Huang, Yunzhen Zhao

cs.MM 🏛 IEEE TCSVT 📚 309 cites 9 years ago

R.I.P. 👻 Ghosted

A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding

Yuanying Dai, Dong Liu, Feng Wu

cs.MM 🏛 ICMM 📚 305 cites 9 years ago

R.I.P. 👻 Ghosted

Video Generation From Text

Yitong Li, Martin Renqiang Min, ... (+3 more)

cs.MM 🏛 AAAI 📚 300 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago