An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

August 08, 2018 · Declared Dead · 🏛 International Conference on Multimodal Interaction

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Valentin Vielzeuf, Corentin Kervadec, Stéphane Pateux, Alexis Lechervy, Frédéric Jurie
arXiv ID: 1808.02668
Category: cs.AI (Artificial Intelligence)
Cross-listed: cs.CV, cs.NE, stat.ML
Citations: 51
Venue: International Conference on Multimodal Interaction
Last Checked: 4 months ago
Abstract
This paper presents a lightweight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) the visual temporal information is handled by a simple score-per-frame selection process, averaged across time; iii) a simple frame selection mechanism is also proposed to weight the images of a sequence; iv) the fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64 % on the test set of AFEW, and ranked 4th at the Emotion in the Wild 2018 challenge.
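Since no code is linked for this paper, the following is only a minimal sketch of two ideas named in the abstract: averaging per-frame class scores across time (optionally weighted per frame) and late fusion at prediction level. It is not the authors' implementation. The function names, the weighting scheme, and the example numbers are all hypothetical and chosen for illustration.

```python
def average_over_time(frame_scores, frame_weights=None):
    """Weighted average of per-frame class scores across a sequence.

    frame_scores: list of per-frame score lists, one score per class.
    frame_weights: optional per-frame weights (assumed non-negative and
    not all zero); uniform weights are used when omitted.
    """
    if frame_weights is None:
        frame_weights = [1.0] * len(frame_scores)
    total = sum(frame_weights)
    num_classes = len(frame_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(frame_weights, frame_scores)) / total
        for c in range(num_classes)
    ]


def late_fusion(modality_scores, modality_weights):
    """Combine class scores from several modalities at prediction level.

    Each modality contributes one score list; the fused score for a class
    is the weighted sum of that class's score in every modality.
    """
    num_classes = len(modality_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(modality_weights, modality_scores))
        for c in range(num_classes)
    ]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical two-class example: three video frames and one audio prediction.
video = average_over_time([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
audio = [0.2, 0.8]
fused = late_fusion([video, audio], [0.5, 0.5])
label = predict(fused)
```

In this sketch the modality weights are fixed by hand; how such weights would be chosen or learned in the paper is not stated in the abstract.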