Sec2Sec Co-attention for Video-Based Apparent Affective Prediction

August 27, 2024

Repo contents: README.md, Sec2Sec_Co-attention_Transformer.pdf, dataloader.py, layer.py, train.py, utils.py

Authors: Mingwei Sun, Kunpeng Zhang
arXiv ID: 2408.15209
Category: cs.MM (Multimedia)
Citations: 0
Repository: https://github.com/nestor-sun/sec2sec (⭐ 8)

Abstract

Video-based apparent affect detection plays a crucial role in video understanding, as it encompasses various elements such as vision, audio, audio-visual interactions, and spatiotemporal information, which are essential for accurate video predictions. However, existing approaches often focus on extracting only a subset of these elements, resulting in the limited predictive capacity of their models. To address this limitation, we propose a novel LSTM-based network augmented with a Transformer co-attention mechanism for predicting apparent affect in videos. We demonstrate that our proposed Sec2Sec Co-attention Transformer surpasses multiple state-of-the-art methods in predicting apparent affect on two widely used datasets: LIRIS-ACCEDE and First Impressions. Notably, our model offers interpretability, allowing us to examine the contributions of different time points to the overall prediction. The implementation is available at: https://github.com/nestor-sun/sec2sec.
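
To make the co-attention idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository for that). It assumes each video has already been reduced to two aligned sequences of per-time-step feature vectors, one visual and one audio, and it applies plain scaled dot-product attention in both directions. The function names `cross_attention` and `co_attention`, the toy features, and the feature dimension are all hypothetical. Learned projections, multiple heads, and the LSTM described in the abstract are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query step attends over all key steps.

    Returns the attended outputs and the attention weights, where
    weights[t][s] is how much query step t draws on key step s.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def co_attention(visual, audio):
    """Hypothetical co-attention: visual attends to audio and audio to visual."""
    vis_ctx, vis_to_aud = cross_attention(visual, audio, audio)
    aud_ctx, aud_to_vis = cross_attention(audio, visual, visual)
    return vis_ctx, aud_ctx, vis_to_aud, aud_to_vis


# Toy input (assumed): 3 time steps, 2-dimensional features per modality.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

vis_ctx, aud_ctx, vis_to_aud, aud_to_vis = co_attention(visual, audio)

# Each row of the weight matrices is a distribution over time steps of the
# other modality, which is the kind of quantity one could inspect to see
# which time points contribute most.
for t, row in enumerate(vis_to_aud):
    print(f"visual step {t} attends to audio steps:", [round(w, 3) for w in row])
```

In a full model, the attended sequences `vis_ctx` and `aud_ctx` would typically be passed to a recurrent or feed-forward head to produce the affect prediction. The sketch leaves that stage out.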