Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions

December 21, 2024 · Declared Dead · 🏛 IEEE International Conference on Multimedia and Expo

⏳ CAUSE OF DEATH: Coming Soon™
Promised but never delivered

"Paper promises code 'coming soon'"

Evidence collected by the PWNC Scanner

Authors: Tongfei Bian, Yiming Ma, Mathieu Chollet, Victor Sanchez, Tanaya Guha
arXiv ID: 2412.16698
Category: cs.CV (Computer Vision)
Cross-listed: cs.HC
Citations: 2
Venue: IEEE International Conference on Multimedia and Expo
Last Checked: 1 month ago
Abstract
For efficient human-agent interaction, an agent should proactively recognize its target user and prepare for upcoming interactions. We formulate this challenging problem as the novel task of jointly forecasting a person's intent to interact with the agent, their attitude towards the agent, and the action they will perform, from the agent's (egocentric) perspective. To this end, we propose SocialEgoNet, a graph-based spatiotemporal framework that exploits task dependencies through a hierarchical multitask learning approach. SocialEgoNet uses whole-body skeletons (keypoints from face, hands and body) extracted from only 1 second of video input for high inference speed. For evaluation, we augment an existing egocentric human-agent interaction dataset with new class labels and bounding box annotations. Extensive experiments on this augmented dataset, named JPL-Social, demonstrate real-time inference and superior performance (average accuracy across all tasks: 83.15%), with our model outperforming several competitive baselines. The additional annotations and code will be available upon acceptance.
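
Since the promised code never materialized, the closest anyone can get is a sketch reconstructed from the abstract alone. The PyTorch block below is a minimal, non-authoritative mock-up of the described interface: whole-body keypoints from roughly 1 second of video go in, and joint predictions for intent, attitude and action come out, with the intent output feeding the other two heads to mimic the hierarchical multitask design. Every concrete choice is an assumption, not the authors' method: the GRU trunk stands in for the paper's unpublished graph-based model, 133 joints follows the COCO-WholeBody convention, 30 frames assumes 30 fps, the class counts are placeholders, and all names (SocialEgoNetSketch and the head modules) are hypothetical.

import torch
import torch.nn as nn

class SocialEgoNetSketch(nn.Module):
    """Illustrative stand-in for the unreleased SocialEgoNet.

    Only the interface follows the abstract: whole-body skeletons from
    ~1 s of video in; intent / attitude / action predictions out, with
    the latter two conditioned on the intent logits (hierarchical MTL).
    """

    def __init__(self, num_joints=133, num_attitudes=3, num_actions=8, hidden=128):
        super().__init__()
        # Shared spatiotemporal trunk. The paper uses a graph-based model
        # over the skeleton; a GRU over flattened 2D keypoints stands in here.
        self.trunk = nn.GRU(input_size=num_joints * 2,
                            hidden_size=hidden, batch_first=True)
        # Hierarchical heads: intent first, then attitude and action
        # conditioned on the trunk features plus the intent logits.
        self.intent_head = nn.Linear(hidden, 2)  # interact: yes / no
        self.attitude_head = nn.Linear(hidden + 2, num_attitudes)
        self.action_head = nn.Linear(hidden + 2, num_actions)

    def forward(self, keypoints):
        # keypoints: (batch, frames, joints, 2) from ~1 second of video
        b, t, j, c = keypoints.shape
        _, h = self.trunk(keypoints.reshape(b, t, j * c))
        feat = h[-1]  # final hidden state: (batch, hidden)
        intent_logits = self.intent_head(feat)
        cond = torch.cat([feat, intent_logits], dim=-1)
        return intent_logits, self.attitude_head(cond), self.action_head(cond)

model = SocialEgoNetSketch()
clip = torch.randn(4, 30, 133, 2)  # 4 clips, 30 frames, 133 whole-body joints
intent, attitude, action = model(clip)
print(intent.shape, attitude.shape, action.shape)  # (4, 2) (4, 3) (4, 8)

Training such a model would presumably sum a cross-entropy loss per head, which is the usual way a hierarchical multitask setup "exploits task dependencies"; how the paper actually weights or gates the tasks is unknowable without the code.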

📜 Similar Papers

In the same crypt — Computer Vision

Died the same way — ⏳ Coming Soon™