LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework

February 20, 2025 · Declared Dead · 🏛 The Web Conference

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zongyou Yu, Qiang Qu, Qian Zhang, Nan Zhang, Xiaoming Chen
arXiv ID: 2502.14273
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.MM
Citations: 5
Venue: The Web Conference
Last Checked: 4 months ago

Abstract

Recent advancements in event-based recognition have demonstrated significant promise, yet most existing approaches rely on extensive training, limiting their adaptability for efficient processing of event-driven visual content. Meanwhile, large language models (LLMs) have exhibited remarkable zero-shot capabilities across diverse domains, but their application to event-based visual recognition remains largely unexplored. To bridge this gap, we propose LLM-EvGen, an event representation generator that produces LLM-compatible event representations LLM-EvRep, thereby enhancing the performance of LLMs on event recognition tasks. The generator is trained using a self-supervised framework, aligning the generated representations with semantic consistency and structural fidelity. Comprehensive experiments were conducted on three datasets: N-ImageNet, N-Caltech101, and N-MNIST. The results demonstrate that our method, LLM-EvRep, outperforms the event-to-video method, E2VID, by 15.93%, 0.82%, and 50.21%, respectively, in recognition tasks when evaluated using GPT-4o.
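The abstract names two training signals, semantic consistency and structural fidelity, but does not give their formulas. As a rough illustration only, the sketch below shows one way a two-term objective of that kind could be combined. Everything in it is an assumption and not the authors' implementation: the cosine-distance semantic term, the gradient-based structural term, the weights `w_sem` and `w_struct`, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_loss(rep_embedding, ref_embedding):
    """Hypothetical semantic term: cosine distance between two feature vectors."""
    return 1.0 - cosine_similarity(rep_embedding, ref_embedding)


def horizontal_gradients(image):
    """Differences between horizontally adjacent pixels, used as a crude edge map."""
    return [[row[i + 1] - row[i] for i in range(len(row) - 1)] for row in image]


def structural_loss(rep_image, ref_image):
    """Hypothetical structural term: mean absolute difference between edge maps."""
    grads_rep = horizontal_gradients(rep_image)
    grads_ref = horizontal_gradients(ref_image)
    diffs = [
        abs(a - b)
        for row_rep, row_ref in zip(grads_rep, grads_ref)
        for a, b in zip(row_rep, row_ref)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def total_loss(rep_embedding, ref_embedding, rep_image, ref_image,
               w_sem=1.0, w_struct=1.0):
    """Weighted sum of the two assumed terms; the weights are illustrative."""
    return (w_sem * semantic_loss(rep_embedding, ref_embedding)
            + w_struct * structural_loss(rep_image, ref_image))
```

In this sketch, a representation whose embedding and edge map both match the reference yields a loss near zero, and either kind of mismatch raises it. The actual objective in the paper may differ in both terms and in how they are weighted.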