OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation

September 20, 2024 · Declared Dead · 🏛 IEEE International Conference on Robotics and Automation

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Siddarth Narasimhan, Aaron Hao Tan, Daniel Choi, Goldie Nejat arXiv ID 2409.13675 Category cs.RO: Robotics Citations 14 Venue IEEE International Conference on Robotics and Automation Last Checked 4 months ago

Abstract

Service robots in human-centered environments such as hospitals, office buildings, and long-term care homes need to navigate while adhering to social norms to ensure the safety and comfortability of the people they are sharing the space with. Furthermore, they need to adapt to new social scenarios that can arise during robot navigation. In this paper, we present a novel Online Lifelong Vision Language architecture, OLiVia- Nav, which uniquely integrates vision-language models (VLMs) with an online lifelong learning framework for robot social navigation. We introduce a unique distillation approach, Social Context Contrastive Language Image Pre-training (SC-CLIP), to transfer the social reasoning capabilities of large VLMs to a lightweight VLM, in order for OLiVia-Nav to directly encode social and environment context during robot navigation. These encoded embeddings are used to generate and select robot social compliant trajectories. The lifelong learning capabilities of SC-CLIP enable OLiVia-Nav to update the robot trajectory planning overtime as new social scenarios are encountered. We conducted extensive real-world experiments in diverse social navigation scenarios. The results showed that OLiVia-Nav outperformed existing state-of-the-art DRL and VLM methods in terms of mean squared error, Hausdorff loss, and personal space violation duration. Ablation studies also verified the design choices for OLiVia-Nav.