BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images

November 27, 2022 · Declared Dead · 🏛 Science China Information Sciences

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Zhihuang Zhang, Meng Xu, Wenqiang Zhou, Tao Peng, Liang Li, Stefan Poslad arXiv ID 2211.14927 Category cs.CV: Computer Vision Cross-listed cs.AI Citations 32 Venue Science China Information Sciences Last Checked 4 months ago

Abstract

Accurate localization ability is fundamental in autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (Birds-Eye-View) encoder extracts and flattens the multi-view images into BEV space. While the semantic map features are structurally embedded as map queries sequence. Then a cross-model transformer associates the BEV features and semantic map queries. The localization information of ego-car is recursively queried out by cross-attention modules. Finally, the ego pose can be inferred by decoding the transformer outputs. We evaluate the proposed method in large-scale nuScenes and Qcraft datasets. The experimental results show that the BEV-locator is capable to estimate the vehicle poses under versatile scenarios, which effectively associates the cross-model information from multi-view images and global semantic maps. The experiments report satisfactory accuracy with mean absolute errors of 0.052m, 0.135m and 0.251$^\circ$ in lateral, longitudinal translation and heading angle degree.