Where, What, Whether: Multi-modal Learning Meets Pedestrian Detection
December 20, 2020 · Declared Dead · Computer Vision and Pattern Recognition
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yan Luo, Chongyang Zhang, Muming Zhao, Hao Zhou, Jun Sun
arXiv ID
2012.10880
Category
cs.CV: Computer Vision
Citations
28
Venue
Computer Vision and Pattern Recognition
Last Checked
4 months ago
Abstract
Pedestrian detection benefits greatly from deep convolutional neural networks (CNNs). However, it is inherently hard for CNNs to handle occlusion and scale variation. In this paper, we propose W$^3$Net, which attempts to address these challenges by decomposing the pedestrian detection task into Where, What and Whether problems, targeting pedestrian localization, scale prediction and classification respectively. Specifically, for a pedestrian instance, we formulate its feature in three steps. i) We generate a bird view map, which is naturally free from occlusion issues, and scan all points on it to look for suitable locations for each pedestrian instance. ii) Instead of utilizing pre-fixed anchors, we model the interdependency between depth and scale, aiming to generate depth-guided scales at different locations to better match instances of different sizes. iii) We learn a latent vector shared by both visual and corpus space, by which false positives with similar vertical structure but lacking human partial features are filtered out. We achieve state-of-the-art results on widely used datasets (Citypersons and Caltech). In particular, when evaluating on the heavy occlusion subset, our results reduce MR$^{-2}$ from 49.3% to 18.7% on Citypersons, and from 45.18% to 28.33% on Caltech.
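The abstract does not say how depth and scale are related inside W$^3$Net, so the following is only a minimal sketch of the general idea behind depth-guided scales, not the authors' method. It assumes a simple pinhole camera, under which the on-image height of a person is inversely proportional to depth. The function names, the focal length, the real-world person height and the fixed width-to-height ratio are all hypothetical values chosen for illustration.

```python
def depth_guided_height_px(depth_m, focal_px=2000.0, person_height_m=1.7):
    """Approximate on-image pedestrian height in pixels at a given depth.

    Pinhole-camera assumption (not taken from the paper):
        height_px = focal_px * person_height_m / depth_m
    focal_px and person_height_m are illustrative defaults.
    """
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * person_height_m / depth_m


def depth_guided_box(depth_m, aspect_ratio=0.41, **camera):
    """Return a (width_px, height_px) box size for a location at depth_m.

    aspect_ratio is an assumed fixed width/height ratio for pedestrians.
    """
    height = depth_guided_height_px(depth_m, **camera)
    return aspect_ratio * height, height


if __name__ == "__main__":
    # Nearby locations get large boxes, distant locations get small ones.
    for depth in (5.0, 10.0, 20.0, 40.0):
        width, height = depth_guided_box(depth)
        print(f"depth {depth:5.1f} m -> box {width:6.1f} x {height:6.1f} px")
```

Under this assumption, doubling the depth halves the box height, which is the kind of location-dependent scale that a fixed anchor set cannot express directly.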
Similar Papers
In the same crypt: Computer Vision
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Old Age)
SSD: Single Shot MultiBox Detector (Old Age)
Squeeze-and-Excitation Networks (Old Age)
Fast R-CNN (Old Age)
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Old Age)
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency (Ghosted)
In-Datacenter Performance Analysis of a Tensor Processing Unit (Ghosted)
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning (Ghosted)