Modality Distillation with Multiple Stream Networks for Action Recognition
June 19, 2018 ยท Entered Twilight ยท ๐ European Conference on Computer Vision
"Last commit was 6.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, codebase, nets, s1_train_stream.py, s2_twostream_depth_rgb.py, s3_distillation.py, s4_depth_hall.py, utils
Authors
Nuno Garcia, Pietro Morerio, Vittorio Murino
arXiv ID
1806.07110
Category
cs.CV: Computer Vision
Citations
202
Venue
European Conference on Computer Vision
Repository
https://github.com/ncgarcia/modality-distillation
โญ 19
Last Checked
2 months ago
Abstract
Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset could be accurately designed to include a variety of sensory inputs, it is often the case that not all modalities could be available in real life (testing) scenarios, where a model has to be deployed. This raises the challenge of how to learn robust representations leveraging multimodal data in the training stage, while considering limitations at test time, such as noisy or missing modalities. This paper presents a new approach for multimodal video action recognition, developed within the unified frameworks of distillation and privileged information, named generalized distillation. Particularly, we consider the case of learning representations from depth and RGB videos, while relying on RGB data only at test time. We propose a new approach to train an hallucination network that learns to distill depth features through multiplicative connections of spatiotemporal representations, leveraging soft labels and hard labels, as well as distance between feature maps. We report state-of-the-art results on video action classification on the largest multimodal dataset available for this task, the NTU RGB+D. Code available at https://github.com/ncgarcia/modality-distillation .
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted