Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study
October 07, 2019 · Declared Dead · IEEE International Conference on Robotics and Automation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Manuel Serra Nunes, Atabak Dehban, Plinio Moreno, José Santos-Victor
arXiv ID
1910.02564
Category
cs.CV: Computer Vision
Cross-listed
cs.RO, eess.IV
Citations
13
Venue
IEEE International Conference on Robotics and Automation
Last Checked
4 months ago
Abstract
A defining characteristic of intelligent systems is the ability to make action decisions based on anticipated outcomes. Video prediction systems have been demonstrated as a solution for predicting how the future will unfold visually, and thus many models have been proposed that are capable of predicting future frames based on a history of observed frames (and sometimes robot actions). However, a comprehensive method for determining the fitness of different video prediction models at guiding the selection of actions is yet to be developed. Current metrics assess video prediction models based on human perception of frame quality. In contrast, we argue that if these systems are to be used to guide action, the actions the robot performs must be encoded in the predicted frames. In this paper, we propose a new metric to compare different video prediction models based on this argument. More specifically, we propose an action inference system and quantitatively rank different models based on how well we can infer the robot actions from the predicted frames. Our extensive experiments show that models with high perceptual scores can perform poorly in the proposed action inference tests and thus may not be suitable for use in robot planning systems.
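As a rough illustration of the ranking idea in the abstract, the sketch below scores each video prediction model by how closely the actions inferred from its predicted frames match the actions the robot actually executed, then sorts the models by that score. It is a minimal sketch under stated assumptions, not the authors' implementation: the action inference network is assumed to exist already and only its outputs are used here, and the names `mean_absolute_error` and `rank_models`, the use of mean absolute error as the score, and the toy model names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# An action is a fixed-length vector of floats (e.g. commanded displacements).
Action = Sequence[float]


def mean_absolute_error(inferred: Sequence[Action], actual: Sequence[Action]) -> float:
    """Average absolute difference between inferred and executed actions.

    Assumption: mean absolute error is used only for illustration; the
    paper's actual scoring function may differ.
    """
    if len(inferred) != len(actual):
        raise ValueError("inferred and actual must contain the same number of actions")
    total = 0.0
    count = 0
    for a_hat, a in zip(inferred, actual):
        if len(a_hat) != len(a):
            raise ValueError("action vectors must have the same dimension")
        for x_hat, x in zip(a_hat, a):
            total += abs(x_hat - x)
            count += 1
    if count == 0:
        raise ValueError("no action components to compare")
    return total / count


def rank_models(
    inferred_by_model: Dict[str, Sequence[Action]],
    actual: Sequence[Action],
) -> List[Tuple[str, float]]:
    """Return (model name, error) pairs sorted from lowest to highest error.

    inferred_by_model maps each (hypothetical) model name to the actions an
    inference network recovered from that model's predicted frames.
    """
    scores = [
        (name, mean_absolute_error(inferred, actual))
        for name, inferred in inferred_by_model.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data: two time steps, two-dimensional actions.
    executed = [(1.0, 0.0), (0.0, 1.0)]
    inferred = {
        "model_a": [(0.9, 0.1), (0.1, 0.8)],
        "model_b": [(0.5, 0.5), (0.5, 0.5)],
    }
    for name, error in rank_models(inferred, executed):
        print(f"{name}: {error:.3f}")
```

Under this scoring, a model whose frames look sharp but do not reflect the executed actions would rank low, which is the mismatch between perceptual quality and action inference that the abstract describes.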
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
Old Age
Fast R-CNN
Old Age
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Died the same way: 👻 Ghosted
R.I.P.
👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻 Ghosted