Old Age
HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data
July 25, 2024 · Entered Twilight · arXiv.org
Repo contents: .gitignore, LICENSE, README.md, atts, comps, helpers, models_mae.py, models_vit.py, requirements.txt, test_image_recognition.py, test_video_recognition.py, utils.py, vids, visualize_attention.py, visualize_completion.py
Authors
A. Emin Orhan
arXiv ID
2407.18067
Category
cs.CV: Computer Vision
Cross-listed
cs.LG, cs.NE, q-bio.NC
Citations
2
Venue
arXiv.org
Repository
https://github.com/eminorhan/hvm-1
⭐ 6
Last Checked
3 months ago
Abstract
We introduce Human-like Video Models (HVM-1), large-scale video models pretrained with nearly 5000 hours of curated human-like video data (mostly egocentric, temporally extended, continuous video recordings), using the spatiotemporal masked autoencoder (ST-MAE) algorithm. We release two 633M parameter models trained at spatial resolutions of 224x224 and 448x448 pixels. We evaluate the performance of these models in downstream few-shot video and image recognition tasks and compare them against a model pretrained with 1330 hours of short action-oriented video clips from YouTube (Kinetics-700). HVM-1 models perform competitively against the Kinetics-700 pretrained model in downstream evaluations despite substantial qualitative differences between the spatiotemporal characteristics of the corresponding pretraining datasets. HVM-1 models also learn more accurate and more robust object representations compared to models pretrained with the image-based MAE algorithm on the same data, demonstrating the potential benefits of learning to predict temporal regularities in natural videos for learning better object representations.
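The abstract names the spatiotemporal masked autoencoder (ST-MAE) algorithm without describing its mechanics. As a rough illustration only, the core idea of masked autoencoding on video is to cut a clip into spacetime patches, hide most of them at random, and train the model to reconstruct the hidden ones from the visible ones. The NumPy sketch below shows just the patching and masking step. The clip length (16 frames), the patch size (2x16x16), the 90% mask ratio, and the helper names `patchify` and `random_mask` are assumptions chosen for illustration; they are not taken from this page or from the HVM-1 repository.

```python
import numpy as np

def patchify(video, pt=2, ph=16, pw=16):
    """Split a (T, H, W, C) video into flattened spacetime patches.

    The patch size (pt, ph, pw) is an illustrative assumption.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nt, nh, nw, pt, ph, pw, C)
    return x.reshape(-1, pt * ph * pw * C)

def random_mask(num_patches, mask_ratio, rng):
    """Pick a random subset of patches to keep; the rest are masked.

    Returns the sorted indices of visible patches and a boolean mask
    where True marks a hidden patch.
    """
    num_keep = int(round(num_patches * (1 - mask_ratio)))
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return keep_idx, mask

rng = np.random.default_rng(0)
# Random stand-in for a 16-frame clip at the 224x224 resolution named in the abstract.
video = rng.random((16, 224, 224, 3), dtype=np.float32)
patches = patchify(video)
keep_idx, mask = random_mask(len(patches), mask_ratio=0.9, rng=rng)  # assumed ratio
visible = patches[keep_idx]  # only these patches would be fed to the encoder
```

In a full masked autoencoder, an encoder would process `visible`, and a decoder would predict the pixel content of the patches where `mask` is True. The released models in the linked repository should be consulted for the actual configuration.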
Similar Papers
In the same crypt: Computer Vision
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
Old Age
Fast R-CNN
Old Age