Old Age
Pre-training for Action Recognition with Automatically Generated Fractal Datasets
November 26, 2024 · Entered Twilight · International Journal of Computer Vision
Repo contents: .gitignore, LICENSE, README.md, assets, cfg, generate_synthetic.py, prepare_data.py, requirements.txt, src, train_ssl.py, train_sup.py, val_sup.py, video_examples.md
Authors
Davyd Svyezhentsev, George Retsinas, Petros Maragos
arXiv ID
2411.17584
Category
cs.CV: Computer Vision
Citations
2
Venue
International Journal of Computer Vision
Repository
https://github.com/davidsvy/fractal_video
⭐ 3
Last Checked
3 months ago
Abstract
In recent years, interest in synthetic data has grown, particularly for pre-training on the image modality to support a range of computer vision tasks such as object classification and medical imaging. Previous work has demonstrated that synthetic samples, automatically produced by various generative processes, can replace their real counterparts and yield strong visual representations. This approach resolves issues associated with real data, such as collection and labeling costs, copyright, and privacy. We extend this trend to the video domain, applying it to the task of action recognition. Employing fractal geometry, we present methods to automatically produce large-scale datasets of short synthetic video clips, which can be used for pre-training neural models. The generated video clips are characterized by notable variety, stemming from the innate ability of fractals to generate complex multi-scale structures. To narrow the domain gap, we further identify key properties of real videos and carefully emulate them during pre-training. Through thorough ablations, we determine the attributes that strengthen downstream results and offer general guidelines for pre-training with synthetic videos. The proposed approach is evaluated by fine-tuning pre-trained models on the established action recognition datasets HMDB51 and UCF101, as well as on four other video benchmarks related to group action recognition, fine-grained action recognition, and dynamic scenes. Compared to standard Kinetics pre-training, our reported results come close and are even superior on a portion of the downstream datasets. Code and samples of synthetic videos are available at https://github.com/davidsvy/fractal_video.
Similar Papers
In the same crypt: Computer Vision
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
Old Age
Fast R-CNN