Old Age
Class-Incremental Grouping Network for Continual Audio-Visual Learning
September 11, 2023 · Entered Twilight · IEEE International Conference on Computer Vision
Repo contents: .gitignore, LICENSE, README.md, assets, audio_io.py, datasets.py, grouping.py, metadata, model.py, requirements.txt, test.py, train.py, utils.py
Authors
Shentong Mo, Weiguo Pian, Yapeng Tian
arXiv ID
2309.05281
Category
cs.CV: Computer Vision
Cross-listed
cs.LG, cs.MM
Citations
31
Venue
IEEE International Conference on Computer Vision
Repository
https://github.com/stoneMo/CIGN
⭐ 17
Last Checked
2 months ago
Abstract
Continual learning is a challenging problem in which models need to be trained on non-stationary data across sequential tasks for class-incremental learning. While previous methods have focused on using either regularization or rehearsal-based frameworks to alleviate catastrophic forgetting in image classification, they are limited to a single modality and cannot learn compact class-aware cross-modal representations for continual audio-visual learning. To address this gap, we propose a novel class-incremental grouping network (CIGN) that can learn category-wise semantic features to achieve continual audio-visual learning. Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features. Additionally, it utilizes class tokens distillation and continual grouping to prevent forgetting parameters learned from previous tasks, thereby improving the model's ability to capture discriminative audio-visual categories. We conduct extensive experiments on VGGSound-Instruments, VGGSound-100, and VGG-Sound Sources benchmarks. Our experimental results demonstrate that the CIGN achieves state-of-the-art audio-visual class-incremental learning performance. Code is available at https://github.com/stoneMo/CIGN.
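The abstract describes CIGN only at a high level. What follows is a minimal NumPy sketch of the general idea it names: features from each modality are aggregated around learnable class tokens, and a distillation term discourages tokens learned on earlier tasks from drifting. The function names (`group_by_class_tokens`, `token_distillation_loss`), the cosine-similarity softmax assignment, the temperature of 0.1, and the mean-squared distillation loss are all assumptions chosen for illustration. This is not the CIGN implementation; see the linked repository for the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def group_by_class_tokens(features, class_tokens, temperature=0.1):
    """Softly assign N features (N, D) to C class tokens (C, D).

    Returns (grouped, assign): grouped is (C, D), a per-class weighted
    average of the features; assign is (N, C) with rows summing to 1.
    Cosine similarity + softmax is an assumption made for this sketch.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    t = class_tokens / np.linalg.norm(class_tokens, axis=1, keepdims=True)
    assign = softmax(f @ t.T / temperature, axis=1)  # (N, C)
    # Normalize each class column over the N features, then aggregate.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    grouped = weights.T @ features  # (C, D)
    return grouped, assign


def token_distillation_loss(old_tokens, new_tokens):
    """Hypothetical distillation term: mean squared drift of the class
    tokens learned on earlier tasks (a frozen copy, K rows) from the
    first K rows of the current token matrix."""
    k = old_tokens.shape[0]
    return float(np.mean((new_tokens[:k] - old_tokens) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Task 1 learned 2 class tokens; task 2 appends 2 new ones.
    old_tokens = rng.normal(size=(2, dim))
    tokens = np.vstack([old_tokens + 0.01 * rng.normal(size=(2, dim)),
                        rng.normal(size=(2, dim))])
    audio = rng.normal(size=(8, dim))    # e.g. 8 audio frame features
    visual = rng.normal(size=(12, dim))  # e.g. 12 image patch features
    audio_groups, _ = group_by_class_tokens(audio, tokens)
    visual_groups, _ = group_by_class_tokens(visual, tokens)
    # Per-class audio-visual agreement: cosine of the two grouped embeddings.
    cos = np.sum(audio_groups * visual_groups, axis=1) / (
        np.linalg.norm(audio_groups, axis=1)
        * np.linalg.norm(visual_groups, axis=1))
    print("per-class audio-visual cosine:", np.round(cos, 3))
    print("token distillation loss:", token_distillation_loss(old_tokens, tokens))
```

The soft (softmax) assignment keeps the grouping differentiable end to end; a hard, straight-through assignment is a common alternative in grouping networks. The distillation term here only constrains the tokens themselves, which is one simple way to express "preventing forgetting" of earlier-task parameters.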
Similar Papers
In the same crypt: Computer Vision
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks