Improving information retention in large scale online continual learning

October 12, 2022 · Declared Dead · 🏛 arXiv.org

⏳ CAUSE OF DEATH: Coming Soon™
Promised but never delivered

"Paper promises code 'coming soon'"

Evidence collected by the PWNC Scanner

Authors: Zhipeng Cai, Vladlen Koltun, Ozan Sener
arXiv ID: 2210.06401
Category: cs.CV (Computer Vision)
Citations: 1
Venue: arXiv.org
Last Checked: 1 month ago
Abstract
Given a stream of data sampled from non-stationary distributions, online continual learning (OCL) aims to adapt efficiently to new data while retaining existing knowledge. The typical approach to address information retention (the ability to retain previous knowledge) is keeping a replay buffer of a fixed size and computing gradients using a mixture of new data and the replay buffer. Surprisingly, recent work (Cai et al., 2021) suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited, i.e., the gradients are computed using all past data. This paper focuses on this peculiarity to understand and address information retention. To pinpoint the source of this problem, we theoretically show that, given limited computation budgets at each time step, even without a strict storage limit, naively applying SGD with constant or constantly decreasing learning rates fails to optimize information retention in the long term. We propose using a moving average family of methods to improve optimization for non-stationary objectives. Specifically, we design an adaptive moving average (AMA) optimizer and a moving-average-based learning rate schedule (MALR). We demonstrate the effectiveness of AMA+MALR on large-scale benchmarks, including Continual Localization (CLOC), Google Landmarks, and ImageNet. Code will be released upon publication.
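The replay-buffer baseline that the abstract describes can be illustrated with a minimal sketch. This is not the paper's AMA or MALR method, whose code was never released. The class `ReplayBuffer`, the helper `mixed_batch`, and the use of reservoir sampling to keep the buffer at a fixed size are all assumptions made for illustration; the abstract only states that the buffer has a fixed size and that gradients use a mixture of new and replayed data.

```python
import random


class ReplayBuffer:
    """Fixed-size buffer of past examples.

    Reservoir sampling is an assumed eviction policy: it keeps every
    example seen so far in the buffer with equal probability.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Draw up to k stored examples without replacement.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batch(new_examples, buffer, replay_size):
    """Return the new examples followed by a random draw from the buffer."""
    return list(new_examples) + buffer.sample(replay_size)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=5)
    stream = [[t, t + 1] for t in range(0, 20, 2)]  # toy stream of small batches
    for incoming in stream:
        batch = mixed_batch(incoming, buffer, replay_size=2)
        # A gradient step on `batch` would go here.
        for example in incoming:
            buffer.add(example)
    print(len(buffer.items), buffer.seen)
```

In this sketch each training batch is built before the incoming examples are stored, so the replayed portion contains only strictly older data. The abstract's point is that even an unlimited buffer does not solve retention under a limited compute budget, so the buffer policy here is only the starting point the paper argues against.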