TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

April 14, 2023 · Declared Dead · 🏛 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Tuo Zhang, Lei Gao, Sunwoo Lee, Mi Zhang, Salman Avestimehr arXiv ID 2304.06947 Category cs.LG: Machine Learning Cross-listed cs.DC Citations 52 Venue 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Last Checked 4 months ago

Abstract

In cross-device Federated Learning (FL) environments, scaling synchronous FL methods is challenging as stragglers hinder the training process. Moreover, the availability of each client to join the training is highly variable over time due to system heterogeneities and intermittent connectivity. Recent asynchronous FL methods (e.g., FedBuff) have been proposed to overcome these issues by allowing slower users to continue their work on local training based on stale models and to contribute to aggregation when ready. However, we show empirically that this method can lead to a substantial drop in training accuracy as well as a slower convergence rate. The primary reason is that fast-speed devices contribute to many more rounds of aggregation while others join more intermittently or not at all, and with stale model updates. To overcome this barrier, we propose TimelyFL, a heterogeneity-aware asynchronous FL framework with adaptive partial training. During the training, TimelyFL adjusts the local training workload based on the real-time resource capabilities of each client, aiming to allow more available clients to join in the global update without staleness. We demonstrate the performance benefits of TimelyFL by conducting extensive experiments on various datasets (e.g., CIFAR-10, Google Speech, and Reddit) and models (e.g., ResNet20, VGG11, and ALBERT). In comparison with the state-of-the-art (i.e., FedBuff), our evaluations reveal that TimelyFL improves participation rate by 21.13%, harvests 1.28x - 2.89x more efficiency on convergence rate, and provides a 6.25% increment on test accuracy.