A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models
November 25, 2024 Β· Declared Dead Β· π BMVC Workshops
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Manuel Schwonberg, Claus Werner, Hanno Gottschalk, Carsten Meyer
arXiv ID
2411.16407
Category
cs.CV: Computer Vision
Cross-listed
cs.AI
Citations
0
Venue
BMVC Workshops
Last Checked
4 months ago
Abstract
Despite the recent progress in deep learning based computer vision, domain shifts are still one of the major challenges. Semantic segmentation for autonomous driving faces a wide range of domain shifts, e.g. caused by changing weather conditions, new geolocations and the frequent use of synthetic data in model training. Unsupervised domain adaptation (UDA) methods have emerged which adapt a model to a new target domain by only using unlabeled data of that domain. The variety of UDA methods is large but all of them use ImageNet pre-trained models. Recently, vision-language models have demonstrated strong generalization capabilities which may facilitate domain adaptation. We show that simply replacing the encoder of existing UDA methods like DACS by a vision-language pre-trained encoder can result in significant performance improvements of up to 10.0% mIoU on the GTA5-to-Cityscapes domain shift. For the generalization performance to unseen domains, the newly employed vision-language pre-trained encoder provides a gain of up to 13.7% mIoU across three unseen datasets. However, we find that not all UDA methods can be easily paired with the new encoder and that the UDA performance does not always likewise transfer into generalization performance. Finally, we perform our experiments on an adverse weather condition domain shift to further verify our findings on a pure real-to-real domain shift.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computer Vision
π
π
Old Age
π
π
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
π
π
Old Age
SSD: Single Shot MultiBox Detector
π
π
Old Age
Squeeze-and-Excitation Networks
π
π
Old Age
Fast R-CNN
π
π
Old Age
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted