Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer

November 12, 2023 · arXiv.org

Repo contents: Imagenet_loader.py, README.md, configs, execute.py, figures, methods, requirements.txt, run.sh, vtab.py

Authors: Dongping Chen
arXiv ID: 2311.06749
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.LG
Citations: 4
Venue: arXiv.org
Repository: https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning (⭐ 8)
Last checked: 3 months ago

Abstract

Recent work has demonstrated the efficacy of tensorization-decomposition Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and FacT for Vision Transformers (ViT). However, these methods do not adequately address inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. On the VTAB-1K benchmark, EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% top-1 accuracy while tuning only 0.28% of the parameters that full fine-tuning updates. Given its simplicity and efficacy, EFFT has the potential to serve as a foundational benchmark. The code and model are available at https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.
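
To make the redundancy argument concrete, the following minimal sketch compares trainable-parameter counts under three update schemes: full fine-tuning, a LoRA-style low-rank update per matrix, and a scheme that shares factors across matrices in the spirit of FacT. It is not EFFT's implementation. The function names, the rank of 8, and the choice of 48 square 768 x 768 matrices (four attention projections in each of 12 ViT-B layers) are illustrative assumptions, so the resulting ratios are not expected to reproduce the 0.28% figure reported in the abstract.

```python
# Illustrative parameter counting only. Shapes, rank, and function names are
# assumptions for this sketch, not values taken from the EFFT paper or repo.


def full_finetune_params(d_in: int, d_out: int, n_mats: int) -> int:
    """Trainable parameters when every d_in x d_out matrix is updated directly."""
    return d_in * d_out * n_mats


def per_matrix_low_rank_params(d_in: int, d_out: int, rank: int, n_mats: int) -> int:
    """LoRA-style: each matrix gets its own factors A (d_in x r) and B (r x d_out)."""
    return n_mats * rank * (d_in + d_out)


def shared_factor_params(d_in: int, d_out: int, rank: int, n_mats: int) -> int:
    """Shared-factor style: one U (d_in x r) and one V (d_out x r) are reused by
    all matrices, and each matrix keeps only a small r x r core of its own."""
    return rank * (d_in + d_out) + n_mats * rank * rank


if __name__ == "__main__":
    d, rank, n_mats = 768, 8, 48  # assumed: ViT-B width, 12 layers x 4 projections

    full = full_finetune_params(d, d, n_mats)
    schemes = {
        "full fine-tuning": full,
        "per-matrix low-rank": per_matrix_low_rank_params(d, d, rank, n_mats),
        "shared factors": shared_factor_params(d, d, rank, n_mats),
    }
    for name, count in schemes.items():
        print(f"{name:>20}: {count:>12,d} params ({100 * count / full:.3f}% of full)")
```

Under these assumed shapes, the per-matrix scheme trains 589,824 parameters and the shared-factor scheme trains 15,360, against 28,311,552 for full fine-tuning. The gap between the two low-rank variants is one way to read the cross-layer redundancy that the abstract refers to: factors that are re-learned independently for every matrix can instead be stored once and reused.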