The Ethereal
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
September 26, 2023 · Entered Twilight · arXiv.org
Repo contents: .gitignore, .gitmodules, README.md, coco_caption, configs, environment.yml, eval.py, generator.py, loader.py, models, s2w_dataset.py, scorer.py, tfm.py, train.py
Authors
Ching-Yu Chiang, I-Hua Chang, Shih-Wei Liao
arXiv ID
2309.14774
Category
cs.LG: Machine Learning
Cross-listed
cs.CL, cs.CV, cs.HC
Citations
1
Venue
arXiv.org
Repository
https://github.com/RainYuGG/BLIP-Adapter
⭐ 9
Last Checked
3 months ago
Abstract
This study aims to explore efficient tuning methods for the screenshot captioning task. Recently, image captioning has seen significant advancements, but research in captioning tasks for mobile screens remains relatively scarce. Current datasets and use cases describing user behaviors within product screenshots are notably limited. Consequently, we sought to fine-tune pre-existing models for the screenshot captioning task. However, fine-tuning large pre-trained models can be resource-intensive, requiring considerable time, computational power, and storage due to the vast number of parameters in image captioning models. To tackle this challenge, this study proposes a combination of adapter methods, which requires tuning only the additional modules added to the model. These methods were originally designed for vision or language tasks, and our intention is to apply them to address similar challenges in screenshot captioning. By freezing the parameters of the image caption models and training only the weights associated with the methods, performance comparable to fine-tuning the entire model can be achieved, while significantly reducing the number of trainable parameters. This study represents the first comprehensive investigation into the effectiveness of combining adapters within the context of the screenshot captioning task. Through our experiments and analyses, this study aims to provide valuable insights into the application of adapters in vision-language models and contribute to the development of efficient tuning techniques for the screenshot captioning task. Our study is available at https://github.com/RainYuGG/BLIP-Adapter.
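To give a rough sense of why training only added modules shrinks the trainable parameter count, the sketch below does the arithmetic for a generic bottleneck adapter (a down-projection followed by an up-projection) placed once in each layer of a frozen transformer. Everything in it is an assumption made for illustration: the hidden size of 768, the 12 layers, the bottleneck width of 64, and the function names are hypothetical, and none of it is taken from the paper or its repository.

```python
# Illustrative arithmetic only: the sizes below are assumptions, not values from the paper.

def transformer_layer_params(d: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of one standard transformer encoder layer."""
    attention = 4 * (d * d + d)                 # Q, K, V, and output projections with biases
    hidden = ffn_mult * d
    feed_forward = (d * hidden + hidden) + (hidden * d + d)
    layer_norms = 2 * (2 * d)                   # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def adapter_params(d: int, r: int) -> int:
    """Parameters of one bottleneck adapter: down-projection d->r, up-projection r->d."""
    return (d * r + r) + (r * d + d)


def trainable_fraction(num_layers: int, d: int, r: int) -> float:
    """Share of all parameters that are trained when only the adapters are updated."""
    frozen = num_layers * transformer_layer_params(d)
    trainable = num_layers * adapter_params(d, r)
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    layers, hidden_size, bottleneck = 12, 768, 64   # hypothetical configuration
    print("frozen backbone params:", layers * transformer_layer_params(hidden_size))
    print("trainable adapter params:", layers * adapter_params(hidden_size, bottleneck))
    print("trainable fraction:", trainable_fraction(layers, hidden_size, bottleneck))
```

Under these assumed sizes the adapters account for only a small percentage of the total parameters, which is the general effect the abstract describes. The actual ratio for the paper's models depends on which adapter methods are combined and where they are inserted.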
Similar Papers
In the same crypt: Machine Learning
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal