Old Age
Dense Text-to-Image Generation with Attention Modulation
August 24, 2023 · Entered Twilight · IEEE International Conference on Computer Vision
Repo contents: LICENSE, NOTICE, README.md, dataset, eval_iou.ipynb, figures, gradio_app.py, inference.ipynb, requirements.txt, utils.py
Authors
Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu
arXiv ID
2308.12964
Category
cs.CV: Computer Vision
Cross-listed
cs.GR, cs.LG
Citations
187
Venue
IEEE International Conference on Computer Vision
Repository
https://github.com/naver-ai/DenseDiffusion
⭐ 501
Last Checked
1 month ago
Abstract
Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance on dense captions in terms of both automatic and human evaluation scores. In addition, we achieve visual results of similar quality to those of models specifically trained with layout conditions.
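The abstract only summarizes attention modulation at a high level. The sketch below illustrates the general idea in pure Python: before the softmax, cross-attention logits between an image query and the text token assigned to that query's layout region are pushed up, and all other logits are pushed down. It assumes a binary query-to-token `region_mask`, a single hypothetical `strength` knob, and row-wise max/min as the boost and suppress targets; `modulate_attention`, `region_mask`, and `strength` are illustrative names, not the authors' API. It is not the exact DenseDiffusion formulation; see the paper and the naver-ai/DenseDiffusion repository for that.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def modulate_attention(
    scores: List[List[float]],
    region_mask: List[List[int]],
    strength: float = 1.0,
) -> List[List[float]]:
    """Return attention probabilities after layout-based logit modulation.

    Hypothetical sketch, not the DenseDiffusion implementation.
    scores[i][j]      -- pre-softmax logit between image query i and text token j
    region_mask[i][j] -- 1 if query i lies inside the layout region of token j
    strength          -- assumed knob in [0, 1]; 0 leaves attention unchanged
    """
    probs = []
    for row, mask_row in zip(scores, region_mask):
        hi, lo = max(row), min(row)  # assumption: per-query (row-wise) extremes
        new_row = []
        for s, inside in zip(row, mask_row):
            if inside:
                # Boost: move the logit toward the row maximum.
                new_row.append(s + strength * (hi - s))
            else:
                # Suppress: move the logit toward the row minimum.
                new_row.append(s - strength * (s - lo))
        probs.append(softmax(new_row))
    return probs


# Usage: two image queries, two text tokens. Token 1 is assigned to the
# region covering both queries, although the raw logits favor token 0.
scores = [[1.0, 0.0], [1.0, 0.0]]
mask = [[0, 1], [0, 1]]
attn = modulate_attention(scores, mask)
# After modulation, token 1 receives more attention than token 0 in each row.
```

Because the change is applied to logits rather than to probabilities, each row still sums to 1 after the softmax, and setting `strength=0.0` recovers the unmodified attention map.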
Similar Papers
In the same crypt: Computer Vision
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted