Breaking the Lock-in: Diversifying Text-to-Image Generation via Representation Modulation
June 05, 2026 · Grace Period · ICML 2026
Authors
Dahee Kwon, Haeun Lee, Jaesik Choi
arXiv ID
2606.06813
Category
cs.CV: Computer Vision
Cross-listed
cs.AI
Citations
0
Venue
ICML 2026
Abstract
Recent text-to-image models built on large-scale Transformer backbones and flow-based objectives deliver strong text-image alignment and high visual quality, yet often produce overly similar samples under a fixed prompt. Existing diversity-enhancement methods alleviate this issue, but typically require expensive sampling or auxiliary optimization, incurring non-trivial overhead. To investigate the root cause of this homogeneity, we examine intermediate Transformer features and observe that the zero-frequency spatial average (DC) component rapidly converges across seeds early in generation, causing early trajectory lock-in that limits downstream variation. Building on this observation, we propose DC Attenuation for diVersity Enhancement (DAVE), a training-free representation-level intervention that selectively attenuates this component in the early regime. DAVE preserves the sampling pipeline with negligible overhead, improving prompt-consistent diversity while maintaining competitive image quality.
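The core operation described in the abstract, attenuating the zero-frequency spatial average of intermediate features during early sampling steps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formula, so the token layout `(num_tokens, channels)`, the scaling factor `alpha`, the cutoff `early_steps`, and both function names are assumptions made for illustration only.

```python
import numpy as np


def attenuate_dc(tokens: np.ndarray, alpha: float) -> np.ndarray:
    """Scale the spatial-mean (DC) component of `tokens` by `alpha`.

    tokens: array of shape (num_tokens, channels); assumed layout.
    alpha:  hypothetical attenuation factor; 1.0 leaves the features
            unchanged, 0.0 removes the DC component entirely.
    """
    # The DC component is the per-channel average over spatial tokens.
    dc = tokens.mean(axis=0, keepdims=True)
    # Subtracting (1 - alpha) * dc leaves alpha * dc in place and keeps
    # the non-DC (spatially varying) part of the features untouched.
    return tokens - (1.0 - alpha) * dc


def maybe_attenuate(tokens: np.ndarray, step: int,
                    early_steps: int = 5, alpha: float = 0.5) -> np.ndarray:
    """Apply DC attenuation only in the early regime (step < early_steps).

    `early_steps` and `alpha` are illustrative defaults, not values
    taken from the paper.
    """
    if step < early_steps:
        return attenuate_dc(tokens, alpha)
    return tokens
```

In a sampler loop, a function like `maybe_attenuate` would be called on a block's hidden states at each step; because it only subtracts a scaled mean, the added cost is a single reduction per call, which is in line with the abstract's claim of negligible overhead.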
Similar Papers
In the same crypt: Computer Vision
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
SSD: Single Shot MultiBox Detector
Squeeze-and-Excitation Networks
Fast R-CNN