Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

November 07, 2023 · Declared Dead · 🏛 AAAI Conference on Artificial Intelligence

Authors Shengzhe Zhou, Zejian Lee, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun arXiv ID 2311.03830 Category cs.CV: Computer Vision Cross-listed cs.AI Citations 1 Venue AAAI Conference on Artificial Intelligence Repository https://github.com/Sainzerjj/SFERD} Last Checked 2 months ago

Abstract

Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: \url{https://github.com/Sainzerjj/SFERD}.