SMILE: Semantically-guided Multi-attribute Image and Layout Editing

October 05, 2020 · Declared Dead · 🏛 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Andrés Romero, Luc Van Gool, Radu Timofte
arXiv ID: 2010.02315
Category: cs.CV (Computer Vision)
Citations: 7
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Last Checked: 4 months ago

Abstract

Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). Exploring the disentangled attribute space within a transformation is a very challenging task due to the multiple and mutually-inclusive nature of the facial images, where different labels (eyeglasses, hats, hair, identity, etc.) can co-exist at the same time. Several works address this issue either by exploiting the modality of each domain/attribute using a conditional random vector noise, or extracting the modality from an exemplary image. However, existing methods cannot handle both random and reference transformations for multiple attributes, which limits the generality of the solutions. In this paper, we successfully exploit a multimodal representation that handles all attributes, be it guided by random noise or exemplar images, while only using the underlying domain information of the target domain. We present extensive qualitative and quantitative results for facial datasets and several different attributes that show the superiority of our method. Additionally, our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space, and it can be easily extended to head-swapping and face-reenactment applications without being trained on videos.