Fangchen FENG

I am looking for a research collaborator with a strong background in deep learning and computer vision and, ideally, experience with generative models or image compression. This is a chance to work on a novel problem at the intersection of generative AI and data efficiency.

Academic Position of the inviting researcher: Associate Professor
Web page/ORCID: 0000-0002-3158-3443
University of the inviting researcher

UNIVERSITE PARIS 13


Department/Lab/Unit: L2TI
Research lines/projects

Image Generation with a File Size Budget: Today's powerful AI image generators (such as Stable Diffusion or Midjourney) can create striking images from text prompts. However, they are completely unaware of the file size of the images they produce. The standard workflow is to first generate a high-quality image and then compress it with a codec such as JPEG. This is inefficient: a generated image may contain subtle textures or noise patterns that are visually insignificant but still cost many bits to encode, making the file unnecessarily large.

We propose a new approach in which the generative model is made aware of a file size constraint during the image creation process itself. Instead of only trying to produce an image that looks good and matches the prompt, our model will be trained to optimize three objectives simultaneously: image quality, faithfulness to the prompt, and a low compressed file size. We will achieve this by adding a "rate penalty" to the model's training objective, which discourages it from creating images that are difficult to compress.

The research will proceed in four phases. First, we will develop a novel "differentiable rate estimator," a small neural network that predicts the final compressed file size of an image; unlike a real encoder, whose quantization and entropy coding steps are not differentiable, it provides gradients that can be backpropagated into the generator. Second, we will integrate this estimator directly into the training loop of a state-of-the-art diffusion model, using a composite loss function that jointly optimizes image realism and a low bitrate. Third, we will modify the model's architecture to accept a target file size as a direct user input, much like a text prompt, enabling explicit control over the generation's "data budget." Finally, we will rigorously benchmark our system against the standard "generate-then-compress" workflow, aiming to demonstrate quantitatively that our model produces higher-quality images at identical file sizes.
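One plausible way to write the composite loss, sketched under the assumption of a standard DDPM-style noise-prediction objective (the symbols are illustrative and not taken from the proposal): x_0 is a training image, c the text prompt, epsilon the Gaussian noise, epsilon_theta the denoiser, R_phi the frozen rate estimator returning an estimated size in bits, and lambda >= 0 the rate weight.

```latex
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar{\alpha}_t}},
\qquad
\mathcal{L}(\theta) = \mathbb{E}_{x_0, c, t, \epsilon}\Big[\,\big\lVert \epsilon - \epsilon_\theta(x_t, t, c)\big\rVert_2^2 + \lambda\,\hat{R}_\phi(\hat{x}_0)\Big]
```

The first term keeps the usual realism and prompt-conditioning behavior; the second pushes the model's predicted clean image toward content that is cheap to encode, with lambda tracing out the rate-quality trade-off. For a latent diffusion model, R_phi would presumably be applied to the VAE-decoded image rather than to the latent. For the rate-conditioned model of the third phase, one option is to replace the lambda term with a penalty on the gap between R_phi and the user's target rate.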
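The neural rate estimator is meant to stand in for the non-differentiable JPEG encoder during training. As a point of reference, and possibly as a baseline, the sketch below shows a hand-crafted differentiable bit-cost proxy, not the proposed neural estimator: it applies the 8x8 block DCT that JPEG uses and charges each coefficient c about log2(1 + |c|/q) bits, with |c| smoothed so the gradient exists everywhere. The proxy formula, the step size q = 16, the smoothing eps and all function names are assumptions for illustration, not JPEG's actual quantization and entropy coder. It also makes the motivating claim concrete: noise spreads energy over many DCT coefficients, so it costs far more estimated bits than a smooth image.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def to_blocks(img: np.ndarray, n: int = 8) -> np.ndarray:
    """(H, W) -> (H/n, W/n, n, n); H and W must be multiples of n."""
    h, w = img.shape
    return img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)


def from_blocks(blocks: np.ndarray) -> np.ndarray:
    """Inverse of to_blocks: (H/n, W/n, n, n) -> (H, W)."""
    bh, bw, n, _ = blocks.shape
    return blocks.transpose(0, 2, 1, 3).reshape(bh * n, bw * n)


def rate_proxy(img: np.ndarray, q: float = 16.0, eps: float = 0.5):
    """Hypothetical differentiable bit-cost proxy for a grayscale image in [0, 255].

    Returns (estimated_bits, d_bits/d_pixels). Each block DCT coefficient c
    costs log2(1 + a(c)/q) bits, where a(c) = sqrt(c^2 + eps^2) - eps is a
    smooth stand-in for |c|.
    """
    d = dct_matrix(8)
    # Level shift as in JPEG, then per-block 2-D DCT: C = D X D^T.
    coeffs = np.einsum("ki,abij,lj->abkl", d, to_blocks(img - 128.0), d)
    root = np.sqrt(coeffs**2 + eps**2)
    smooth_abs = root - eps
    bits = np.sum(np.log2(1.0 + smooth_abs / q))
    # d bits / d c for every coefficient.
    g = coeffs / (root * (q + smooth_abs) * np.log(2.0))
    # Back to pixels: the DCT is orthonormal, so dR/dX = D^T G D per block.
    grad = from_blocks(np.einsum("ki,abkl,lj->abij", d, g, d))
    return float(bits), grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = 128.0 + np.tile(np.arange(64.0) - 32.0, (64, 1))
    noisy = smooth + rng.normal(0.0, 20.0, size=smooth.shape)
    print("smooth:", rate_proxy(smooth)[0], "noisy:", rate_proxy(noisy)[0])
```

In a real pipeline the same computation would be written in an autodiff framework so that gradients flow into the generator automatically; the hand-written gradient here only keeps the example dependency-free.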
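For the phase in which the target file size becomes a user input, one possible design, assumed here rather than specified by the proposal, is to normalize the budget to bits per pixel and embed it with the same sinusoidal scheme diffusion models commonly use for the timestep. The function names, the log scaling and the `scale` factor below are hypothetical choices.

```python
import numpy as np


def target_bpp(file_size_bytes: float, height: int, width: int) -> float:
    """Convert a file-size budget in bytes into bits per pixel (bpp)."""
    return 8.0 * file_size_bytes / (height * width)


def rate_embedding(bpp: float, dim: int = 128, scale: float = 1000.0,
                   max_period: float = 10000.0) -> np.ndarray:
    """Sinusoidal embedding of a rate target (dim must be even).

    The rate is embedded on a log scale (a hypothetical choice) so that
    0.1 -> 1 bpp and 1 -> 10 bpp are equally far apart; `scale` stretches
    the log value so that more channels vary across typical bpp ranges.
    """
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = scale * np.log(bpp) * freqs
    return np.concatenate([np.cos(args), np.sin(args)])


if __name__ == "__main__":
    # A 64 KiB budget for a 512x512 image.
    bpp = target_bpp(65536, 512, 512)
    print(bpp, rate_embedding(bpp)[:4])
```

Inside the denoiser, such a vector could be passed through a small MLP and added to the timestep embedding, or appended to the text-conditioning tokens; which option works better would be an open design question for the project.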
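Comparing two systems "at identical file sizes" is usually done in image and video coding with a Bjøntegaard-style delta: each system's rate-quality curve is fitted as a cubic in log-rate, and the average vertical gap between the fits is taken over the shared rate range. The sketch below assumes at least four (bpp, quality) points per system and a higher-is-better quality score. Because generated images have no pristine reference, which quality score to use is itself an assumption, for example a set-level score such as FID computed per rate point (pass the negated value, since lower FID is better).

```python
import numpy as np


def bd_quality(rate_a, qual_a, rate_b, qual_b) -> float:
    """Average quality gain of curve B over curve A at equal rate (Bjontegaard-style).

    Quality is fitted as a cubic polynomial in log10(rate) for each curve, and
    both fits are averaged over the overlapping log-rate interval. A positive
    result means B reaches higher quality at the same file size.
    """
    la = np.log10(np.asarray(rate_a, dtype=float))
    lb = np.log10(np.asarray(rate_b, dtype=float))
    # Antiderivatives of the fitted cubics, used to integrate each curve.
    pa = np.polyint(np.polyfit(la, np.asarray(qual_a, dtype=float), 3))
    pb = np.polyint(np.polyfit(lb, np.asarray(qual_b, dtype=float), 3))
    lo = max(la.min(), lb.min())
    hi = min(la.max(), lb.max())
    if hi <= lo:
        raise ValueError("rate ranges of the two curves do not overlap")
    area_a = np.polyval(pa, hi) - np.polyval(pa, lo)
    area_b = np.polyval(pb, hi) - np.polyval(pb, lo)
    return float((area_b - area_a) / (hi - lo))


if __name__ == "__main__":
    # Made-up numbers, only to show the call: baseline vs. proposed model.
    rates = [0.25, 0.5, 1.0, 2.0]
    print(bd_quality(rates, [28.0, 31.0, 34.0, 37.0], rates, [29.0, 32.0, 35.0, 38.0]))
```

Here curve A would be the "generate-then-compress" baseline swept over codec quality settings, and curve B the proposed model swept over its rate weight or target bpp.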

Strengths of the offer/Expected benefits

This research would have a significant impact on any application that uses generative AI at scale, leading to faster-loading websites, more efficient assets for games and virtual worlds, and better performance on mobile devices.

Preferred duration

3 months

Additional information

Excellent English skills are required.