Chapter 7. Introduction to Diffusion Models for Image Generation

This chapter introduces the most popular diffusion models for AI image generation. You'll learn the strengths and limitations of each of the top models, so that you can confidently choose among them based on the task at hand.

Introduced in 2015, diffusion models are a class of generative models that have shown spectacular results for generating images from text. The release of DALL-E 2 in 2022 marked a great leap forward in the quality of images generated by diffusion models, with the open-source Stable Diffusion and community favorite Midjourney quickly following to forge a competitive category. With the integration of DALL-E 3 into ChatGPT, the lines will continue to blur between text and image generation. However, advanced users will likely continue to require direct access to the underlying image generation model to get the best results.

Diffusion models are trained over many steps of adding random noise to an image and then predicting how to reverse the diffusion process by denoising (removing the noise). The approach comes from physics, where it has been used to simulate how particles diffuse (spread out) through a medium. The predictions are conditioned on the description of the image, so if the resulting image doesn't match the description, the neural network weights of the model are adjusted to make it better at predicting the image from the description. Once trained, the model is able to take pure random noise and turn it into a novel image that matches a given text description.
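To make this concrete, here is a minimal sketch of one diffusion training step in PyTorch. The DenoiserNet class, the linear noise schedule, and the flattened 32×32 image inputs are simplifying assumptions for illustration; production models use a U-Net denoiser and also condition on a text embedding, which this sketch omits.

```python
# A toy sketch of one diffusion training step (assumptions noted above).
import torch
import torch.nn as nn

T = 1000  # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)                # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

class DenoiserNet(nn.Module):
    """Hypothetical stand-in for the U-Net used in real diffusion models."""
    def __init__(self, dim=32 * 32 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim)
        )

    def forward(self, x, t):
        # Condition on the timestep by appending it as an extra feature.
        t_feat = (t.float() / T).unsqueeze(1)
        return self.net(torch.cat([x, t_feat], dim=1))

model = DenoiserNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(images):
    """images: (batch, 3072) tensor of flattened 32x32 RGB images in [-1, 1]."""
    t = torch.randint(0, T, (images.shape[0],))          # random timestep per image
    noise = torch.randn_like(images)                     # the noise the model must predict
    a = alphas_cumprod[t].unsqueeze(1)
    noisy = a.sqrt() * images + (1 - a).sqrt() * noise   # forward (noising) process
    pred = model(noisy, t)                               # predict the added noise
    loss = nn.functional.mse_loss(pred, noise)           # how wrong was the prediction?
    opt.zero_grad(); loss.backward(); opt.step()         # adjust weights to do better
    return loss.item()

# Usage: loss = training_step(torch.randn(8, 32 * 32 * 3))
```

Note that the network learns to predict the noise that was added rather than the clean image directly; this noise-prediction objective is the standard formulation from the original DDPM work and tends to be easier to learn.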
