Chapter 11. Controlled Generation and Fine-Tuning with Stable Diffusion

Controlling generation is an active area of research, with many cutting-edge techniques introduced only recently. The goal of these techniques is to augment diffusion models so they can condition image generation on common computer vision outputs such as edge maps, depth maps, and segmentation maps. These techniques provide fine-grained control over image generation.

In this chapter, you will learn about a powerful technique called ControlNet to augment and improve text-to-image generation for models like Stable Diffusion. Additionally, you will explore multimodal fine-tuning with tools like DreamBooth, algorithms such as textual inversion, and optimizations including parameter-efficient fine-tuning (PEFT). Lastly, you will revisit reinforcement learning from human feedback (RLHF) in the context of aligning multimodal models with human preferences, including helpfulness, honesty, and harmlessness (HHH).

ControlNet

Described in a 2023 paper,1 ControlNet is a popular technique for training controls that condition and improve image-based generative tasks. ControlNet is a deep neural network that works alongside diffusion models like Stable Diffusion.

During training, a control learns a specific conditioning task, such as following edge maps or depth maps, from a set of example inputs. A relatively small amount of data is required to train a very powerful control. You can train your own controls using ControlNet or choose from a large number of pretrained controls.
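
To make this concrete, the following is a minimal inference sketch using a pretrained Canny edge control with the Hugging Face Diffusers library. The checkpoint names (lllyasviel/sd-controlnet-canny, runwayml/stable-diffusion-v1-5), the input image URL, and the prompt are illustrative placeholders, not specific choices made in this chapter.

import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Load a pretrained Canny edge control and attach it to Stable Diffusion
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Derive a Canny edge map from an input image; this edge map conditions generation
image = load_image("https://example.com/input.png")  # placeholder URL
edges = cv2.Canny(np.array(image), 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel conditioning image
canny_image = Image.fromarray(edges)

# Generate an image guided by both the text prompt and the edge map
output = pipe(
    "a futuristic city at sunset, highly detailed",
    image=canny_image,
    num_inference_steps=30,
).images[0]
output.save("controlnet_output.png")

At inference time, the control receives the conditioning image (here, the Canny edge map) alongside the text prompt, so the generated image follows the structure of the edges while the prompt determines its content and style.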

Let’s use Figure 11-1 ...
