Chapter 7. Fine-Tuning Stable Diffusion

In the previous chapter, we saw how fine-tuning can teach language models to write in a particular style or to learn concepts from a specific domain. We can apply the same principles to text-to-image models, allowing us to customize them even with access to a single GPU (versus the multi-GPU nodes required to pretrain a model such as Stable Diffusion).

In this chapter, we will take the base pretrained Stable Diffusion model you learned about in Chapter 5 and extend it to learn styles and concepts it might not know, such as the concept of “your pet” or a particular painting style. We will also give it new capabilities, such as inpainting and conditioning on new kinds of inputs.

Rather than writing code from scratch, in this section we will focus on understanding and running the existing fine-tuning scripts that ship with the diffusers library. We therefore recommend cloning the library, as most of these scripts live in its examples folder:

git clone https://github.com/huggingface/diffusers.git
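After cloning, each example directory ships its own requirements file. A typical setup looks like the following sketch; the exact paths and dependencies may differ across diffusers releases, and the text_to_image folder is used here only as one example:

```shell
# Install diffusers from source so the example scripts match the library version
cd diffusers
pip install -e .

# Each example folder lists its own dependencies, e.g. the text-to-image one
cd examples/text_to_image
pip install -r requirements.txt

# Configure accelerate for your hardware before launching any training script
accelerate config
```

Running `accelerate config` once up front lets the same training command work on a single GPU, multiple GPUs, or with mixed precision, without editing the script itself.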

Full Stable Diffusion Fine-Tuning

“Full model” is a qualifier for fine-tuning that emerged after the development of model customization techniques such as LoRA, Textual Inversion, and DreamBooth. Those techniques do not fine-tune the entire model; instead, they either provide an efficient way to fine-tune it (as we learned with LoRA for LLMs in Chapter 6) or offer novel ways to “teach” the model new concepts. We will discuss ...
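As a preview of what a full fine-tuning run looks like, here is a sketch of launching the train_text_to_image.py script from the examples folder. The model ID, dataset name, and hyperparameter values below are illustrative placeholders; check the script’s README for the flags supported by your diffusers version:

```shell
accelerate launch examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="lambdalabs/naruto-blip-captions" \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --mixed_precision="fp16" \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-finetuned-model"
```

Because this updates every weight in the UNet, it needs far more GPU memory than the parameter-efficient techniques above; the small batch size combined with gradient accumulation and fp16 mixed precision in this sketch is a common way to fit the run on a single consumer GPU.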
