Preface
Generative AI is a revolutionary technology that has rapidly transitioned from lab demos to real-world applications, impacting billions. It can create new content—images, text, audio, videos, and more—by learning patterns from existing data, thereby enhancing creativity, augmenting data, or assisting in many tasks. For instance, a generative AI model trained on music can compose new melodies, while one trained on text can generate stories or even programming code.
This book isn’t just for experts—it’s for anyone who wants to learn about this fascinating new field. We won’t focus on building models from scratch or diving straight into complicated mathematics. Instead, we’ll leverage existing models to solve real-world problems, helping you to build a solid intuition around how these techniques work and providing the foundation for you to keep exploring.
This hands-on approach, we hope, will help you get up and running quickly and efficiently with generative AI. You’ll learn how to use pretrained models, adapt them for your needs, and generate new data with them. You’ll also learn how to evaluate the quality of generated data and explore ethical and social issues that may arise from using generative AI. This exposure will allow you to stay up-to-date with new models and help you identify areas that you may want to explore more deeply.
Who Should Read This Book
Given the impressive products and news you might have seen about generative AI, it’s normal to be excited, or worried, about it! Whether you’re curious about how programs can generate images, want to train a model to tweet in your style, or are looking to gain a deeper understanding of products like ChatGPT, this book is for you. With generative AI, we can do all of that and many other things, including these:
- Write summaries of news articles
- Generate images based on a description
- Enhance the quality of an image
- Transcribe meetings
- Generate synthetic speech in the style of your own voice
- Incorporate new subjects or styles into image-generation models, like creating images of “your cat dressed as an astronaut”
No matter your reason, you’ve decided to learn about generative AI, and this book will guide you through it.
Prerequisites
This book assumes that you are comfortable programming in Python and have a foundational understanding of what machine learning is, including basic usage of frameworks like PyTorch or TensorFlow. Having practical experience with training models is not required, but it will be helpful to understand the content with more depth. The following resources provide a good foundation for the topics covered in this book:
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., by Aurélien Géron (O’Reilly)
- Deep Learning for Coders with fastai and PyTorch by Jeremy Howard and Sylvain Gugger (O’Reilly)
If you feel intimidated by the prerequisites, don’t worry! The book is designed to enhance your intuition and provide a hands-on approach to help you get started.
What You Will Learn
This book is divided into three parts:
- In Part I, “Leveraging Open Models”, we’ll introduce the fundamental building blocks of generative AI. You’ll learn how to use pretrained models to generate text and images. This part will help you grasp the basics of the field and see the big picture.
- Part II, “Transfer Learning for Generative Models”, is all about fine-tuning: taking existing models and adapting them to your needs. We’ll walk you through how to teach a diffusion model a new concept, customize a transformer model to classify text and reply in conversations, and explore advanced techniques for working with large models on limited hardware. Don’t worry if this is your first time reading about transformer or diffusion models; you’ll learn about them soon.
- In Part III, “Going Further”, we’ll extend the ideas from the previous parts, generating new modalities such as audio and getting creative with new applications. After you’ve read this book, you’ll have a solid understanding of the methods and techniques on which generative applications are built.
How to Read This Book
We designed the book to be read in order, but we have kept the chapters as self-contained as possible so that you can jump around to the parts that interest you most. Many of the ideas covered in this book apply to multiple modalities, so even if you are interested in only one particular domain (such as image generation), you may still find it valuable to skim through the other chapters.
We’ve included exercises and code examples throughout the book, designed to help you get hands-on with the material. Try to complete these exercises as you go along, and where possible, see if you can adapt the examples to your use cases. Trying things out for yourself will help you build a much deeper understanding of the material.
Finally, most chapter summaries list additional resources for further reading. We encourage you to explore these resources to deepen your understanding of the topics covered in the book. You don’t need to read these resources before you progress to a new chapter; you can come back later, whenever you are ready to go deeper into the subjects that interest you.
Software and Hardware Requirements
To get the most out of this book, we highly recommend running the code examples as you read along. Experimenting with the code by making changes and exploring different scenarios will enhance your understanding. Working with transformers and diffusion models can be computationally intensive, so having access to a computer with an NVIDIA GPU is beneficial. While a GPU is not mandatory, it will significantly speed up training times.
You can use one of several online options, such as Google Colaboratory or Kaggle Notebooks. Follow these instructions to set up your environment:
- Using Google Colab
Most code should work on any Google Colab instance. We recommend you use GPU runtimes for chapters with training loops.
- Running code locally
To run the code on your computer, create a Python 3.10 virtual environment using your preferred method. As an example, you can do it with conda like this:
conda create -n genaibook python=3.10
conda activate genaibook
For optimal performance, we recommend using a CUDA-compatible GPU.1 If you don’t know what CUDA is, don’t worry; we’ll explain it in the book.
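If you want to see which accelerator PyTorch can use on your machine, a short check like the following works. This is a minimal sketch, assuming PyTorch is already installed; the device names follow PyTorch’s conventions (including the MPS device on Apple Silicon Macs, discussed in the footnote below).

```python
import torch

# Pick the best available device: a CUDA GPU, Apple Silicon (MPS), or the CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using device: {device}")
```

Code examples throughout the book can then move models and tensors to this device with `.to(device)`.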
Many support utilities and helper functions are used throughout the book. To access them, please install the genaibook package:
pip install genaibook
This will, in turn, install the libraries required to run transformers and diffusion models, along with PyTorch, matplotlib, numpy, and other essentials.
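As a quick sanity check after installation, you can verify that the main libraries are importable. The exact set of packages bundled with genaibook is our assumption here; adjust the list to match what you actually need.

```python
import importlib.util

# Check whether each core library is installed, without actually
# importing it (importing large packages like torch can be slow).
packages = ["torch", "transformers", "diffusers", "matplotlib", "numpy"]
for pkg in packages:
    status = "ok" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")
```

If any package shows up as missing, re-run the `pip install genaibook` step in a fresh environment.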
All code examples and supplementary material can be found in the book’s GitHub repository. You can run all the examples interactively in Jupyter Notebooks, and the repository will be regularly updated with the latest resources.
Conventions Used in This Book
The following typographical conventions are used in this book:
- Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
- Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Tip
This element signifies a tip or suggestion.
Note
This element signifies a general note.
Warning
This element indicates a warning or caution.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/handsonGenAIcode.
If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Hands-On Generative AI with Transformers and Diffusion Models by Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker (O’Reilly). Copyright 2025 Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker, 978-1-098-14924-6.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
- O’Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-889-8969 (in the United States or Canada)
- 707-827-7019 (international or local)
- 707-829-0104 (fax)
- support@oreilly.com
- https://oreilly.com/about/contact.html
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/handsonGenAI.
For news and information about our books and courses, visit https://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Watch us on YouTube: https://youtube.com/oreillymedia.
State of the Art: A Moving Target
State of the art (SOTA) is used to describe the highest level of performance currently achieved in a particular task or domain. In the field of generative AI, the SOTA is constantly changing as new models are developed and new techniques are discovered. This book will provide you with a solid grounding in the fundamentals of generative AI, but by the time you read it, new models will have been released that outperform the ones we discuss here.
Rather than trying to chase the ever-shifting best, we’ve tried to focus on general principles that will help you understand how the models work in a way that will be useful even as the field continues to evolve. New models rarely come out of nowhere and often build on the ideas of previous models. By understanding the fundamentals, you’ll be better equipped to understand the latest developments as they happen.
Acknowledgments
We would like to express our deepest gratitude to the incredible O’Reilly team, particularly Jill Leonard, for her amazing guidance and support throughout this entire process. Special thanks to Nicole Butterfield, Karen Montgomery, Kate Dullea, Gregory Hyman, and Kristen Brown for their invaluable advice and contributions, from initial scoping to the creation of the beautiful cover and illustrations.
We are deeply grateful to our technical reviewers: Vishwesh Ravi Shrimali, David Mertz, Lipi Deepaakshi Patnaik, Luba Elliott, Anil Sood, Sai Vuppalapati, Ranjeeta Bhattacharya, Rajat Dubey, Bryan Bischof, Vladislav Bilay, Gourav Singh Bais, Aditya Goel, Lakshmanan Sethu Sankaranarayanan, Zygmunt Lenyk, Youssef Hosn, Vicki Reyzelman, Lewis Tunstall, Sayak Paul, and Vaibhav Srivastav. Their insightful feedback was instrumental in shaping this book.
We would also like to extend our gratitude to the Hugging Face team for their inspiration and collaboration, particularly Clémentine Fourrier for her insights on model evaluation, Sanchit Gandhi for his guidance on audio-related topics, and Leandro von Werra and Lewis Tunstall for helping us navigate the book-writing process. The Hugging Face team continues to inspire us with its brilliance and kindness, helping bring this project to life.
A heartfelt thank you to the countless friends, collaborators, and contributors who have shaped the open-source ecosystem that we are proud to be part of. We are grateful to the entire ML community for advancing the research, tools, and resources that form the heart of this book. This work was crafted in Jupyter Notebooks, and we owe special thanks to Jeremy Howard, Hamel Husain, and all the contributors to Quarto and nbdev for making this possible.
Jonathan
I am very grateful to the community of researchers and hackers sharing their ideas and pushing forward what is possible. To Jeremy Howard, Tanishq Abraham, and the rest of the fastdiffusion crew who came together to learn all we could about these ideas. And to my amazing coauthors, without whom this book could not have happened!
Apolinário
I am grateful to my coauthors Omar, Pedro, and Jonathan for co-creating this book. Combining technology education and creativity has been a fun challenge to tackle. I thank my friends, who understand and support me even when I show up to hang out with my laptop in tow, and my Hugging Face colleagues for always being supportive.
Pedro
Writing a book is a lot of fun, but it unfairly exacts sacrifices from the people you love. I’m super lucky to have had the support of María José, my partner in life. She made it easy for me to work on it, and when I was stuck she helped with common sense reasoning that, frankly, is anything but common. I apologize to my Mom and Dad for always bringing my laptop when I visit, to my son Pablo for not exploring Hyrule or Eorzea as much as we’d have liked, and to my son Javier for sometimes talking too much about work and too little about life. They are the best.
I’m truly inspired by my amazing coauthors. I admire and look up to them and can’t believe how lucky I am to learn from them, every day. This extends to the Hugging Face folks, whose enthusiasm and humility provide a primordial soup where things happen, and to the open ML community at large, whose work is always advancing the field but not always getting the credit it deserves.
Thank you.
Omar
Thank you, Michelle, for your constant encouragement throughout this process, for all the brainstorming sessions, and for your support over the past two years. I couldn’t have completed this project without you. Hikes are back on the table!
To my parents, Ana and Walter, thank you for nurturing my love for books from the very beginning and for supporting me to become the person I am today.
Lastly, I want to thank my amazing coauthors—Pedro, Poli, and Jonathan. This journey has been truly fun, and I’m so grateful that we accomplished this together.
1 Instead of an NVIDIA GPU, you can also use the MPS device, which might work on Macs with Apple Silicon, but we have not tested this configuration extensively.