Chapter 6. Fine-Tuning Language Models
In Chapter 2, we explored how LMs work and how to use them for tasks such as text generation and sequence classification. We saw that LMs can be helpful for many tasks without further training, thanks to proper prompting and the zero-shot capabilities of these models. We also explored some of the hundreds of thousands of pretrained models shared by the community. In this chapter, we'll discuss how to improve the performance of LMs on specific tasks by fine-tuning them on our own data.
While pretrained models showcase remarkable capabilities, their general-purpose training may not suit every task or domain. Fine-tuning is frequently used to tailor a model to the nuances of a specific dataset or task. For instance, in the field of medical research, an LM pretrained on general web text will not perform well out of the box, so we can fine-tune it on a dataset of medical literature to enhance its ability to generate relevant medical text or assist in extracting information from healthcare documents. Another example is building conversational models. Although large pretrained models can generate coherent text, they usually don't do well at producing high-quality conversational text or following instructions. We can fine-tune such a model on a dataset of everyday conversations and informal language, adapting it to output engaging, conversational text, like the one you would expect in interfaces such as ...
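To make the idea concrete before we dive in, here is a minimal sketch of what such a domain fine-tuning run could look like with the Hugging Face Trainer. The model name, the tiny in-memory corpus, and the hyperparameters are illustrative placeholders, not the setup we'll use later in the chapter.

```python
# Minimal sketch: continue training a small pretrained causal LM on our own texts.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"  # any small causal LM works for a quick experiment
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 models define no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny in-memory "domain" corpus; in practice this would be thousands of documents.
texts = [
    "The patient presented with elevated blood pressure and mild tachycardia.",
    "An MRI of the lumbar spine showed no evidence of disc herniation.",
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> causal language modeling: the labels are the inputs shifted by one.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model, args=args, train_dataset=tokenized, data_collator=collator
)
trainer.train()
```

The structure stays the same whether we adapt to a domain (medical text) or a behavior (conversations); what changes is the dataset and, as we'll see, how the examples are formatted.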