Chapter 5. Fine-Tuning and Evaluation

In Chapter 4, you learned techniques to improve the performance of large generative models. You also explored efficient distributed computing strategies such as distributed data parallel (DDP) and fully sharded data parallel (FSDP) to scale your large-model development efforts across a set of distributed-compute instances. While these techniques are essential for pretraining large foundation models from scratch, they are also useful for adapting foundation models to your custom datasets and use cases during a process called fine-tuning.

In this chapter, you will dive deep into a fine-tuning technique called instruction fine-tuning. You already learned about instructions in Chapter 2's discussion of prompt engineering. Instructions are commands to the model to perform some task, such as “Summarize this conversation” or “Generate a personalized marketing email.” When fine-tuning a foundation model with instructions, it’s important to present a mix of instructions across many different tasks to maintain the foundation model’s ability to serve as a general-purpose generative model. A minimal sketch of preparing such a mixed-task instruction dataset follows.
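The sketch below is not from the book; it simply illustrates, in Python, one common way to convert raw examples from different tasks into prompt/completion pairs for instruction fine-tuning. The field names (instruction, input, response) and the prompt template are assumptions for illustration only.

# Hypothetical mixed-task examples; field names are illustrative only.
mixed_task_examples = [
    {"instruction": "Summarize this conversation.",
     "input": "Customer: My order arrived late. Agent: I'm sorry about that...",
     "response": "The customer reported a late order and the agent apologized."},
    {"instruction": "Generate a personalized marketing email.",
     "input": "Customer name: Alice. Recent purchase: hiking boots.",
     "response": "Hi Alice, we hope you're enjoying your new hiking boots!"},
]

def to_prompt(example):
    # Combine the instruction and its input into a single prompt, and keep
    # the expected response as the completion used during fine-tuning.
    prompt = f"{example['instruction']}\n\n{example['input']}\n\nResponse:"
    return {"prompt": prompt, "completion": example["response"]}

# Mixing tasks in one dataset helps preserve general-purpose behavior.
instruction_dataset = [to_prompt(example) for example in mixed_task_examples]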

You will also learn about evaluation metrics and benchmarks that help measure the effectiveness of your instruction fine-tuning efforts across many tasks. It is recommended that you establish a set of baseline evaluation metrics and compare the generated model output both before and after fine-tuning. This feedback loop is critical to iteratively improving your fine-tuned model.
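As one hedged example of such a baseline comparison, the sketch below uses the Hugging Face evaluate library (which also requires the rouge_score package) to compute ROUGE scores for model outputs collected before and after fine-tuning. The output and reference strings are hypothetical placeholders.

import evaluate

# Load the ROUGE metric from the Hugging Face evaluate library.
rouge = evaluate.load("rouge")

# Hypothetical human-written reference summaries.
references = ["The customer reported a late order and the agent apologized."]

# Hypothetical model outputs captured before and after fine-tuning.
baseline_outputs = ["Order late, agent sorry."]
fine_tuned_outputs = ["The customer's order arrived late and the agent apologized."]

# Compare the fine-tuned model against the baseline on the same references.
baseline_scores = rouge.compute(predictions=baseline_outputs, references=references)
fine_tuned_scores = rouge.compute(predictions=fine_tuned_outputs, references=references)

print("Baseline ROUGE:", baseline_scores)
print("Fine-tuned ROUGE:", fine_tuned_scores)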
