Chapter 3. Moving to Chat

In the previous chapter, you learned about the generative pre-trained transformer (GPT) architecture. The way that these models are trained drastically influences their behavior. A base model, for example, has gone through only the pre-training process: it has been trained on billions of arbitrary documents from the internet, and if you prompt it with the first half of a document, it will generate a plausible-sounding completion for that document. This behavior alone can be quite useful, and throughout this book, we will show how you can “trick” such a model into accomplishing all sorts of tasks besides pure document completion.

However, for a number of reasons, base models can be difficult to use in an application setting. For one thing, because a base model has been trained on arbitrary documents from the internet, it is equally capable of mimicking both the light side and the dark side of the internet. If you prompt it with “This is a recipe for Sicilian Lasagna:” then it will generate the recipe for a delightful Italian dish. But if you prompt it with “These are the detailed steps for making methamphetamine:” then you’ll soon have all you need to embark on a harrowing life of crime. Generally, we need models to be “safe” so that users won’t be surprised by off-putting conversations involving violence, sex, or profanity.

Another reason that base models are sometimes challenging to use in applications is that they can only complete ...
