Chapter 2. Understanding Foundation Models
To build applications with foundation models, you first need foundation models. While you don’t need to know how to develop a model to use it, a high-level understanding will help you decide what model to use and how to adapt it to your needs.
Training a foundation model is an incredibly complex and costly process. Those who know how to do this well are likely prevented by confidentiality agreements from disclosing the secret sauce. This chapter won’t be able to tell you how to build a model to compete with ChatGPT. Instead, I’ll focus on design decisions that have consequential impacts on downstream applications.
With the growing lack of transparency in the training process of foundation models, it’s difficult to know all the design decisions that go into making a model. In general, however, differences in foundation models can be traced back to decisions about training data, model architecture and size, and how they are post-trained to align with human preferences.
Since models learn from data, their training data reveals a great deal about their capabilities and limitations. This chapter begins with how model developers curate training data, focusing on the distribution of training data. Chapter 8 explores dataset engineering techniques in detail, including data quality evaluation and data synthesis.
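To make “the distribution of training data” concrete, here is a minimal sketch, in Python, of the kind of bookkeeping involved: given a toy corpus with made-up language tags, it tallies what fraction of the data falls into each language. Production pipelines typically run a language-identification model over web-scale data, but the accounting is the same idea; the corpus below is purely hypothetical.

from collections import Counter

# Hypothetical mini-corpus: (document text, language tag) pairs.
# In practice, tags would come from a language-identification model
# run over billions of web documents.
corpus = [
    ("The cat sat on the mat.", "en"),
    ("Le chat est sur le tapis.", "fr"),
    ("El gato está en la alfombra.", "es"),
    ("The quick brown fox jumps over the lazy dog.", "en"),
]

# Weight each language by its share of total characters, a rough
# proxy for its share of training tokens.
char_counts = Counter()
for text, lang in corpus:
    char_counts[lang] += len(text)

total = sum(char_counts.values())
for lang, count in char_counts.most_common():
    print(f"{lang}: {count / total:.1%} of the corpus by character count")

The same kind of tally can be done by domain (code, books, web text) or by topic, and those proportions go a long way toward explaining what a model ends up being good at.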
Given the dominance of the transformer architecture, it might seem that model architecture is less of a choice. You might be wondering, what ...