Chapter 12. Efficient Fine-Tuning of Large Models

As discussed in the preceding chapters of this book, the capacity of deep learning models is rapidly increasing. The scaling law of deep learning (discussed in Chapter 1) is still fueling (over)parameterization, to the extent that human brain–scale models with hundreds of trillions of parameters have been built.1 The industry is moving away from the battle-tested approach of developing small, purpose-built models for specific tasks and toward rapidly adapting large, general-purpose models to the task at hand, using fine-tuning and meta-learning techniques like the ones discussed in Chapter 11. While this newer approach, which you will read more about in Chapter 13, may be more economical in terms of development cost, its efficacy is still relatively untested.

This shift is welcome because of its potential to minimize development time and reduce the time to production. However, in line with the "no free lunch" theorem, it comes with its own challenges, such as dealing with limited hardware resources. This chapter focuses on adapting a large model to a specific task and extends the discussion of fine-tuning from the previous chapter, introducing two new techniques: Low-Rank Adaptation (LoRA), which allows you to efficiently fine-tune large models on limited-capacity hardware, and its quantized variant, QLoRA, which saves additional memory by storing the frozen base weights in quantized form while computing in higher precision. This discussion ...
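To make the core idea of LoRA concrete before the detailed treatment, here is a minimal sketch in PyTorch. It is not the implementation used later in the chapter, and the class name LoRALinear and the hyperparameter values are illustrative assumptions. The key point is that the pretrained weight matrix W stays frozen while a low-rank update BA, with rank r much smaller than the layer dimensions, is the only part that is trained.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update.

        The effective weight is W + (alpha / r) * B @ A, where W is the
        frozen pretrained weight and only A (r x in_features) and
        B (out_features x r) receive gradients.
        """

        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights

            self.scaling = alpha / r
            # A starts with small random values and B with zeros, so the
            # adapted layer initially behaves exactly like the base layer.
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    # Wrap one layer of a pretrained model and count what is trainable.
    layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable:,} of {total:,}")  # ~12K of ~590K parameters

QLoRA keeps this same adapter structure but stores the frozen base weights in quantized form, dequantizing them on the fly for the forward pass; the trainable A and B matrices remain in higher precision.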
