9

Optimization Techniques for Performance

Optimization is the heart of this chapter, where you will be introduced to advanced techniques that reduce the resource demands of LLMs without sacrificing performance. We will explore quantization and pruning, along with approaches to knowledge distillation. A targeted case study on mobile deployment will offer a practical perspective on how to apply these methods effectively.

In this chapter, we’re going to cover the following main topics:

  • Quantization – doing more with less
  • Pruning – trimming the fat from LLMs
  • Knowledge distillation – transferring wisdom efficiently
  • Case study – optimizing an LLM for mobile deployment
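As a taste of the first topic, here is a minimal sketch of symmetric post-training quantization to int8 using plain NumPy; the function names and the per-tensor scaling scheme are illustrative assumptions, not the specific recipe developed later in the chapter.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of a float weight tensor to int8.

    The largest absolute weight is mapped to 127, so the rounding error
    of any element is at most scale / 2.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and check the error bound
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2
```

Storing `q` instead of `w` cuts memory for this tensor by 4x (int8 versus float32) at the cost of a bounded rounding error, which is the basic trade-off the chapter's quantization section examines in depth.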

Upon completing this chapter, you will have acquired ...
