9

Optimization Techniques for Performance

Optimization is the heart of this chapter, where you will be introduced to advanced techniques that reduce the resource demands of LLMs without sacrificing performance. We will explore quantization and pruning, along with approaches to knowledge distillation. A targeted case study on mobile deployment will offer a practical perspective on how to apply these methods effectively.

In this chapter, we’re going to cover the following main topics:

  • Quantization – doing more with less
  • Pruning – trimming the fat from LLMs
  • Knowledge distillation – transferring wisdom efficiently
  • Case study – optimizing an LLM for mobile deployment
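As a taste of the first topic, here is a minimal sketch of symmetric post-training quantization to int8 using plain NumPy; the function names and the per-tensor scaling scheme are illustrative assumptions, not the specific recipe developed later in the chapter.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of a float weight tensor to int8.

    The largest absolute weight is mapped to 127, so the rounding error
    of any element is at most scale / 2.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and check the error bound
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2
```

Storing `q` instead of `w` cuts memory for this tensor by 4x (int8 versus float32) at the cost of a bounded rounding error, which is the basic trade-off the chapter's quantization section examines in depth.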

Upon completing this chapter, you will have acquired ...
