11
Moving LLMs into Production
Introduction
As the power we unlock from large language models grows, so, too, does the necessity of deploying these models to production so we can share our hard work with more people. This chapter explores different strategies for considering deployments of both closed-source and open-source LLMs, with an emphasis on best practices for model management, preparation for inference, and methods for improving efficiency such as quantization, pruning, and distillation.
Deploying Closed-Source LLMs to Production
For closed-source LLMs, the deployment process typically involves interacting with an API provided by the company that developed the model. This model-as-a-service approach is convenient because the underlying ...
Get Quick Start Guide to Large Language Models: Strategies and Best Practices for ChatGPT, Embeddings, Fine-Tuning, and Multimodal AI, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.