Designing Large Language Model Applications

Book description

Transformer-based language models are powerful tools for solving a variety of language tasks and represent a phase shift in the field of natural language processing. But the transition from demos and prototypes to full-fledged applications has been slow. With this book, you'll learn the tools, techniques, and playbooks for building useful products that incorporate the power of language models.

Experienced ML researcher Suhas Pai provides practical advice on dealing with commonly observed failure modes and counteracting the current limitations of state-of-the-art models. You'll take a comprehensive deep dive into the Transformer architecture and its variants, and you'll get up to date with the taxonomy of language models, which offers insight into which models are better suited to which tasks.

You'll learn:

  • Clever ways to deal with failure modes of current state-of-the-art language models, and methods to exploit their strengths for building useful products
  • How to develop an intuition about the Transformer architecture and the impact of each architectural decision
  • Ways to adapt pretrained language models to your own domain and use cases
  • How to select a language model for your domain and task from among the choices available, and how to deal with the build-versus-buy conundrum
  • Effective fine-tuning and parameter-efficient fine-tuning, as well as few-shot and zero-shot learning techniques
  • How to interface language models with external tools and integrate them into an existing software ecosystem

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
  2. I. LLM Ingredients
  3. 1. Introduction
    1. Defining LLMs
    2. A Brief History of LLMs
      1. Early years
      2. The modern LLM era
    3. The impact of LLMs
    4. LLM usage in the enterprise
    5. Prompting
      1. Zero-shot prompting
      2. Few-shot prompting
      3. Chain-of-Thought prompting
      4. Prompt chaining
      5. Adversarial prompting
    6. Accessing LLMs through an API
    7. Strengths and limitations of LLMs
    8. Building your first chatbot prototype
    9. From prototype to production
    10. Summary
  4. 2. Pre-training Data
    1. Ingredients of an LLM
    2. Pre-training data requirements
    3. Popular pre-training datasets
    4. Training Data Preprocessing
      1. Data filtering and cleaning
      2. Selecting Quality Documents
      3. Deduplication
      4. Removing PII (Personally Identifiable Information)
      5. Training Set Decontamination
      6. Data Mixtures
    5. Effect of pre-training data on downstream tasks
    6. Bias and Fairness Issues in Pre-training Datasets
    7. Summary
  5. 3. Vocabulary and Tokenization
    1. Vocabulary and Tokenization
      1. Tokenizer
      2. Tokenization Pipeline
      3. Special Tokens
    2. Summary
  6. 4. Architectures and Learning Objectives
    1. Preliminaries
    2. Representing Meaning
    3. The Transformer Architecture
      1. Self-attention
      2. Positional Encoding
      3. Feed-forward networks
      4. Loss functions
    4. Intrinsic Model Evaluation
    5. Transformer backbones
      1. Encoder-only architectures
      2. Encoder-Decoder Architectures
      3. Decoder-only Architectures
      4. Mixture of Experts
    6. Learning Objectives
      1. Full Language Modeling
      2. Prefix Language Modeling
      3. Masked Language Modeling
      4. Which learning objectives are better?
    7. Pre-training models
    8. Summary
  7. II. Utilizing LLMs
  8. 5. Adapting LLMs to Your Use Case
    1. Navigating the LLM Landscape
      1. Who are the LLM providers?
      2. Model flavors
      3. Open-source LLMs
    2. How to choose an LLM for your task
      1. Open-source vs. Proprietary LLMs
      2. LLM Evaluation
    3. Loading LLMs
      1. Hugging Face Accelerate
      2. Ollama
      3. LLM Inference APIs
    4. Decoding strategies
      1. Greedy decoding
      2. Beam Search
      3. Top-K sampling
      4. Top-P sampling
    5. Running inference on LLMs
      1. Structured outputs
    6. Model debugging and interpretability
    7. Summary
  9. 6. Fine-Tuning
    1. The need for fine-tuning
    2. Fine-tuning: A full example
      1. Learning algorithm parameters
      2. Memory Optimization parameters
      3. Regularization parameters
      4. Noise embeddings
      5. Batch size
      6. Parameter-Efficient Fine-tuning
      7. Working with reduced precision
      8. Putting it all together
    3. Fine-tuning Datasets
      1. Utilizing publicly available instruction-tuning datasets
      2. LLM-generated instruction-tuning datasets
    4. Summary
  10. 7. Advanced Fine-Tuning Techniques
    1. Continual Pre-training
      1. Replay (Memory)
      2. Parameter Expansion
    2. Parameter-Efficient Fine-tuning
      1. Adding new parameters
      2. Subset methods
    3. Combining Multiple Models
      1. Model Ensembling
      2. Model Fusion
      3. Adapter Merging
    4. Summary
  11. 8. Alignment Training and Reasoning
    1. Defining alignment training
    2. Reinforcement learning
      1. Types of human feedback
      2. RLHF example
    3. Hallucination
    4. Mitigating Hallucinations
      1. Self-consistency
      2. Chain-of-Actions
      3. Recitation
      4. Sampling methods for addressing hallucination
      5. DoLa (Decoding by cOntrasting LAyers)
    5. In-context hallucinations
    6. Hallucinations due to irrelevant information
    7. Reasoning
      1. Deductive reasoning
      2. Inductive reasoning
      3. Abductive reasoning
      4. Common Sense Reasoning
    8. Inducing reasoning in LLMs
      1. Verifiers for improving reasoning
      2. Inference-time computation
      3. Fine-tuning for reasoning
    9. Summary
  12. 9. Inference Optimization
    1. LLM Inference Challenges
    2. Inference Optimization Techniques
    3. Techniques for reducing compute
      1. K-V caching
      2. Early Exit
      3. Knowledge Distillation
    4. Techniques for accelerating decoding
      1. Speculative decoding
      2. Parallel decoding
    5. Techniques for reducing storage needs
      1. Symmetric quantization
      2. Asymmetric quantization
    6. Summary
  13. III. LLM Application Paradigms
  14. 10. Interfacing LLMs with External Tools
    1. LLM Interaction Paradigms
      1. The Passive Approach
      2. The Explicit Approach
      3. The Autonomous Approach
    2. Defining Agents
    3. Agentic Workflow
    4. Components of an agentic system
      1. Models
      2. Tools
      3. Data Stores
      4. Agent loop prompt
      5. Guardrails and Verifiers
      6. Safety Guardrails
      7. Verification modules
      8. Agent orchestration software
    5. Summary
  15. 11. Embeddings and Representation Learning
    1. Representation Learning
    2. Introduction to Embeddings
    3. Semantic Search
    4. Similarity Measures
    5. Fine-tuning Embedding Models
      1. Base models
      2. Training Dataset
    6. Instruction Embeddings
    7. Optimizing embedding size
      1. Binary and Integer Embeddings
      2. Product Quantization
    8. Chunking
      1. Vector Databases
    9. Interpreting Embeddings
    10. Summary
  16. 12. Retrieval-Augmented Generation (RAG)
    1. The need for RAG
    2. Typical RAG scenarios
    3. Deciding when to retrieve
    4. The RAG pipeline
      1. Rewrite
      2. Retrieve
      3. Generative Retrieval
      4. Rerank
      5. Refine
      6. Insert
      7. Generate
    5. RAG for memory management
    6. RAG for selecting in-context training examples
      1. LLM-R
    7. RAG for model training
      1. REALM
    8. Limitations of RAG
    9. RAG vs. Long Context
    10. RAG vs. Fine-tuning
    11. Summary
  17. 13. Design Patterns & System Architecture
    1. Multi-LLM architectures
      1. LLM Cascades
      2. Routers
      3. Task-specialized LLMs
    2. Programming Paradigms
      1. DSPy
      2. LMQL
    3. Summary
  18. About the Author

Product information

  • Title: Designing Large Language Model Applications
  • Author(s): Suhas Pai
  • Release date: March 2025
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098150501