Building Generative AI Services with FastAPI

Book description

Ready to build applications using generative AI? This practical book outlines the process necessary to design and build production-grade AI services with a FastAPI web server that communicate seamlessly with databases, payment systems, and external APIs. You'll learn how to develop autonomous generative AI agents that stream outputs in real time and interact with other models. Web developers, data scientists, and DevOps engineers will learn to implement end-to-end production-ready services that leverage generative AI.

You'll learn design patterns to manage software complexity, use the FastAPI lifespan to integrate AI models, handle long-running generative tasks, filter content, cache outputs, implement retrieval augmented generation (RAG) with a vector database, monitor and track usage and costs, protect services with your own authentication and authorization mechanisms, and control output streams directly from GenAI models. You'll also explore efficient methods for testing AI outputs, validation against databases, and deployment patterns using Docker for robust microservices in the cloud.

  • Build generative services that interact with databases, external APIs, and more
  • Learn how to load AI models into memory using the FastAPI application lifespan
  • Monitor and log model requests and responses within services
  • Use authentication and authorization patterns integrated with generative models
  • Handle and cache long-running inference tasks
  • Stream model outputs via streaming events and WebSockets into browsers or files
  • Automate the retraining process of generative models by exposing event-driven endpoints

Ali Parandeh is a Chartered Engineer with the UK Engineering Council and a Microsoft and Google certified developer, data engineer, and data scientist.

Table of contents

  Brief Table of Contents (Not Yet Final)
  1. Introduction
    1. Why Generative AI Services Will Power Future Applications
      1. Facilitating the Creative Process
      2. Suggesting Contextually Relevant Solutions
      3. Personalizing the User Experience
      4. Minimizing Delay in Resolving Customer Queries
      5. Acting as an Interface to Complex Systems
      6. Scaling and Democratizing Content Generation
    2. What Prevents the Adoption of Generative AI Services
    3. Making Generative Services Autonomous
    4. Why Build Generative AI Services with FastAPI
    5. Overview of the Capstone Project
    6. Summary
  2. Getting Started with FastAPI
    1. Introduction to FastAPI
      1. FastAPI Features and Advantages
      2. FastAPI Limitations
      3. Comparing FastAPI to Other Web Frameworks
    2. Setting Up Your Development Environment
      1. Installing Python, FastAPI, and Required Packages
      2. Setting Up Tooling with IDEs
      3. Creating a Simple FastAPI Web Server
    3. Building Larger FastAPI Applications
      1. FastAPI Project Structures
      2. Progressive Reorganization of Your FastAPI Project
    4. Onion / Layered
    5. Migrating to FastAPI
      1. Migrating from Django
      2. Migrating from Flask
      3. Migrating from Other Web Frameworks
    6. Summary
  3. AI Integration and Model Serving
    1. Serving Generative Models
      1. Language Models
      2. Audio Models
      3. Vision Models
      4. Video Models
      5. 3D Models
    2. Strategies for Serving Generative AI Models
      1. Model Swapping on Every Request
      2. Using the FastAPI Application Lifespan to Preload Models
      3. Serving Models Externally
    3. The Role of Middleware in Service Monitoring
    4. Summary
    5. References
  4. Implementing Type-Safe AI Services
    1. Introduction to Type Safety
      1. Why Do People Prefer to Skip Type Safety?
    2. Implementing Type Safety
      1. Type Annotations
      2. Using Annotated
      3. Dataclasses
    3. Pydantic Models
      1. How to Use Pydantic
      2. Compound Pydantic Models
      3. Field Constraints and Validators
      4. Custom Field and Model Validators
      5. Computed Fields
      6. Model Export and Serialization
      7. Parsing Environment Variables with Pydantic
      8. Dataclasses or Pydantic Models in FastAPI
    4. Summary
  5. Achieving Concurrency in AI Workloads
    1. Optimizing GenAI Services for Multiple Users
    2. Optimizing for I/O Tasks with Asynchronous Programming
      1. Synchronous vs. Asynchronous (Async) Execution
      2. Async Programming with Model Provider APIs
      3. Event Loop and Thread Pool in FastAPI
      4. Blocking the Main Server
      5. Project: Web Page Scraper
      6. Project: Retrieval Augmented Generation
    3. Optimizing Model Serving for Memory- and Compute-Bound AI Inference Tasks
      1. Externalizing Model Serving
    4. Managing Long-Running AI Inference Tasks
    5. Conclusion
    6. References
  About the Author

Product information

  • Title: Building Generative AI Services with FastAPI
  • Author(s): Ali Parandeh
  • Release date: March 2025
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098160302