Chapter 8. Model Serving and Integration

Machine learning practice focuses largely on training models, and many theory books do not address how to integrate a model into a production application or how to manage the model's life cycle. Kubeflow, as a platform, covers all phases of model development and deployment.

In this chapter, we build up your understanding of machine learning operations concepts and then show how to put them into practice with KFServing on Kubernetes. Let’s start with the core concepts in model management.

Basic Concepts of Model Management

You need to understand the following core concepts related to machine learning operations:

  • Model training versus model inference
  • Inference latencies
  • The high-level components of operationalizing a model in production
  • Batch versus transactional operation latencies
  • Transforming raw data into vectors
  • Hard-wiring a single model versus model management
  • Knowing when to retrain a model
  • Model rollbacks
  • Security models for model management
  • Scaling a model in production
  • Monitoring performance of a model
  • Model explainability
  • Detecting input outliers

The challenge is compounded when the machine learning operations practitioner has to apply the preceding concepts using the following technologies:

  • Model serialization
  • Model servers
  • Protocol standards (HTTP and gRPC)
  • Dealing with multiple machine learning frameworks
  • Containerization
  • GitOps
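
To make two of these ideas concrete, the following minimal sketch separates model training from model inference and passes the model between them in serialized form. It is only an illustration under stated assumptions: `LinearModel` is a hypothetical one-feature model invented for this example, JSON stands in for whatever serialization format a real framework would use, and nothing here reflects how KFServing itself stores or loads models.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LinearModel:
    """Hypothetical one-feature model (y = weight * x + bias), for illustration only."""
    weight: float = 0.0
    bias: float = 0.0

    def fit(self, xs, ys):
        # Training: estimate the parameters with ordinary least squares.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.weight = cov_xy / var_x
        self.bias = mean_y - self.weight * mean_x
        return self

    def predict(self, x):
        # Inference: apply the learned parameters to a new input.
        return self.weight * x + self.bias


# Training side: fit the model, then serialize its parameters.
model = LinearModel().fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
payload = json.dumps(asdict(model))

# Serving side: rebuild the model from the serialized payload and run inference.
served = LinearModel(**json.loads(payload))
prediction = served.predict(4.0)
```

The serving side needs only the serialized parameters and the `predict` logic, not the training data or the fitting code. That separation is what lets a model server load, swap, or roll back models independently of the application that calls it.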
