Chapter 8. Model Serving and Integration

Machine learning in practice is largely focused on training models, and many theory books do not address how to integrate a model into a production application or how to manage the model's life cycle. Kubeflow as a platform covers all phases of model development and deployment.

In this chapter we build up your understanding of machine learning operations concepts, and then show how these concepts are put into practice with KFServing on Kubernetes. Let’s start with the core concepts of model management.

Basic Concepts of Model Management

You need to understand the following core concepts related to machine learning operations:

Model training versus model inference
Inference latencies
The high-level components of operationalizing a model in production
Batch versus transactional operation latencies
Transforming raw data into vectors
Hard-wiring a single model versus model management
Knowing when to retrain a model
Model rollbacks
Security models for model management
Scaling a model in production
Monitoring performance of a model
Model explainability
Detecting input outliers

The challenge is compounded when the machine learning operations practitioner has to apply the preceding concepts in the context of the following technologies:

Model serialization
Model servers
Protocol standards (HTTP and gRPC)
Dealing with multiple machine learning frameworks
Containerization
GitOps
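
To make a few of these ideas concrete (training versus inference, model serialization, and inference latency), here is a minimal sketch in plain Python. It is not taken from KFServing or any Kubeflow component: the `LinearModel` class, its JSON format, and the sample data are all hypothetical and chosen only for illustration.

```python
import json
import time


class LinearModel:
    """Hypothetical one-feature model: y = slope * x + intercept."""

    def __init__(self, slope=0.0, intercept=0.0):
        self.slope = slope
        self.intercept = intercept

    def fit(self, xs, ys):
        # Training: estimate parameters from data (ordinary least squares).
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        self.slope = cov / var
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        # Inference: apply the learned parameters to a new input.
        return self.slope * x + self.intercept

    def to_json(self):
        # Serialization: write the learned parameters in a portable form.
        return json.dumps({"slope": self.slope, "intercept": self.intercept})

    @classmethod
    def from_json(cls, payload):
        # Deserialization: what a model server would do when loading a model.
        params = json.loads(payload)
        return cls(params["slope"], params["intercept"])


# Training phase, typically run offline (illustrative data only).
trained = LinearModel().fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
artifact = trained.to_json()

# Serving phase: load the artifact and time a single prediction.
served = LinearModel.from_json(artifact)
start = time.perf_counter()
prediction = served.predict(5.0)
latency_ms = (time.perf_counter() - start) * 1000.0

print(f"prediction={prediction:.1f} latency_ms={latency_ms:.4f}")
```

The point of the sketch is the separation of concerns: training produces an artifact, and a separate process loads that artifact and answers requests. Real model servers add the protocol handling, scaling, and monitoring listed above on top of this basic load-and-predict loop.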

Kubeflow Operations Guide by Josh Patterson, Michael Katzenellenbogen, Austin Harris
