Chapter 19. Deployment with TensorFlow Serving

Over the past few chapters you’ve been looking at deployment surfaces for your models: on Android and iOS, and in the web browser. Another obvious deployment target is a server, so that your users can send data to it, have it run inference with your model, and get the results back. This can be achieved using TensorFlow Serving, a simple “wrapper” for models that provides an API surface as well as production-level scalability. In this chapter you’ll get an introduction to TensorFlow Serving and how you can use it to deploy and manage inference with a simple model.
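To give you a feel for what that looks like from the client’s side, here’s a minimal sketch of a request to a running TensorFlow Serving instance over its REST API. It assumes a model has already been exported and is being served under the hypothetical name helloworld on TensorFlow Serving’s default REST port, 8501; the rest of the chapter walks through how to get a model to that point.

import json
import requests  # any HTTP client will do; requests is used here for brevity

# Hypothetical input for a model that takes a single floating-point value
data = json.dumps({"instances": [[9.0], [10.0]]})
headers = {"content-type": "application/json"}

# POST the data to the predict endpoint for the (assumed) model name "helloworld"
response = requests.post(
    "http://localhost:8501/v1/models/helloworld:predict",
    data=data,
    headers=headers,
)

# TensorFlow Serving returns a JSON body with a "predictions" field
predictions = response.json()["predictions"]
print(predictions)

From the user’s perspective, that’s the whole interaction: send data, get predictions back. Everything else, such as loading the model, versioning it, and scaling to handle traffic, is handled by the serving infrastructure.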

What Is TensorFlow Serving?

This book has primarily focused on the code for creating models, and while this is a massive undertaking in itself, it’s only a small part of the overall picture of what it takes to use machine learning models in production. As you can see in Figure 19-1, your code needs to work alongside code for configuration, data collection, data verification, monitoring, machine resource management, and feature extraction, as well as analysis tools, process management tools, and serving infrastructure.

TensorFlow’s ecosystem for these tools is called TensorFlow Extended (TFX). Other than the serving infrastructure covered in this chapter, I won’t be going into the rest of TFX in this book. If you’d like to learn more about it, a great resource is the book Building Machine Learning Pipelines by Hannes Hapke and Catherine Nelson (O’Reilly). ...
