Chapter 12. Image and Text Generation

So far in this book, we have focused on computer vision methods that act on images. In this chapter, we will look at vision methods that can generate images. Before we get to image generation, though, we have to learn how to train a model to understand what’s in an image so that it knows what to generate. We will also look at the problem of generating text (captions) based on the content of an image.

Tip

The code for this chapter is in the 12_generation folder of the book’s GitHub repository. We will provide file names for code samples and notebooks where applicable.

Image Understanding

It’s one thing to know what components are in an image, but it’s quite another to actually understand what is happening in the image and to use that information for other tasks. In this section, we will quickly recap embeddings and then look at various methods (autoencoders and variational autoencoders) to encode an image and learn about its properties.
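For concreteness before we dive in, here is a minimal sketch of a convolutional autoencoder in TensorFlow/Keras: the encoder compresses an image into a low-dimensional embedding, and the decoder learns to reconstruct the image from that embedding. The 32x32 input size and 64-dimensional embedding here are illustrative choices, not a recommended architecture:

import tensorflow as tf
from tensorflow.keras import layers

# Encoder: compress a 32x32 RGB image into a 64-dim embedding
# (both sizes are illustrative assumptions).
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(64),
])

# Decoder: reconstruct the image from the embedding.
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    layers.Dense(8 * 8 * 32, activation="relu"),
    layers.Reshape((8, 8, 32)),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid"),
])

# Train end to end to reproduce the input; the reconstruction loss
# forces the embedding to capture the image's salient properties.
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, epochs=10)  # images scaled to [0, 1]

Once trained, calling encoder(images) alone yields the embeddings, which is the "image understanding" part we care about in this section.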

Embeddings

A common problem in deep learning use cases is a lack of sufficient data, or of sufficiently high-quality data. In Chapter 3 we discussed transfer learning, which provides a way to extract embeddings learned by a model trained on a larger dataset and apply that knowledge to train an effective model on a smaller dataset.

With transfer learning, the embeddings we use were created by training a model on the same type of task, such as image classification. For instance, suppose we have a ResNet50 ...
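A minimal sketch of extracting such embeddings from a pretrained ResNet50 in TensorFlow/Keras might look like the following; the random batch of images is just a stand-in for real data, and freezing the weights is one common choice rather than a requirement:

import tensorflow as tf

# include_top=False drops the ImageNet classification head, and
# pooling="avg" global-average-pools the final feature map, so each
# image maps to a 2048-dimensional embedding.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False  # freeze the pretrained weights

images = tf.random.uniform((4, 224, 224, 3))  # stand-in for real images
# ResNet50's preprocess_input expects pixel values in [0, 255].
inputs = tf.keras.applications.resnet50.preprocess_input(images * 255.0)
embeddings = backbone(inputs, training=False)
print(embeddings.shape)  # (4, 2048)

These embeddings can then serve as inputs to a small model trained on the smaller dataset.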
