Chapter 16

One-to-Many Network for Image Captioning

We have now spent a number of chapters on working with textual data. Before that, we looked at how convolutional networks can be applied to image data. In this chapter, we describe how to combine a convolutional network and a recurrent network to build a network that performs image captioning. That is, given an image as input, the network generates a textual description of the image. We then describe how to extend the network with attention. We conclude the chapter with a programming example that implements such an attention-based image-captioning network.
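As a rough illustration of the one-to-many idea, the sketch below turns a single fixed-size input into a sequence of words. It is not the chapter's implementation: the image feature vector stands in for the output of a convolutional network, the weights are random and untrained, and the vocabulary, dimensions, and the `generate_caption` helper are all hypothetical. It uses only the Python standard library and omits attention.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
FEATURE_DIM, HIDDEN_DIM = 4, 5


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of random (untrained) weights."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_init = rand_matrix(HIDDEN_DIM, FEATURE_DIM)  # image features -> initial state
W_h = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)      # recurrent weights
W_x = rand_matrix(HIDDEN_DIM, len(VOCAB))      # one-hot token -> hidden
W_out = rand_matrix(len(VOCAB), HIDDEN_DIM)    # hidden -> vocabulary scores


def generate_caption(image_features, max_len=8):
    """Greedy decoding: one image feature vector in, many tokens out."""
    # The single input (the image) only sets the initial recurrent state.
    h = [math.tanh(a) for a in matvec(W_init, image_features)]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        # Feed the previously generated token back in as a one-hot vector.
        x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]
        h = [math.tanh(a + b) for a, b in zip(matvec(W_h, h), matvec(W_x, x))]
        scores = matvec(W_out, h)
        # Pick the highest-scoring word (greedy choice).
        token = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


# Stand-in for features that a convolutional network would extract.
caption = generate_caption([0.2, -0.1, 0.7, 0.4])
print(caption)
```

Because the weights are untrained, the printed words are arbitrary. The point is the data flow: the image is consumed once, and each output word is fed back as the next input until an end token or the length limit is reached.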

Given that this programming example is the most extensive in the book and that we present it after describing the Transformer, it ...
