Chapter 5. Convolutional Neural Networks and Computer Vision

They. Can. See.

H.

Convolutional neural networks have revolutionized the fields of computer vision and natural language processing. Setting aside the ethical questions that some applications raise (surveillance and automated weapons, for example), the application areas are limitless: self-driving cars, smart drones, facial recognition, speech recognition, medical imaging, audio generation, image generation, robotics, and more.

In this chapter, we start with simple definitions and interpretations of convolution and cross-correlation, and highlight the fact that machine learning terminology conflates these two slightly different mathematical operations. We commit the same sin and conflate them as well, but with good reason.
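To make the difference between the two operations concrete, here is a minimal sketch in plain Python. The function names `cross_correlate` and `convolve` and the sample values are illustrative assumptions, not definitions taken from the chapter. The only difference shown is that convolution flips the kernel before sliding it over the signal, while cross-correlation slides it as-is.

```python
def cross_correlate(signal, kernel):
    """Slide the kernel over the signal as-is, keeping only full overlaps."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve(signal, kernel):
    """Flip the kernel, then slide it: the only change from cross_correlate."""
    return cross_correlate(signal, kernel[::-1])


# Hypothetical sample data, chosen only for illustration.
signal = [1, 2, 3, 4]
asymmetric = [1, 0, -1]
symmetric = [1, 2, 1]

print(cross_correlate(signal, asymmetric))  # [-2, -2]
print(convolve(signal, asymmetric))         # [2, 2]

# With a symmetric kernel, flipping changes nothing, so the results agree.
print(convolve(signal, symmetric) == cross_correlate(signal, symmetric))  # True
```

For an asymmetric kernel the two results differ (here only in sign), whereas for a symmetric kernel they coincide, which is one way to see why the two names are so easily used interchangeably.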

We then apply the convolution operation to filtering grid-like signals, for which it is perfectly suited: time series data (one-dimensional), audio data (one-dimensional), and images (two-dimensional if grayscale, and three-dimensional if color, with the extra dimension corresponding to the red, green, and blue channels). When data is one-dimensional, we use one-dimensional convolutions, and when it is two-dimensional, we use two-dimensional convolutions. For the sake of simplicity and conciseness, we will not do three-dimensional convolutions in this chapter; those correspond to color images, which are three-dimensional arrays called tensors. In other words, we adapt our network ...
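As a rough illustration of filtering one-dimensional and two-dimensional signals, the sketch below slides a small kernel over a short series and over a tiny grayscale image. The helper names `filter_1d` and `filter_2d`, the kernels, and the data are all hypothetical, and the kernel is slid without flipping (strictly speaking a cross-correlation), in line with the conflated machine learning usage mentioned earlier.

```python
def filter_1d(signal, kernel):
    """Slide a 1D kernel over a 1D signal, keeping only full overlaps."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def filter_2d(image, kernel):
    """Slide a 2D kernel over a 2D grid (list of rows), full overlaps only."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + a][c + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# One-dimensional case: a weighted average smooths a short, made-up series.
series = [4, 8, 4, 0, 4]
smoothing_kernel = [0.25, 0.5, 0.25]
smoothed = filter_1d(series, smoothing_kernel)
print(smoothed)  # [6.0, 4.0, 2.0]

# Two-dimensional case: a horizontal difference kernel responds to the
# vertical edge between the dark (0) and bright (1) halves of the image.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
edges = filter_2d(image, edge_kernel)
print(edges)  # [[0, 2, 0], [0, 2, 0]]
```

The same sliding-window idea drives both functions; the two-dimensional version simply adds a second pair of loops, one over the rows of the grid and one over the rows of the kernel.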
