9 Backpropagation

The training of deep neural networks (DNNs) requires the efficient computation of the gradient $\nabla L(\alpha)$, where the vector of parameters $\alpha = (\alpha_k) \in \mathbb{R}^{\mu}$ has large dimension $\mu \gg 1$; currently, for the largest DNNs, $\mu$ can exceed half a trillion! Calculating all $\mu$ partial derivatives individually is time consuming, even for high-performance computers. Therefore, techniques for expediting the calculation of derivatives are valuable tools for training DNNs.
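To make that cost concrete, here is a minimal Python sketch of the naive approach (the function name and the toy quadratic loss are illustrative assumptions, not the book's notation): estimating each partial derivative by a finite difference takes one extra loss evaluation per parameter, so a single gradient costs $\mu + 1$ forward passes.

```python
import numpy as np

def finite_difference_gradient(loss, alpha, eps=1e-6):
    """Estimate the gradient of `loss` at `alpha` one coordinate at a time.

    Each of the mu parameters needs its own perturbed loss evaluation,
    so one gradient costs mu + 1 forward passes -- prohibitive when
    mu is in the hundreds of billions.
    """
    base = loss(alpha)
    grad = np.empty_like(alpha)
    for k in range(alpha.size):        # one loop iteration per parameter
        perturbed = alpha.copy()
        perturbed[k] += eps
        grad[k] = (loss(perturbed) - base) / eps
    return grad

# Toy quadratic loss with mu = 5 parameters; the exact gradient is 2 * alpha.
alpha = np.arange(5.0)
print(finite_difference_gradient(lambda a: np.sum(a**2), alpha))
```

For $\mu \gg 1$ this per-parameter loop is hopeless, and removing it is precisely the point of the technique introduced next.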

One of the most effective of these techniques is backpropagation, which takes advantage of the fact that $L(\alpha)$ is not an arbitrary function of many parameters. On the contrary, the dependence of $L(\alpha)$ on the parameters $\alpha$ is determined by the compositional structure of ...
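As a hedged illustration of how such compositional structure can be exploited, the sketch below differentiates a tiny two-layer network by hand (the architecture, the $\tanh$ activation, and the squared-error loss are assumptions chosen for brevity, not the book's setup): one forward pass stores the intermediate values, and a single backward sweep of the chain rule then yields every partial derivative at once.

```python
import numpy as np

# A minimal backpropagation sketch for the composition
#     y_hat = W2 @ tanh(W1 @ x),   L = 0.5 * ||y_hat - y||^2.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))

# Forward pass: store the intermediates the chain rule will reuse.
z = W1 @ x          # hidden pre-activation
h = np.tanh(z)      # hidden activation
y_hat = W2 @ h      # network output

# Backward pass: propagate dL/d(output) through each composition step.
delta2 = y_hat - y                      # dL/dy_hat
grad_W2 = np.outer(delta2, h)           # dL/dW2
delta1 = (W2.T @ delta2) * (1 - h**2)   # dL/dz, using tanh' = 1 - tanh^2
grad_W1 = np.outer(delta1, x)           # dL/dW1
```

Note that both parameter gradients fall out of one backward sweep whose cost is a small constant multiple of the forward pass, rather than one pass per parameter.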
