Gradient descent uses the partial derivative of the loss (error) function to propagate updates back to the neuron weights. In this example our activation function is the sigmoid, so to find the gradient at the output neuron we need the partial derivative of the sigmoid function. The following graph shows how gradient descent walks down the derivative to find the minimum:
[Figure: Gradient descent algorithm visualized]
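To make the idea concrete, here is a minimal sketch of that update rule for a single sigmoid output neuron trained with a squared-error loss. The input, target, initial weights, and learning rate are made-up illustrative values, not ones from this book:

import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # Partial derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Illustrative setup: one input, one output neuron, squared-error loss
x, target = 1.5, 0.0        # a single hypothetical training example
w, b = 0.8, 0.1             # initial weight and bias
learning_rate = 0.5

for step in range(100):
    z = w * x + b            # weighted input to the neuron
    y = sigmoid(z)           # neuron output (activation)
    error = y - target       # derivative of 0.5 * (y - target)**2 w.r.t. y

    # Chain rule: dLoss/dw = dLoss/dy * dy/dz * dz/dw
    grad_w = error * sigmoid_derivative(z) * x
    grad_b = error * sigmoid_derivative(z)

    # Step downhill along the gradient toward the minimum of the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, sigmoid(w * x + b))  # the output drifts toward the target

Each iteration is one step down the slope shown in the graph: the gradient tells us which direction increases the error, and we move the weights a small amount in the opposite direction.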
If you plan to spend any more time studying neural networks, deep learning, or machine learning, you will ...