Chapter 10
Long Short-Term Memory
In this chapter, we start by diving deeper into the vanishing gradient problem that can prevent recurrent networks from performing well. We then present an important technique to overcome this problem, known as long short-term memory (LSTM), introduced by Hochreiter and Schmidhuber (1997). LSTM is a more complex unit that acts as a drop-in replacement for a single neuron in a recurrent neural network (RNN). The programming example in Chapter 11, “Text Autocompletion with LSTM and Beam Search,” will illustrate how to use it by implementing an LSTM-based RNN for autocompletion of text.
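To give a first feel for why gradients vanish, consider that backpropagation through time multiplies one Jacobian factor per timestep; when each factor has magnitude below 1, the product shrinks geometrically with sequence length. The following sketch illustrates this with a single recurrent weight and a tanh activation; the specific values of `w` and `z` are hypothetical, chosen only to show the effect:

```python
import math

def tanh_derivative(x):
    """Derivative of tanh: 1 - tanh(x)^2, always at most 1."""
    t = math.tanh(x)
    return 1.0 - t * t

# Backpropagating through T timesteps multiplies T factors of the form
# w * tanh'(z). When each factor's magnitude is below 1, the gradient
# shrinks geometrically as it flows back through time.
w = 0.9   # hypothetical recurrent weight
z = 0.5   # hypothetical pre-activation, assumed constant per step
T = 50    # number of timesteps to backpropagate through

gradient = 1.0
for _ in range(T):
    gradient *= w * tanh_derivative(z)

print(f"gradient contribution after {T} steps: {gradient:.3e}")
```

With these values each factor is roughly 0.7, so after 50 steps the gradient contribution from early timesteps is vanishingly small, which is why a plain RNN struggles to learn long-range dependencies. The LSTM unit described in this chapter is designed precisely to keep this backward signal from collapsing.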
The internal details of the LSTM unit are somewhat tricky, which can make this chapter challenging to get through if you are learning ...