Chapter 3. Two Heads Are Better than One: Encoder-Decoder Architecture

We have now discussed the neural language model perspective on predicting the next word in detail, without delving too deeply into why this is such a valuable thing for models to learn to do. Let’s pick a more concrete and practical task to motivate our next exploration of neural network architecture: text summarization. We begin by considering how humans think about summarizing text, then use the same working example to introduce the Encoder-Decoder architecture, also referred to as sequence-to-sequence (seq2seq). Finally, we offer a few considerations on the use of encoder-decoders and finish with the key takeaways.

How Do Humans Summarize?

Let’s begin by considering how humans approach the task of summarizing text. To make the example more concrete, we will use the game of Pictionary. You play Pictionary in a team of two: you randomly select a word that your partner cannot see, and you must then draw a picture of that word so that your partner can guess it correctly. Figure 3-1 shows a simple example of the steps in the game with the word “bat.”

Regular Pictionary has a time component: your team has limited time to draw and guess, and if your partner fails to guess correctly, your team does not earn a point. But let’s amend the game slightly and create a Summarization version of Pictionary.

Figure 3-1. Simple Pictionary example (source: US Department of the Interior ...)
