Chapter 4. Document Structure
In this chapter, we leave behind the bits and bytes of the PDF file, and consider the logical structure. We consider the trailer dictionary, document catalog, and page tree. We enumerate the required entries in each object. We then look at two common structures in PDF files: text strings and dates.
Figure 4-1 shows the logical structure of a typical document.

Trailer Dictionary
This dictionary, residing in the file’s trailer rather than the main body of the file, is one of the first things to be processed when a program wants to read a PDF document. It contains entries allowing the cross-reference table—and thus the file’s objects—to be read. Its important entries are summarized in Table 4-1.
| Key | Value type | Value |
|---|---|---|
| /Size* | Integer | Total number of entries in the file’s cross-reference table (usually equal to the number of objects in the file plus one). |
| /Root* | Indirect reference to dictionary | The document catalog. |
| /Info | Indirect reference to dictionary | The document’s document information dictionary. |
| /ID | Array of two strings | Uniquely identifies the file within a workflow. The first string is decided when the file is first created; the second is modified by workflow systems when they modify the file. |
Here’s an example trailer dictionary:
<< /Size ...
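To make the role of these entries concrete, here is a minimal Python sketch of how a reader might pull the trailer dictionary out of the end of a file and check for the two starred keys in Table 4-1, which we take to be the required ones. The sample bytes, the function names `parse_trailer` and `missing_required`, and the `("ref", object number, generation number)` tuple are all invented for illustration. The sketch assumes a classic `trailer` keyword followed by a flat dictionary holding only integers and indirect references; it does not handle nested dictionaries or name values, which need a full PDF object parser.

```python
import re

# Hypothetical file tail, invented for illustration; it is not the book's example.
SAMPLE_TAIL = b"trailer\n<< /Size 5 /Root 1 0 R /Info 4 0 R >>\nstartxref\n1234\n%%EOF\n"

# Assumption: the starred keys in Table 4-1 are the required ones.
REQUIRED_KEYS = ("/Size", "/Root")


def parse_trailer(data):
    """Return the entries of the last trailer dictionary found in data.

    Simplification: the dictionary must be flat (no nested << >>) and
    must not contain name values such as /Type /Catalog.
    """
    start = data.rfind(b"trailer")
    if start == -1:
        raise ValueError("no 'trailer' keyword found")
    open_pos = data.find(b"<<", start)
    if open_pos == -1:
        raise ValueError("no '<<' after the trailer keyword")
    close_pos = data.find(b">>", open_pos)
    if close_pos == -1:
        raise ValueError("trailer dictionary is not closed")
    body = data[open_pos + 2:close_pos].decode("latin-1")

    # Group the tokens that follow each /Key.
    grouped = {}
    key = None
    for token in re.findall(r"/\w+|[^\s/]+", body):
        if token.startswith("/"):
            key = token
            grouped[key] = []
        elif key is not None:
            grouped[key].append(token)

    # Interpret each group as an integer, an indirect reference, or raw text.
    entries = {}
    for key, parts in grouped.items():
        if len(parts) == 1 and parts[0].isdigit():
            entries[key] = int(parts[0])
        elif (len(parts) == 3 and parts[2] == "R"
              and parts[0].isdigit() and parts[1].isdigit()):
            entries[key] = ("ref", int(parts[0]), int(parts[1]))
        else:
            entries[key] = " ".join(parts)  # e.g. the /ID array, left unparsed
    return entries


def missing_required(entries):
    """List the required keys that are absent from the parsed trailer."""
    return [key for key in REQUIRED_KEYS if key not in entries]


trailer = parse_trailer(SAMPLE_TAIL)
print(trailer)
print("missing:", missing_required(trailer))
```

Searching backward with `rfind` mirrors the idea that the trailer sits at the end of the file; a file that has been updated incrementally can contain several trailers, and the last one is the one we want to start from.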