Chapter 1. The Big Picture

What Is TeX?

TeX is a typesetting system. It is a collection of programs, files, and procedures for producing professional quality documents with minimum effort.

TeX's job is to translate the text you type into a beautiful typeset page. The key word here is “beautiful,” and it is a very lofty goal.[1] What I mean by beautiful is that TeX, when presented with several paragraphs of plain text and left to its own devices, produces a remarkably aesthetic page. Despite the fact that TeX may have to contend with multiple fonts and mathematics, it still manages to typeset pages in which each of the following aesthetic principles holds simultaneously:

  • The right margin is justified.
  • Proper justification is achieved without letterspacing.
  • Interword spacing is neither too tight nor too loose.
  • The page is evenly gray.
  • The baselines of multiple fonts are properly aligned.
  • Hyphenation is automatic, if required, and usually correct.
  • Ladders (hyphens at the ends of several consecutive lines) are avoided.

TeX processes documents a paragraph at a time, rather than a line at a time like most other programs. Internally, TeX computes a value called badness for each line of the paragraph. Anything that detracts from the appearance of a line (tight or loose spacing, a hyphen, etc.) increases the badness associated with that line. Every paragraph that TeX produces is optimal in terms of the total amount of badness present. Because TeX searches for an optimal solution, changing the last word of a paragraph can affect the spacing of the first line of the paragraph. After you've gained a little bit of experience with TeX, you'll be able to override any one, or all, of the rules it uses to compute badness, but in most situations you won't want to. I will describe more of TeX's approach to text formatting and how it differs from that of word processors, desktop publishers, and other markup languages in the following sections.
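The rules TeX uses to judge a line are controlled by parameters that you can set in your document. As a sketch, assuming Plain TeX, the following lines loosen the line-breaking rules and ask TeX to record the badness of each candidate line in the log file. The values are illustrative only, not recommendations.

```tex
% Plain TeX line-breaking parameters (illustrative values only).
\tolerance=1000             % accept lines with badness up to 1000 (Plain TeX sets 200)
\hyphenpenalty=200          % make breaking at a hyphen less attractive (Plain TeX sets 50)
\doublehyphendemerits=20000 % discourage hyphens on consecutive lines (Plain TeX sets 10000)
% Write TeX's line-breaking decisions, including the badness of each
% feasible line, to the log file.
\tracingparagraphs=1
This paragraph is broken into lines using the parameters set above;
the log file shows the badness TeX computed for every feasible break.
\par                        % end the paragraph while tracing is still on
\tracingparagraphs=0
\bye
```

Reading the trace for a paragraph or two is a good way to see how a single change near the end of a paragraph ripples back to its first line.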

TeX is not a simple program, but a set of programs, tools, fonts, and other types of files. Two programs form the core of the TeX typesetting system. One of them is TeX itself, the program that reads your input files and transforms them into typeset form. The other program is MetaFont, a tool for creating fonts. Producing TeX documents involves a series of steps, including editing the document, running TeX itself, and processing TeX's output in various ways.

Over the years, TeX has been made available on almost every computer platform, so it is probably available for the computer system that you use. Compiling TeX on different systems has been possible, in large part, because TeX is a text formatter and not a word processor. Unlike a word processor, TeX never deals directly with displaying text on the screen or interacting with input from the keyboard (except in a very basic way). These features of an application are typically the most difficult to port from one system to another.

Beyond the technical details that make translation from one system to another possible, Donald Knuth added an important stipulation to the free distribution of TeX: in order for any program to be called “TeX,” it must pass a rigorous test suite. This means that the TeX you use behaves exactly like the TeX I use.[2] This feature has contributed greatly to TeX's success. It means that a large community of TeX users can transparently share documents.

TeX for Beginners

If you are already familiar with TeX, you may find some of the material in this section repetitive. If so, just skim it quickly. This section will help you understand how TeX interprets the things you type into your input file. When you understand the concepts discussed here, you'll be ready to write really, really simple documents in TeX.

Boxes and Glue

Despite the apparent complexity of TeX's job, it uses a very simple metaphor: all typographic elements are boxes. The simplest boxes, individual characters, have a shape set by the font they come from. Every box is defined by three parameters: width, height, and depth. The distinction between height and depth is a bit subtle. When a row of characters is typeset, every character rests on an imaginary line called the baseline. Some characters, like the lowercase “g,” descend below the baseline. The distance from the baseline to the top of a box is its height; the distance from the baseline to the bottom is its depth.

Figure 1.1 shows the character boxes formed by the Computer Modern Roman letters “g” and “h.” The x-y distance of each box is its height and the y-z distance is its depth. The reference point of the box, marked with an r, is on the leftmost edge of the box where the height and depth meet. Characters that have no descenders (no elements that go below the baseline) have a depth of zero. TeX uses the character box metrics, but font designers are free to allow glyphs to extend outside the box (for example, at the top of the “g”).

Figure 1.1 The Letters “g” and “h” inside their boxes.


The following paragraph demonstrates how TeX uses the metrics from the physical dimensions of each character to build word, line, and paragraph boxes.

TeX “glues” character boxes together to form words. When boxes are joined, they are always joined so their reference points are horizontally aligned as shown in Example 1.1.[3] Character boxes are joined to form words, word boxes are joined to form lines, and line boxes are stacked to form paragraphs. TeX accomplishes the task of forming a justified paragraph by allowing the glue between words to stretch and shrink a little bit and by occasionally breaking the glue between characters to insert a hyphen. Although the rules are slightly different, TeX builds a page out of vertical boxes (paragraphs, figures, etc.) in an analogous manner.
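You can inspect these boxes directly. The following Plain TeX sketch puts a single character into a box register, reports the box's three dimensions on the terminal, and then builds a line box of fixed width in which the glue between two words stretches to fill the space. The register numbers 0 and 2 are arbitrary choices, and the reported dimensions depend on the current font (cmr10 in Plain TeX).

```tex
% Box register 0 holds a box containing just the letter g.
\setbox0=\hbox{g}
% \wd, \ht, and \dp read the width, height, and depth of that box.
\message{g: width \the\wd0, height \the\ht0, depth \the\dp0}
% A line box forced to be 3 inches wide: the glue between the two words
% (\hfil, which can stretch without limit) grows until the box is full.
\setbox2=\hbox to 3in{Boxes\hfil glue}
\box2   % place the finished line box on the page
\bye
```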

This is a very generalized overview. In reality, a lot of subtlety is required to capture all of the nuances of typographical appearance.

Control Sequences

A control sequence is a special “word” that you put in your document. These extra words are instructions for TeX, and they do not usually appear in your typeset document. Example 1.1 shows a contrived example of a TeX document that uses several control sequences.

Example 1.1. An Example of a TeX Document
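\def\ora{O'Reilly \& Associates} % macro: a shortcut for the publisher's name
\font\orafont=grlg10             % font: \orafont selects the font grlg10
\parskip=\baselineskip           % register: space between paragraphs
\parindent=0pt                   % register: no paragraph indentation
\pageno=5                        % register: number the first page 5
This book is published by \ora in
the {\it Nutshell} series.       % \it is Plain TeX's italic font switch
\bye                             % end the document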

In most macro packages, a control sequence is a backslash followed by a sequence of letters.[4] TeX is case-sensitive, so the control sequence \large is different from \Large (these control sequences switch to large and very large fonts in the LaTeX macro package). Control sequences end with the first non-letter, even if it isn't a space. For example, \parskip0pt is the control sequence \parskip followed by 0pt, which tells TeX to insert zero points of extra space between paragraphs.
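The following Plain TeX fragment illustrates both rules. The macro names \note and \Note are hypothetical, chosen only for this sketch.

```tex
% The name ends at the digit 0, so this is \parskip followed by 0pt.
\parskip0pt
% TeX is case-sensitive: \note and \Note are different control sequences.
\def\note{lowercase}
\def\Note{capitalized}
% The control space "\ " is needed after \note because TeX skips ordinary
% spaces that follow a control word; the period simply ends the name \Note.
\note\ and \Note.   % typesets: lowercase and capitalized.
\bye
```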

Unless instructed otherwise (with control sequences), TeX builds rectangular paragraphs out of lines of words. Changing fonts, building tables, and typesetting mathematical equations are examples of situations in your document where TeX needs extra information.

The number of control sequences used in a TeX document may seem overwhelming at first. Luckily, every control sequence falls into one of several categories:

Macro control sequences

Macro control sequences associate a name with an arbitrary string of text (including other control sequences). They are interpreted by replacing the control sequence with the text of its definition.[5]

Macro control sequences are the root of TeX's tremendous flexibility. By defining control sequences with meaningful names, like \chapter and \footnote, a macro package can present a reasonably simple interface to the user. By redefining those control sequences, the typeset output can be modified without requiring you to retype large quantities of text.

In Example 1.1, the macro control sequence \ora is defined as a shortcut for typing “O'Reilly & Associates.” This is a simple example of how a macro control sequence can be used.
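Macros can also take parameters, which is where the # character listed in Table 1.1 comes in. The sketch below, in Plain TeX, reuses \ora from Example 1.1 and adds a hypothetical \pubby macro with one parameter; #1 is replaced by whatever appears in braces after the macro name.

```tex
% A macro without parameters: a shortcut for a piece of text.
\def\ora{O'Reilly \& Associates}
% A hypothetical macro with one parameter; #1 is replaced by the argument.
\def\pubby#1{{\it #1} is published by \ora.}
\pubby{Some Title}
% Redefining \ora changes every later use of \pubby without retyping any text.
\def\ora{a different publisher}
\pubby{Another Title}
\bye
```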

Font control sequences

In Example 1.1, the line \font\orafont=grlg10 creates a font control sequence called \orafont. When \orafont is used, TeX will begin typesetting in the font grlg10. The name of the font, grlg10 in this case, refers to an external file that contains font metric information. Fonts are discussed in Chapter 5.
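The same mechanism can load a font at a different size. This sketch uses the standard Computer Modern fonts cmr10 and cmr12 rather than grlg10, and the names \bigrm and \twelverm are hypothetical; \magstep is a Plain TeX macro.

```tex
% Load cmr10 (Computer Modern Roman, 10pt design) magnified 1.44 times;
% in Plain TeX, \magstep2 stands for a magnification of 1.2 x 1.2 = 1.44.
\font\bigrm=cmr10 scaled \magstep2
% Load the 12-point design size of the same typeface.
\font\twelverm=cmr12
Normal text, {\bigrm magnified text}, and {\twelverm twelve-point text}.
\bye
```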

Registers

Registers are like variables in a programming language. They hold a single value of a particular type. Many types of values can be stored: numbers (also called “count” values; they are integers like 1, 2, 17, or -5), dimensions (also called “lengths”; they are distances like 3.5pt or 2in), boxes, glue, and token lists (an internal representation of your document used by TeX).[6]

If you are unfamiliar with computer programming, think of these registers as placeholders. When TeX needs to save a piece of information, like how much space should be inserted between paragraphs, it stores the information in a register. When the information is needed again, in this case when TeX has finished typesetting one paragraph and is about to start another, it can retrieve that information from the register. Registers are usually given names that at least hint at how they are used. This helps people read and modify the rules that TeX uses to typeset documents.

In Example 1.1, \parskip and \baselineskip are glue registers (they hold a length that can also stretch and shrink), and \parindent is a dimension register. The \pageno control sequence is a count register.

There are only 256 registers of each type. The type of information (number, dimension, or token list) that a register can contain is defined when the control sequence is created. Once a variable like \parindent is created to hold a dimension, it can never hold a number or a token list.[7]

Registers may seem unnecessary now that you know about macro control sequences, which can store arbitrary information. However, registers differ from macro control sequences not only in the types of values they can hold, but also in the types of operations that can be performed on them. There is a TeX command called \advance, for example, that can increment the value stored in a register by an arbitrary amount. You can't \advance a macro control sequence.
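Here is a short Plain TeX sketch of registers in use. The register names \chapterno and \gutter are hypothetical; \newcount and \newdimen are the Plain TeX macros that allocate an unused register of each type.

```tex
% Allocate new registers; Plain TeX picks unused register numbers.
\newcount\chapterno      % hypothetical count register
\newdimen\gutter         % hypothetical dimension register
\chapterno=1
\advance\chapterno by 1  % \chapterno now holds 2
\gutter=1in
\advance\gutter by -0.25in
\message{chapter \the\chapterno, gutter \the\gutter}
% By contrast, applying \advance to a macro such as \ora from
% Example 1.1 is an error: \advance works only on registers.
\bye
```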

Built-in commands

A number of control sequences are built into TeX. These “primitive” operations form the basis for all higher-level functionality. There are a wide variety of control sequences of this type. Everything that can be done in TeX can be reduced to a sequence of primitive operations.

There is no way to know, simply by inspection, if a control sequence is one of the built-in sequences or not. Luckily, it doesn't matter very often; it really only matters when you are writing complex macros.
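Although the name alone doesn't tell you, TeX itself can. The \show primitive writes the meaning of a control sequence to the terminal and the log file: a primitive is reported as itself, and a macro is reported as macro:-> followed by its definition. In TeX's normal interactive mode, TeX pauses after each \show; press Return to continue. A sketch:

```tex
\def\ora{O'Reilly \& Associates}
\show\font   % a primitive:  > \font=\font.
\show\ora    % a macro:      > \ora=macro:->O'Reilly \& Associates.
\bye
```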

The \font control sequence in Example 1.1 is a built-in control sequence. So is \advance, mentioned above.

The number and kind of control sequences available depends upon the macro package that you are using. (Macro packages are discussed fully in Chapter 4.) For the rest of this chapter, the default settings of Plain TeX are assumed.[8] There are other macro packages, like LaTeX, Lollipop, and TeXinfo, which have different default values.

Special Characters

In addition to control sequences, TeX reserves several characters for special purposes. Most of them do not occur very frequently in ordinary text, but you must be aware of them because there will be very surprising consequences if you use them incorrectly.

Table 1.1 shows all of the special characters in Plain TeX.[9] Most of these characters are special in other macro packages as well. Font-specific characters are not reserved by TeX, but they don't produce the results you would expect when typeset in Computer Modern because of the way TeX expects fonts to be laid out. Fonts are discussed in detail in Chapter 5.

Table 1.1. Special Characters in Plain TeX

Character Meaning
# Used for parameter definition in macros and tables
$ Toggles in and out of math mode
% A comment (TeX ignores everything to the end of the line)
& The column separator in tables
~ The active space (an unbreakable space)
_ Marks a subscript (valid only in math mode)
^ Marks a superscript (valid only in math mode)
\ Begins a control sequence
{ Begins a group
} Ends a group
| Produces an em-dash (—) (font-specific)
< Produces an upside down exclamation mark (¡) (font-specific)
> Produces an upside down question mark (¿) (font-specific)
" Incorrect for quoted text; use `` and '' instead (font-specific)

It is best to avoid these characters until you are familiar with TeX. If you need to typeset one of these characters, Table 1.2 shows what to put in your document. You should also avoid characters outside the standard printable ASCII character set (characters with ASCII values below 32 and above 126). TeX can be configured to accept characters outside the printable ASCII range, to support non-English languages, for example, but it is not configured to do so “out of the box.” Chapter 7 discusses the issues of typesetting in different languages.

Table 1.2. Typeset Special Characters

To Get Put This in Your Document
# \#
$ \$
% \%
& \&
~ \~
{ $\{$
} $\}$
< $<$
> $>$
| $|$
_ \_
^ $\hat{\hbox{ }}$
\ $\backslash$

Some of the suggestions in Table 1.2 will not always produce exactly what you want. The entry for “~” really produces a tilde accent, not a tilde character, and most of the entries for “{” through “\” get the actual characters from TeX's math fonts. The Computer Modern text fonts don't include these characters, so it is necessary to get them from the math fonts. However, if you are using PostScript or other kinds of fonts, you may very well have curly braces, angle brackets, underscores, etc. in the font. You can access these characters directly with the \char primitive: follow \char with the decimal ASCII value of the character that you want to print. Because using \char directly makes your document less portable, I strongly recommend that you always define macros for this purpose, so that you can easily switch to some other method if you change fonts. For example, this book is typeset with PostScript fonts that include a backslash character at position 92, so I defined \bs to print a backslash like this:

\def\bs{\char92\relax}

Using \relax after the decimal value ensures that TeX won't get confused if I put a backslash in front of digits, as in \bs300dpi. Without \relax, TeX would keep reading digits and see \char92300.
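The \char primitive also accepts a character code written in other notations. This sketch assumes, as above, a font with a backslash at position 92 (the Computer Modern text fonts put a different character there); the names \bsb, \bsc, and \bsd are hypothetical, and each macro asks for the same character.

```tex
% Four ways to write character code 92:
\def\bs{\char92\relax}    % decimal, as defined above
\def\bsb{\char"5C\relax}  % hexadecimal (" prefix, uppercase digits)
\def\bsc{\char'134\relax} % octal (' prefix)
\def\bsd{\char`\\\relax}  % alphabetic constant: the code of the character \
% Thanks to \relax, the digits 300 here are typeset, not read as part
% of the character code.
\bs300dpi
\bye
```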

The braces “{” and “}” are a very special case. TeX uses curly braces to delimit arguments and make changes (like switching fonts) that are local to a small section of the document. These are called grouping characters in TeX jargon. For example, to typeset a single word in boldface, you put {\bf word} into your input file. The \bf control sequence switches to boldface type, and the curly braces localize the effect to that single word. As a result, it is very important that you avoid braces (except when you use them as delimiters) and that you carefully match all opening and closing braces. One of the most common errors in TeX is to forget a closing brace.
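Groups localize register changes as well as font changes. In this Plain TeX sketch, a change to \parindent made inside braces is undone as soon as the closing brace is reached.

```tex
This word is {\bf bold}, and this one is back in roman type.
\parindent=20pt
{\parindent=0pt \message{inside the group: \the\parindent}} % reports 0.0pt
\message{after the group: \the\parindent}                   % reports 20.0pt
\bye
```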

One last special character is the blank space. For the most part, TeX doesn't care how you space your lines of text. Any space that occurs is simply a word break to TeX, and inserting multiple spaces doesn't influence how TeX typesets the line. TeX also considers the end of a line an implicit space. If you are trying to control the layout of your input text and want to break a line without introducing a space, place a comment character (% in most macro packages) at the very end of the line. If the last character of a line is the comment character, TeX ignores the line break and all the leading spaces on the following line. This allows you to use indentation to make your input file more readable.

For example, the following lines in your input file:

“This                      is some ex%
      ample text.”

and this line:

“This is some example text.”

both produce:

“This is some example text.”

in your typeset document.

Text Formatting Versus Word Processing

For many people, writing documents with a computer implies using a word processor like WordPerfect or Microsoft Word. The word processing program controls every aspect of what you do: it's where you type your text, where you see what it will look like, where you print, and where you do everything else. Some of these environments, the so-called WYSIWYG (what-you-see-is-what-you-get) programs, attempt to show you what the printed document will actually look like while you edit it.[10]

If WYSIWYG environments are what you're used to, or what you expect, TeX's approach may seem very strange at first because TeX is a text formatter, not a word processor. Instead of trying to show you what your document will look like while you type, TeX expects you to do all the typing somewhere else, and then pass it a source file containing all of your text plus control sequences that tell TeX how you'd like it printed.

In The Psychology of Everyday Things [dn:psyeveryday], Donald Norman describes these two modes of interaction as first person and third person. First person interaction provides the user with the ability to directly manipulate the elements of a task, whether it's flying an airplane or resizing text. Third person interaction, on the other hand, occurs where the user is expected to type commands to the computer in an appropriate command language; the shell prompt is a good example of third person interaction.

Is first person interaction really better? Well, it depends. Norman writes, “Although they [WYSIWYG environments] are often easy to use, fun, and entertaining, it is often difficult to do a really good job with them.” The problem which arises is that the user is required to do the task, and he or she may not be very good at it. Third person systems are better when the computer program can be trusted to do a better job of the task than the user.

Is TeX really better than a word processor? Well, it depends on the task and the person doing it. TeX probably isn't better for designing one-page flyers with lots of fonts and graphics (although I've done it). But for longer documents, TeX offers all of these advantages:

  • TeX has a precise understanding of the rules of typesetting, so you don't have to.
  • Predefined styles allow experts to extend (or bend) the rules of typesetting without burdening the user.
  • Journals and magazines can achieve consistency of appearance much more reliably because the consistency is in the style files.
  • TeX runs on cheap systems (old PCs with monochrome monitors and no graphics capability, for example).
  • Although complex and difficult to learn, TeX offers incredibly flexible table construction tools.
  • Few, if any, word processors can provide running headers and footers as flexibly as TeX. Imagine the task of writing a dictionary: the left and right hand side headers change on each page, each time a new entry is added.
  • TeX offers flexible bibliography layouts.
  • TeX is extensible. Its behavior can be modified by defining new commands and environments without changing the actual program.

There are some other good reasons to separate document creation from text formatting:

  • Documents are portable. Because the source files are just plain text without any nonprintable characters, they can easily be copied from one system to another.
  • TeX is portable. TeX runs everywhere. You can process your documents with TeX on unix workstations; personal computers running MS-DOS, OS/2, and Windows; IBM mainframes running VM/CMS; workstations running VAX/VMS; Macintoshes; Amigas; Ataris; and just about every other computer with a reasonable amount of memory. And the typeset output will be the same! This adds another dimension of portability to your documents.
  • TeX is free. You can afford to have it on every system you use. Several sources of TeX software are listed in the preface of this book.
  • TeX allows you to separate markup and output. Logical divisions in the text (chapters, sections, itemized lists, etc.) are identified by control sequences. An entirely different page layout can result from simply changing the definition of a few control sequences.

    This means that the look of your documents can be changed (to fit the style guidelines of a particular journal or publisher, for example) without changing the text of your documents at all.

  • Plain text files are easier to manipulate with other tools than specially encoded word processor files are. This means that you can use standard utilities on your documents: revision control, grep, shell scripts, etc. This is a less common practice in non-unix environments, but it is still convenient.
  • You can continue to use your favorite editing tools. The extent to which you find this advantageous is dependent, naturally, on the extent to which you have a favorite editing program. Nevertheless, this can be a considerable advantage. For example, users familiar with emacs can continue to rely on all of the features they are used to, including interactive spellchecking, access to online services like Webster's dictionary, customized editor macros, and convenient services like reading mail.
  • You get better looking output. TeX gives you far more precise control over the placement of elements on the page than most word processing programs. And TeX is very intelligent about typesetting (paragraph breaking, kerning, ligatures, etc.).

What About Desktop Publishing?

Desktop publishing systems like Ventura Publisher and Aldus PageMaker are noted for their ability to incorporate multiple fonts and graphics into a document. As word processors become more sophisticated, the line between word processing and desktop publishing is becoming blurry.

This book shows you many ways that TeX can provide access to the same sophisticated features. TeX can incorporate pictures and figures in a number of ways (just take a look at the way I've wrapped text around this kiwi),[11] and TeX can use almost any font that another program can use; it can certainly use all of the popular types of fonts. Like typical word processors, desktop publishing programs force you to use a single application to create your entire document, and they lack the flexibility required to combine just the pieces that you want. All of the advantages of text formatting over word processing also apply to desktop publishing programs. I'll grant, however, that WYSIWYG environments are easier for first-time users. But that doesn't make them better; it just makes them more popular.

What About troff?

troff is the “other” text formatting system. If you've ever tried to read a unix reference page without formatting it first, you've seen troff. For a long time it was distributed as part of all unix systems. Now it is more likely an extra-cost option. The Free Software Foundation's groff processor is a free, troff-compatible system.

On the surface, it is easier to compare TeX and troff than to compare TeX to the other document preparation systems described in this chapter. In reality, the differences are subtle: TeX and troff have the same general paradigm; they are equally powerful to a large extent, and both have advantages and disadvantages.

troff is similar to TeX in many ways. Like TeX, troff processes a plain text file and produces a typeset document. TeX and troff differ in the way that formatting information is inserted into the text. TeX uses control sequences, where troff uses a mixture of control sequences[12] and “dot” commands (lines of text that begin with a period and contain typesetting commands).

Although I am inclined to say that troff documents are far more cryptic than TeX documents, I am certain that there are plenty of troff users who would disagree (strongly).

Objectively, TeX handles mathematical typesetting far better than troff and probably has better support for multilingual documents. The nroff processor, which produces plain text output from a troff document, at one time provided a strong argument in favor of troff for typesetting documents required in both typeset and plain text formats. However, the TeXinfo macro package for TeX has largely defeated that argument. In troff's defense, TeXinfo is very, very different from other TeX macro packages, so it really is necessary to plan ahead and learn a very different set of macros to produce both plain text and typeset documents with TeXinfo. Chapter 10 discusses this issue further.

In my experience, there is more free support for TeX than troff. TeX is supported by a large community of users actively producing new, useful document-preparation formats, styles, and tools. In addition, TeX is more widely available than troff: a TeX port exists for almost every practical computer system, whereas troff is still mostly confined to unix systems (although the Free Software Foundation's groff package has been ported to similar systems like MS-DOS, Windows NT, and OS/2).

The following fragments show a side-by-side comparison of TeX commands, on the left, and troff commands, on the right:

\begin{figure}                    .(z
 \begin{center}                   .hl
   \hrule                         Text to be floated.
   \vspace{8pt}                   .sp
   Text to be floated.            .ce
   \hrule                         .hl
   \caption{Example figure...}    Figure \*[fig]: Example figure...
   \vspace{8pt}                   .)z
 \end{center}
\end{figure}

Both examples produce a floating figure that looks like this:

Figure 1.2 Example figure produced by both TeX and troff


What About SGML?

The Standard Generalized Markup Language (SGML) is a document description language. SGML aims to separate the content of a document from its presentation. In other words, SGML identifies the features of a document (chapter headings, paragraphs, etc.) without specifying how they are to be presented.

This means that all SGML documents must interact with a document formatter of some sort. Many people are finding that TeX is a natural choice when selecting a document formatter for their SGML environment. In fact, LaTeX already provides many SGML-like commands because it was designed to separate markup from presentation. One of the specific goals of an effort (currently underway) to develop a new version of LaTeX is to make SGML and LaTeX work together easily, cleanly, and efficiently. For more information about the goals of this project and information about what you can do to help, please read The LaTeX3 Project [l3:project].

How TeX Works

A functioning TeX system in which you are producing documents of medium size and complexity is really a collection of tools and files that are related to each other in well defined (if somewhat subtle) ways.

One of the fundamental goals of this book is to shed light on these relationships and allow you to put together a TeX system that quickly and easily does the jobs you need to accomplish.

TeXing a Simple Document

This section briefly describes what you need to know about how TeX processes a simple document (that is, one that does not contain complex document elements like a table of contents, indexes, bibliographies, etc.). Figure 1.3 shows how the standard TeX tools fit together at the most basic level.

Figure 1.3 A high-level view of TeX


Figure 1.4 expands on Figure 1.3, showing additional tools and files that you'll often need to use.

Figure 1.4 High-level view of TeX including more detail


Editing your document

The most tangible and important part of your TeX system is your document. This is the file (or files) in which you write down what you want to typeset with TeX. In addition to the actual text, you include control sequences to describe how you want the final text to appear (size, font, justification, etc.). The section “What Is TeX?” earlier in this chapter tells you briefly what goes into your document file.

The most common way to create a document is with an editor, which can provide you with a number of features to make typing TeX documents easier. For example, an editor can help you insert common control sequences automatically, run TeX automatically (from within the editor), and keep you from making common mistakes (like typing a left brace, but not the matching right one). These features and how they work in editors including GNU Emacs, aucTeX, and Multi-Edit are described in Chapter 2.

Running TeX

Once you have prepared your document file, it is time to run the TeX program itself. This may not be as easy as it sounds. You need to determine the name of the TeX program at your site, make sure that all of the files TeX needs are available to it, and specify the correct command-line options. Chapter 3 describes everything you need to know.

TeX may find errors in your document (places where TeX doesn't understand the instructions you used; not spelling or grammatical errors, unfortunately ;-). Chapter 3 also describes the most common errors you're likely to make and gives advice for interpreting error messages.
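For example, the following hypothetical file, errors.tex, contains two of the most common mistakes: a misspelled control sequence and a missing closing brace. This is a sketch in Plain TeX; the exact wording of the messages is covered in Chapter 3.

```tex
% errors.tex: a hypothetical file containing two common mistakes.
Here is a misspelled control sequence: \bfx word.
% TeX stops with "! Undefined control sequence." and shows the offending line.
Here a group is opened with {\bf but never closed.
% TeX still finishes at \bye, but reports that \end occurred inside a group.
\bye
```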

If TeX is successful in formatting your document (i.e., your document doesn't contain any errors), it produces a DVI (DeVice Independent) file. The DVI file is a device-independent representation of the typeset output of your document. DVI files are transitory. Although there are a few programs that can manipulate them (to rearrange the order of the pages in the output, for example), most of the time you will immediately transform them into something else: either printed output or previewed output on the screen. (See the section “Using macros,” which follows.)
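As a minimal sketch, assuming your site's TeX program is called tex and uses the Plain TeX format, a file named hello.tex (a hypothetical name) containing the lines below should, when processed with the command tex hello, produce hello.dvi along with a transcript of the run in hello.log.

```tex
% hello.tex: about the smallest useful Plain TeX document.
Hello, world!
\bye   % end the paragraph, output the last page, and stop
```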

Using macros

The control sequences that you insert in your document are defined by a macro package.[13] Macro packages are collections of TeX commands (macros) that extend TeX. Macro packages are frequently stored in format files, specially compiled versions of the macro package. The iniTeX program interprets all of the control sequences in a macro package to create a format file that TeX reads when it runs.

Many macro packages are designed to implement specific document styles or to support specific types of writing. Two of the most common are Plain TeX and LaTeX. Chapter 4 describes Plain TeX, LaTeX, and a number of other macro packages that extend the power and ease of TeX.
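A small collection of your own macros doesn't have to be compiled into a format file; it can simply be read at the start of each document with the \input primitive. In this sketch, mymacros.tex is a hypothetical file containing definitions such as \ora from Example 1.1.

```tex
% Read a hypothetical macro file, mymacros.tex, that contains definitions
% such as \def\ora{O'Reilly \& Associates}. Unlike a format file, it is
% interpreted again every time the document is processed.
\input mymacros
This book is published by \ora.
\bye
```

Compiling a large package into a format file with iniTeX avoids that repeated interpretation, which is why big packages like LaTeX are normally loaded that way.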

Using fonts

One of TeX's strengths is its support for a myriad of predefined fonts and its ability to let you create fonts of your own. In addition to your document and the format file, when TeX runs it needs font information as well. This is provided in the form of a set of TFM (TeX Font Metric) files that tell TeX the size and shape (roughly speaking, at least) of each character, as well as some other information about how characters are related to each other.

Historically, the MetaFont program was the way a TeX user created fonts. Like TeX itself, MetaFont is about ten years old. Ten years ago, it was a unique program that was indispensable for creating the type of output TeX produces. Today there are many competing font technologies, all of them more common than MetaFont, and MetaFont's role is diminishing. Many people use TeX today without ever using MetaFont at all. Nevertheless, MetaFont still has some importance, and we describe how to run and use it in Chapter 11. Because the standard fonts that come with TeX are still the fonts produced by MetaFont, it will also be mentioned elsewhere in this book.

If you are writing complex documents, you may need to learn a lot about fonts and how to define and use them. Chapter 5 tells you everything you need to know, including information about the New Font Selection Scheme, a new way of describing and selecting fonts in TeX.

Previewing or printing TeX documents

After you have produced a DVI file, as described in the section “TeXing a Simple Document,” later in this chapter, you run another program (generically called a DVI driver) to translate the DVI file so you can either preview or print your document. Driver programs need your DVI file and some collection of fonts (usually PK (packed) font files).[14] Many different kinds of fonts are described in Chapter 5.

Chapter 8 tells you how to print your documents and deal with the problems you may encounter using bitmapped or scalable fonts, printing pictures and figures, and other printing issues.

Often you will want to look at your document before you actually print it. Because TeX is not a WYSIWYG system,[15] you cannot do this until you have processed the DVI file. There are a number of good previewing products, including xdvi, dvimswin, and dviscr, that let you look at your processed document on the screen before you decide whether to print it. See Chapter 9 for complete information.

TeXing More Complex Documents

This section briefly describes how TeX processes a more complex document (that is, one that includes elements like a table of contents, indexes, bibliographies, etc.).

Many TeX formats implement sophisticated cross-referencing schemes. Cross references may sound rather esoteric, but they occur frequently. Tables of contents, figure and table numbers, indexes, and bibliographic references are all flavors of cross referencing.

Cross references make your document more complex because they require more information than is immediately available when TeX initially processes your document. For example, if you refer to a figure which occurs later in the document, TeX has no way of knowing what figure number to insert into the text at the point of the reference. These are called forward references.

TeX macro packages that support cross referencing overcome the difficulty of forward references by requiring you to process your document more than once. Each time your document is processed, the necessary reference information is stored into a separate file. If that file exists when you process your document, the information saved last time is loaded so that it is available this time. The practical implication of this functionality is that documents with cross references frequently have to be processed twice. Occasionally, you may have to process a document three times. This occurs when the inserted reference causes TeX to format a paragraph differently, which in turn causes TeX to change a page break.[16] Because most changes are incremental while revising a document, this is normally only an issue the first time you process a document.

The following sections describe the LaTeX methods for constructing a table of contents, figure references, an index, and a bibliography. LaTeX is used in this example because it is a very common macro package and is typical of the way macro packages provide these features. Similar mechanisms exist in most formats, except Plain TeX.

Figure 1.5 shows the relationships between many of the components described in the following sections. LaTeX creates several sorts of auxiliary files depending on the kind of cross references required by your document and the style files you use. These auxiliary files may be modified (and others may be created) by other sorts of post-processing programs (like MakeIndex for constructing indexes or BibTeX for constructing bibliographies). LaTeX uses these auxiliary files, if they exist, to update your document when it is processed again.

Figure 1.5 TeXing a More Complex Document


Building a Table of Contents

A table of contents is the simplest form of cross reference. In LaTeX, you request a table of contents by inserting the \tableofcontents command wherever you want it to appear in your document. Because LaTeX gathers the entries during one run and typesets them from an auxiliary file on the next, a document with a table of contents needs at least two passes through TeX, whether you request the table at the beginning or the end.

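LaTeX uses a file with the same name as your document and the extension .toc to hold the table of contents entries. You can control the level of detail in your table of contents by setting the tocdepth counter (for example, with \setcounter{tocdepth}{1}). In document styles that have chapters, a value of zero includes only chapters; one includes chapters and sections; two includes chapters, sections, and subsections; and so on. (The similarly named secnumdepth counter controls which levels are numbered, not which levels appear in the table of contents.)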

The LaTeX commands \listoftables and \listoffigures perform the same functions as \tableofcontents for lists of tables and figures. They use external files with the extensions .lot and .lof, respectively. As with the table of contents, these lists are typeset from entries collected on the previous run.
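The following sketch shows these commands in a small document. It assumes the standard report document class, which has chapters; the chapter, section, and figure text are placeholders. In the standard classes, the counter that limits the depth of the table of contents is tocdepth.

```tex
\documentclass{report}
% List chapters (level 0) and sections (level 1) in the contents.
\setcounter{tocdepth}{1}
\begin{document}
\tableofcontents   % typeset from \jobname.toc, left by an earlier run
\listoffigures     % typeset from \jobname.lof, left by an earlier run

\chapter{The Big Picture}
\section{What Is TeX?}
\subsection{Badness}  % numbered, but too deep to appear in the contents

\begin{figure}
  \centering
  \fbox{A placeholder figure}
  \caption{A caption that is also written to the list of figures}
\end{figure}
\end{document}
```

On the first run the contents and the list of figures come out empty; once the .toc and .lof files exist, a later run fills them in.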

Figure References

Figure references are a special case of LaTeX's cross referencing mechanism. The LaTeX command \label{string} creates a referent. You refer to the label with the command \ref{string}. In normal body text, the label refers to the current section or subsection. In a figure or table environment, the label refers to that figure or table.

Because \label writes its information to the .aux file, and LaTeX reads that file back only at the beginning of the next run, a \ref is filled in on a later pass whether it points forward or backward. TeX will usually have to be run twice, and occasionally three times, to make all of the references correct.
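Here is a sketch of a forward and a backward reference in a small LaTeX document; the label names fig:passes and sec:overview are arbitrary. On a run where LaTeX has not yet seen a label in the .aux file, the reference prints as “??” and LaTeX warns about undefined references. Each \label records a line in the .aux file, roughly of the form \newlabel{fig:passes}{{1}{1}} (the number and the page), which the next run reads back.

```tex
\documentclass{article}
\begin{document}
\section{Overview}\label{sec:overview}
% A forward reference: the figure appears later in the file.
Figure~\ref{fig:passes} shows why some documents need several passes.

\begin{figure}
  \centering
  \fbox{latex, then latex again}
  \caption{Processing a document with cross references}
  \label{fig:passes}  % after \caption, so it refers to the figure number
\end{figure}

% A backward reference: the label appeared earlier in the file.
Section~\ref{sec:overview} introduced the problem.
\end{document}
```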

Indexes and Glossaries

Indexes and glossaries differ from the preceding forms of reference in that they must be processed by a separate program. In general, this is true regardless of the macro package or format you use. An external program is required because indexes and glossaries must be alphabetized, and in indexes, consecutive page numbers have to be converted into ranges, and so on.
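Under LaTeX, the usual arrangement combines the makeidx package with the MakeIndex program. A minimal sketch, assuming a document named paper.tex (a hypothetical name); the comments give the order in which the programs run, although the exact command names depend on your system. The |( and |) suffixes ask MakeIndex to turn a span of pages into a page range.

```tex
% paper.tex (a hypothetical file name). A typical sequence of runs:
%   latex paper       writes the \index entries to paper.idx
%   makeindex paper   sorts paper.idx and writes paper.ind
%   latex paper       \printindex typesets paper.ind
\documentclass{article}
\usepackage{makeidx}  % defines \printindex
\makeindex            % turn on writing of the .idx file
\begin{document}
TeX computes the badness\index{badness} of each line.
\index{paragraphs|(}% opens a page range for "paragraphs"
Paragraphs are set as a whole\index{badness!of a paragraph}, not a line at a time.
\index{paragraphs|)}% closes the page range
\printindex
\end{document}
```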

Bibliographies

LaTeX works in conjunction with another program, called BibTeX, to provide a flexible, convenient way to construct bibliographies. The \cite command allows you to refer to other documents in much the same way that the \ref command allows you to refer to other portions of the same document.

You make a citation by placing the command \cite{string} where you wish the citation to occur. The string is a key that refers to the document in your bibliography database that you wish to cite. Example 1.2 is a typical entry in a bibliography database. It describes Knuth's classic book The TeXbook [kn:texbook]. The key for this entry is “kn:texbook.”

Example 1.2. A typical bibliography database entry
@Book{kn:texbook,
  author    = "Donald E. Knuth",
  title     = "The {TeX}book",
  publisher = "Addison-Wesley",
  year      = 1989,
  edition   = "Fifteenth",
  isbn      = "0-201-13447-0",
  note      = "Paperback ISBN: 0-201-13448-9"
}

Each entry in the database consists of a type (book, article, techreport, etc.), a key, and a number of fields. The number and names of the fields depend on the type of entry. The database is simply a plain ASCII file containing any number of entries. You can have multiple databases.

These are the commands you use, in addition to \cite, to include a bibliography in your document:

\bibliographystyle{plain}
\bibliography{textools,refbooks}

The \bibliographystyle command tells BibTeX how to format the bibliography, and the \bibliography command identifies which bibliographic databases contain the citations that you have made. The “plain” style of bibliography is selected, and the textools and refbooks files contain the bibliographic information for the documents cited. Document styles can be used to alter the format of citations in your text. The default extension for bibliographic styles is .bst. The default extension for database files is .bib.

LaTeX places citations and bibliography information into the .aux file. BibTeX reads the .aux file and constructs a bibliography, which it places into a file with the extension .bbl, using the entries you cited and the bibliography style you selected.
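Putting the pieces together, a minimal document citing the entry from Example 1.2 might look like this. The file names paper.tex and refbooks.bib are assumptions (refbooks.bib is taken to contain the kn:texbook entry); the comments trace which files each run reads and writes.

```tex
% paper.tex (a hypothetical file name). A typical sequence of runs:
%   latex paper    writes \citation, \bibstyle, and \bibdata lines to paper.aux
%   bibtex paper   reads paper.aux and refbooks.bib, writes paper.bbl
%   latex paper    reads paper.bbl; each \bibitem records its label in paper.aux
%   latex paper    the citation in the text is now resolved
\documentclass{article}
\begin{document}
The standard reference is \emph{The \TeX book}~\cite{kn:texbook}.

\bibliographystyle{plain}  % format the bibliography with plain.bst
\bibliography{refbooks}    % entries come from refbooks.bib
\end{document}
```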

Special Things

Sometimes, producing a complex document requires the ability to interface with objects outside of TeX (pictures or figures created by high-end graphics packages, special features of a particular printer, etc.). To support this kind of communication, TeX provides a control sequence called \special. The arguments passed to the \special command are written directly to the DVI file for the DVI driver. It is the responsibility of the DVI driver to handle them. DVI drivers typically ignore \special commands that they do not recognize.
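As an example, the dvips driver recognizes a psfile= special for including an Encapsulated PostScript picture. In this Plain TeX sketch, kiwi.eps is a hypothetical file. Because \special occupies no space on the page, the example reserves room for the picture with a box of fixed height.

```tex
% Plain TeX sketch; kiwi.eps is a hypothetical EPS file.
Here is a figure:\par
\vbox to 2in{%          reserve 2in of space, since \special takes up none
  \vfil
  \special{psfile=kiwi.eps}% copied unchanged into the DVI file for the driver
}
\bye
```

A driver that does not understand psfile= would typically skip the special, leaving the reserved space empty.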

You will find \special commands of various kinds described throughout this book, particularly when discussing color typesetting in Chapter 4 and graphics in Chapter 6.

[1] Before I proceed, the notion of beautiful in this context needs some explanation. Several people have pointed out that the logo type used by many TeX-related programs (including TeX itself) is intrinsically ugly. These same folks argue that a sentence like “TeX is designed to typeset beautiful pages” is self-contradictory because it begins with such an ugly construction. Obviously, TeX can't prevent you from typesetting ugly things. But TeX can typeset beautiful things too. We at O'Reilly & Associates think that this book, typeset completely in TeX, is an excellent example.

[2] This is not the whole truth. Implementors of TeX may make some system-dependent alterations as long as the resulting program still passes the test suite, so our TeXs may not behave in exactly the same way. They will, however, produce identical documents given identical input (unless the input relies on system-dependent features not available in both TeXs, naturally. ;-)

[3] This is subtly different from saying that they are joined at the baseline. There are TeX commands which can change the position of the reference point in a box, whereas the baseline is an imaginary line that depends solely on the shape of the character.

[4] Technically, it's any character defined to be in the “escape” category followed either by a sequence of one or more characters defined to be in the “letter” category or by a single character of any other category.

[5] Actually, macro expansion differs from pure textual replacement in a number of technical ways, but they aren't important here.

[6] Technically, several other kinds of values are stored this way as well, but they are less common and won't be discussed in this book at all.

[7] Most control sequences can be redefined to hold different kinds of values, but they can never hold different kinds of values at the same time. A dimension register can be redefined to hold tokens, for example, but then it can't hold dimensions anymore (unless it is redefined again).

[8] Plain TeX is the name of a particular macro package. I selected it for the purpose of example in this chapter because it is always installed with TeX. Most of what follows in this chapter is true in other macro packages as well, but some of the details are different. See Chapter 4 for more information.

[9] All of these special characters are configurable, but most macro packages use the Plain TeX defaults.

[10] TeX pundits, and other folks who have been frustrated by the limitations of these environments, frequently refer to this as WYSIAYG—what you see is all you get.

[11] TeX doesn't do this sort of thing automatically, but it isn't hard to do. Why the kiwi? It was on my business card at the time.

[12] Although it has a very different notion of what constitutes a control sequence.

[13] Well, actually, they're TeX primitives, or they're defined by a macro package, in a file loaded by a macro package, or in your document.

[14] Some drivers may also benefit from loading the TFM files used to create your document.

[15] Textures for the Mac and Scientific Word offer WYSIWYG-like environments, but that's not the point ;-)

[16] With extreme cleverness or extreme bad luck you can create a document which will never format correctly.
