Chapter 12. Bibliographies, Indexes, and Glossaries
Bibliographies and indexes are typically difficult to incorporate into a document. Bibliographies have stringent, but varying, presentation requirements (the MLA wants bibliographies to look one way, the Association for Computing Machinery wants them another, etc.). Indexes don't vary that much, but they are tedious to put together.
BibTeX provides a powerful mechanism for handling bibliographies and citations. Tib is another bibliography package for TeX. The MakeIndex program helps manage the construction of one or more indexes for a document. Glossaries are also constructed with the MakeIndex program.
BibTeX
What's wrong with doing bibliographies by hand? Two things. First, it is tedious to typeset each bibliography entry according to the strict requirements of the publisher. Chances are, you'll have to look up the requirements each time, and you're bound to make mistakes. Second, no matter what field you work in, it's likely that you'll cite some of the same articles and books in more than one publication. Computers are supposed to reduce effort, not replicate it.
BibTeX provides a powerful mechanism for dealing with bibliographies in a mechanical way, considerably reducing effort on your part. Rather than formatting each bibliography entry, you build a database of bibliography information. Each time you want to make a citation, you simply use BibTeX to build the bibliography from your database into your document. That's easy to do, as you'll see in a few minutes.
The idea of a bibliography database is introduced in Appendix B of LaTeX: A Document Preparation System [ll:latexbook] and is described in more detail in BibTeXing [op:btxdoc] and Designing BibTeX Styles [op:btxhak]. The purpose of this chapter is to familiarize you with the concepts of a bibliography database and to describe many of the freely available tools for manipulating databases. It is not intended to replace any of the preceding documents.
How BibTeX Works
The LaTeX command \cite inserts citations into your document.[115] You use a short key to identify the publication you cite. For example, in this book, when I want to cite The TeXbook [kn:texbook], I use the command \cite{kn:texbook}. The string kn:texbook is the key. When you build your bibliography database, you assign a key to each document in the database.
When LaTeX processes your document, it stores information about the documents that you cite (including the key for each document) in the AUX file. The following commands identify how the bibliography should be formatted and what bibliography databases contain the publications you cite. LaTeX also writes this information to the AUX file:
\bibliographystyle{abbrv}
\bibliography{texpubs}
BibTeX examines the AUX file and extracts the appropriate entries from the bibliography database. It then formats those entries according to the bibliography style that you specify. The bibliography style (stored in a plain text file with the extension BST) tells BibTeX exactly how each entry should be formatted. After formatting the entries, BibTeX writes a BBL file that LaTeX incorporates into your document (at the place where the \bibliography command occurs) the next time you process it.
Sometimes it seems confusing to use LaTeX to create a bibliography. If LaTeX keeps warning you about “unknown citations” for documents that you know are in the database, try the following: run LaTeX, then BibTeX, then LaTeX, and then LaTeX again.
The first time you run LaTeX, it writes the citation keys to the AUX file (but it doesn't know what publications they refer to). BibTeX writes the BBL file, which includes the printable, formatted bibliography and information about what publication corresponds to each citation. The second time you run LaTeX, it still doesn't know what publication each citation refers to because it hasn't seen the BBL file yet (bibliographies are usually at the end of a document). During the second pass, LaTeX writes the citation referents to the AUX file. Finally, on the third LaTeX pass, it knows what each citation refers to, so it can typeset the citations as well as the bibliography!
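The whole cycle can be tried on a very small document. The following is only a sketch: it assumes a current LaTeX2e installation (hence \documentclass and the filecontents environment, neither of which is discussed above), and the file name sample.tex is an arbitrary choice. The database entry is the one for The TeXbook.

```latex
% sample.tex -- a minimal sketch; the file name is an arbitrary choice.
% Run, in this order: latex sample, bibtex sample, latex sample, latex sample
%
% filecontents writes texpubs.bib if it does not exist yet
% (assumption: a LaTeX2e kernel; this is not part of BibTeX itself).
\begin{filecontents}{texpubs.bib}
@book{kn:texbook,
  author    = "Donald E. Knuth",
  title     = "The {TeX}book",
  publisher = "Addison-Wesley",
  year      = 1989,
  edition   = "Fifteenth",
}
\end{filecontents}
\documentclass{article}
\begin{document}
Typesetting is covered in depth elsewhere~\cite{kn:texbook}.

% Both commands are recorded in the AUX file for BibTeX to read;
% the formatted BBL file is inserted where \bibliography occurs.
\bibliographystyle{abbrv}
\bibliography{texpubs}
\end{document}
```

After the first LaTeX run the citation is still unresolved; it should be resolved after the final run.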
Building a Bibliography Database
A bibliography database is a plain text file that contains information about a collection of publications. Bibliography databases generally have the extension .bib. Example 12.1 is an example of a single database entry. This entry is from a database of TeX-related publications that I put together while writing this book.
Example 12.1. A Sample BibTeX entry
@book{kn:texbook,
  author    = "Donald E. Knuth",
  title     = "The {TeX}book",
  publisher = "Addison-Wesley",
  year      = 1989,
  edition   = "Fifteenth",
}
The entry in Example 12.1 describes The TeXbook by Donald Knuth. The key for this entry is kn:texbook.
Note
All of the keys in a BibTeX database must be unique. If you use multiple databases for a single document, all of the keys in all of the databases must be unique.
In database jargon, the bibliography database contains a collection of records describing publications. There may be several types of records in the same database. Each record contains several fields, some of which are required and some optional. The required fields vary according to the type of record.
In English, this means that each entry in the database describes a specific type of publication (book, article, technical report, etc.). Every publication is described by its characteristics. For example, books have a title, an author or editor, a publisher, and a year of publication. Some books also have a publisher's address, a volume or number, a series, edition, or month of publication (or some combination of these elements). These characteristics are called fields, and they are identified by their name.
Database entries
This is the general structure of an entry in a bibliography database:
@type{key,
  field1 = "value1",
  field2 = "value2",
  . . .
  fieldn = "valuen"
}
Always enter complete bibliography information in mixed case. Never abbreviate or set field values in all uppercase or all lowercase, even if the bibliography style that you most frequently use specifies, for example, that book titles appear in uppercase or that only the author's first initial should appear. BibTeX will take care of formatting the entry according to the style. If you store incomplete information in the database, BibTeX can't work correctly when you change styles.
Entry types
Table 12.1 shows the required and optional fields for article and book entries. Similar lists exist for the other standard entry types: booklet, conference, inbook, incollection, inproceedings, manual, mastersthesis, phdthesis, proceedings, techreport, unpublished, and a catch-all miscellaneous type.
Fields that are neither required nor optional are ignored. Therefore, you can, and should, record any additional information about a publication in its entry. Abstracts and keywords, for example, are two additional pieces of information that you might keep for some publications. They can be stored in abstract and keyword fields in each entry, even though it is unlikely that they will ever appear in a bibliography.
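As an illustration, the sketch below stores an abstract and keywords alongside the fields that the style uses. The entry is entirely hypothetical (author, title, journal, and key are invented for the example), and the wrapper document assumes LaTeX2e.

```latex
% A sketch only: the entry below describes a hypothetical article.
\begin{filecontents}{extras.bib}
@article{xx:example,
  author   = "Ann Author",
  title    = "A Hypothetical Article",
  journal  = "Journal of Examples",
  year     = 1993,
  abstract = "A short summary kept for the author's own reference.",
  keyword  = "bibliographies, examples",
}
\end{filecontents}
\documentclass{article}
\begin{document}
% \nocite lists the entry in the bibliography without a citation in the text.
\nocite{xx:example}
\bibliographystyle{plain}
\bibliography{extras}
\end{document}
```

With the standard styles, the abstract and keyword fields should simply be skipped when the entry is formatted.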
Note
The types of records that are valid, and the required and optional fields they contain, are determined solely by the bibliography style. There is nothing in the BibTeX program that makes book and article entries more legitimate than reptile or cartoon entries.
Abbreviations
The database entry structure that I've shown isn't entirely accurate. In my example, every field has a quoted string value. The truth is, every value is a quoted string, an abbreviation, or a number. An abbreviation is created with the @string command. Typically, @string commands are placed at the top of the bibliography database. For example, the following command defines ora to be an abbreviation for O'Reilly & Associates, Inc. (the ampersand is written as \& because the string is eventually typeset by TeX):
@string{ora = "O'Reilly \& Associates, Inc."}
The months of the year should always be specified with abbreviations so that bibliography styles can redefine how they appear in the bibliography (for this reason, three-letter abbreviations for the months are defined in the standard styles; you don't have to define them yourself). The names of journals that you cite frequently are also obvious candidates for abbreviation.
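A short sketch of both kinds of abbreviation follows. The book entry is hypothetical, and the wrapper document assumes LaTeX2e. Note that abbreviations are written without quotation marks.

```latex
% A sketch only: the book described here is hypothetical.
\begin{filecontents}{abbrevs.bib}
@string{ora = "O'Reilly \& Associates, Inc."}

@book{xx:handbook,
  author    = "Ann Author",
  title     = "A Hypothetical Handbook",
  publisher = ora,
  year      = 1993,
  month     = jun,
}
\end{filecontents}
\documentclass{article}
\begin{document}
\nocite{xx:handbook}
\bibliographystyle{plain}
\bibliography{abbrevs}
\end{document}
```

Here ora expands to the publisher's name defined by @string, and jun is one of the month abbreviations supplied by the standard styles.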
Preamble
Sometimes it is helpful to define TeX control sequences in a bibliography database. BibTeX provides a @preamble entry for this purpose.
Consider the following example, paraphrased from BibTeXing [op:btxdoc]: You have a database which contains entries for each volume of a two-volume set by the same author. It happens that Volume One has been reprinted, so it has a more recent date than Volume Two. The standard styles sort by author and then date, so as it stands, the bibliography would list Volume Two before Volume One.
To correct this problem, you could specify the dates for Volume One as:
year = "{\noopsort{1990a}}1992"
and for Volume Two:
year = "{\noopsort{1990b}}1990"
BibTeX will sort “1990a” before “1990b,” so they will appear in the correct order, and the following definition for \noopsort will simply discard its argument, so nothing extra will appear in the bibliography:[116]
\def\noopsort#1{}
The best place to put this definition is in the database that uses it (so that it will always be present when that database is used). The @preamble command simply copies its argument to the top of the BBL file, so this definition at the top of the database will do exactly what we want:
@preamble{"\def\noopsort#1{}"}
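Put together, a database using this technique might look like the sketch below. The two-volume set is hypothetical (author, title, publisher, and keys are invented), and the wrapper document assumes LaTeX2e.

```latex
% A sketch only: the two-volume set described here is hypothetical.
\begin{filecontents}{volumes.bib}
@preamble{"\def\noopsort#1{}"}

@book{xx:vol1,
  author    = "Ann Author",
  title     = "A Hypothetical Treatise",
  volume    = 1,
  publisher = "Example Press",
  year      = "{\noopsort{1990a}}1992",
}

@book{xx:vol2,
  author    = "Ann Author",
  title     = "A Hypothetical Treatise",
  volume    = 2,
  publisher = "Example Press",
  year      = "{\noopsort{1990b}}1990",
}
\end{filecontents}
\documentclass{article}
\begin{document}
\nocite{xx:vol1,xx:vol2}
\bibliographystyle{plain}
\bibliography{volumes}
\end{document}
```

Because \noopsort discards its argument, the printed years are 1992 and 1990, while the sort keys begin with 1990a and 1990b.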
Comments
Anything that does not appear inside a @type{} command is a comment. However, many programs that manipulate bibliography databases will misplace comments appearing before or after an entry if the entries are reordered.
BibTeX includes a @comment entry for backwards compatibility with older systems. Unlike TeX, the percent sign (%) is not a comment character in BibTeX.
Special characters
Inserting TeX control sequences (to form accented characters, for example) into a bibliography entry requires special care. Some styles specify that entries should be shifted to uppercase or lowercase, and shifting the case of control sequence names would turn them into different control sequences.
BibTeX is aware of the case sensitivity of TeX control sequence names and will not change them. To specify accents or other special characters, always enclose them in { and } braces. The same treatment should be given to portions of an author's name that should remain in lowercase even if the rest of the name is shifted to uppercase.
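The sketch below shows both uses of braces: around an accent, and around a letter that must keep its case. The entry is hypothetical, and the wrapper document assumes LaTeX2e.

```latex
% A sketch only: the entry below is hypothetical.
% {\'e} keeps the accent command intact; {U} keeps the capital letter
% even if a style shifts the rest of the title to lowercase.
\begin{filecontents}{accents.bib}
@book{xx:accents,
  author    = "Ren{\'e}e Exemple",
  title     = "Typesetting Bibliographies on {U}nix Systems",
  publisher = "Example Press",
  year      = 1993,
}
\end{filecontents}
\documentclass{article}
\begin{document}
\nocite{xx:accents}
\bibliographystyle{plain}
\bibliography{accents}
\end{document}
```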
Bibliography Styles
BibTeX styles are really programs written in a simple but powerful stack-based language and interpreted by BibTeX.[117] Don't confuse bibliography styles (BST files) with LaTeX styles (STY files); they are unrelated. Although a complete description of the BibTeX language is not presented here, a short example will help give you a sense of the language.
Each BST file defines a number of functions. The highest level functions determine how each entry is formatted: when BibTeX needs to format a “book” entry, it executes the book function, which must be defined by the BST file in use or an error will result.
Let's consider part of the task of formatting a book entry. These code fragments are from the standard bibliography style plain.bst. When the book function is ready to output the book title, it calls the format.btitle function, shown here:
FUNCTION {format.btitle}
{ title emphasize }
This function places the title field on top of the stack and calls emphasize:
FUNCTION {emphasize}
{ duplicate$ empty$
{ pop$ "" }
{ "{\em " swap$ * "}" * }
if$
}
This is what emphasize will do when the title is not an empty field:
- Duplicate what is on the top of the stack.
- Test the value on the top of the stack. The empty$ function removes the top value from the stack and places a boolean “true” value there if what it removed was an empty field, and a “false” value otherwise.
- Push { pop$ "" } onto the stack.
- Push { "{\em " swap$ * "}" * } onto the stack.
- Test the condition. The if$ function takes three values from the stack: a boolean value, something to do if that value is true, and something to do if that value is false. In this case, the title is not an empty field, so the value is false. The { pop$ "" } value is discarded, and { "{\em " swap$ * "}" * } is evaluated.
- Push {\em onto the top of the stack.
- Swap the top two items on the stack. Now {\em is below the book title on the stack.
- Concatenate the top two items on the stack. Now the top of the stack holds the value {\em book title.
- Push } onto the top of the stack.
- Concatenate the top two items on the stack. Now the top of the stack holds the value {\em book title}.
The resulting value left on top of the stack when emphasize is finished is the TeX code required to print the book title with emphasis.
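To see where that value ends up, the sketch below shows roughly what the plain style would write to the BBL file for The TeXbook, wrapped in a small document so that it can be typeset directly. This is an approximation written by hand, not captured BibTeX output; the exact line breaks and punctuation may differ. The wrapper assumes LaTeX2e.

```latex
\documentclass{article}
\begin{document}
See~\cite{kn:texbook}.

% Normally BibTeX generates this environment in the BBL file;
% here it is typed by hand as an approximation.
\begin{thebibliography}{1}

\bibitem{kn:texbook}
Donald~E. Knuth.
\newblock {\em The {TeX}book}.
\newblock Addison-Wesley, fifteenth edition, 1989.

\end{thebibliography}
\end{document}
```

The second \newblock line contains the string that emphasize left on the stack.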
Special-purpose styles
A number of styles have been written to allow you to develop special-purpose bibliographies. A few of them are listed here:
- bibunits supports multiple bibliographies in different sections of a single document.
- chapterbib supports separate bibliographies in each chapter of a single document.
- makebst asks a number of questions about the bibliography style you need and constructs an appropriate BibTeX style.
Bibliography Database Tools
There are a lot of programs designed to help you extract information from bibliography databases, sort the entries, build subset-bibliographies that contain only some of the entries from a larger bibliography, and enter or edit information in an existing or new database. A lot of these programs are written in Unix shell script languages and rely on existing text-processing tools like awk, sed, sort, and grep. Of course, many of these text-processing tools have been ported to other operating systems. Most of the utilities written in shell script languages can be modified to work in other environments by an ambitious individual.
bibsort
The bibsort shell script reorders the entries in a bibliography database into alphabetical order by entry key name. @string commands are reordered by macro name. This program cannot deal correctly with comments that appear outside of an entry. These comments are always associated with the preceding entry, which is frequently incorrect.
Consider carefully before you reorder the entries in a database. BibTeX places some restrictions on the available orderings. Cross references, for example, must appear before the entry to which they refer.
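The sketch below shows a database where this matters. Both entries are hypothetical, and the wrapper document assumes LaTeX2e. Sorting by key would move xx:conf ahead of xx:paper, which is exactly the order that the cross-reference rule forbids.

```latex
% A sketch only: both entries are hypothetical.
% The entry that contains the crossref field comes first;
% the entry it refers to comes later in the file.
\begin{filecontents}{crossref.bib}
@inproceedings{xx:paper,
  author   = "Ann Author",
  title    = "A Hypothetical Paper",
  crossref = "xx:conf",
}

@proceedings{xx:conf,
  editor    = "Ed Editor",
  title     = "Proceedings of a Hypothetical Conference",
  booktitle = "Proceedings of a Hypothetical Conference",
  publisher = "Example Press",
  year      = 1993,
}
\end{filecontents}
\documentclass{article}
\begin{document}
\nocite{xx:paper}
\bibliographystyle{plain}
\bibliography{crossref}
\end{document}
```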
biblook
biblook is an interactive program that searches rapidly through a bibliography database for key words in specified fields. Compound conditions (using “and” and “or”) can be specified. The entries located by a search can be saved into a separate file.
To use biblook, you must preprocess the database with bibindex, which builds a binary index of the entries. Differences in case are removed, TeX control sequences are stripped out, and non-alphanumeric characters are removed. This increases the likelihood of correct matches.
bibclean
bibclean is a syntax checker for bibliography databases. Running bibclean before using bibsort, biblook, or any other programs described here can eliminate a lot of the problems these programs may encounter. Although bibliography databases are plain text files with a very loose structure, some tools are more easily confused than others.
bibclean identifies possible problems in the database and pretty-prints it in a standard way. The following formatting changes are made by bibclean:
- The structure of each database entry is made consistent with respect to the following criteria: the “@” sign that begins the entry type is moved into column 1; each line in the entry is changed to contain exactly one “field = value” pair; and the closing right brace is placed in column 1 on a line of its own. Additionally, outer parentheses are converted into braces; tabs are expanded into blanks; hyphens in a sequence of pages are converted into en-dashes; and month names are converted into standard abbreviations.
- Long string values are split at a blank and continued on the following line. The continuation lines are indented.
- Individual names in the author and editor fields are normalized. A single space is placed after periods that separate initials, and all entries are converted into “first-name middle-name last-name” form (instead of “last-name, first-name” form, for example).
- The checksums of ISBN and ISSN numbers are verified.
- Uppercase letters that appear outside of braces are enclosed in braces to prevent them from being erroneously shifted to lowercase by some bibliography styles.
- Text outside of entries is not changed. Entries are separated by a single blank line.
citetags
When documents must be transmitted in “source” form (meaning that the actual TeX files will be shipped around), it's generally unnecessary to include entire bibliography databases. In these cases, it would be more convenient to send only the bibliography entries that are actually used.
The citetags program extracts the citations from an AUX file. This list of citations can be passed to the citefind program to build a small bibliography database of just the required entries.
citefind
citefind processes a list of citations and a list of bibliography databases and writes out a new database containing just the entries required to match the citations present. This provides a minimal database that can be shipped with the document if the TeX files must be processed on another computer.
bibextract
Given a list of fields and values (specified as regular expressions), bibextract creates a new database containing only the entries that match one or more of the specified values in one or more of the specified fields. The necessary @string and @preamble commands are also included in the new database.
lookbibtex
lookbibtex offers the same features as bibextract. However, lookbibtex is written in Perl rather than a shell script language.
bibdestringify
As the name implies, bibdestringify replaces all @string macros in a database by their textual expansions.
edb
edb, or the “Emacs Database,” is a powerful database programming system built on top of GNU Emacs Lisp. edb has been used to write a database-editing mode for BibTeX databases. It can be extended to handle new entry types.
Emacs BibTeX mode
BibTeX mode is a mode for editing BibTeX databases in Emacs. It provides some template expansion and alignment features, but is essentially a text-editing mode (as opposed to edb, which is specifically a database-entry editor).
bibview
bibview is an X Window-based program for editing bibliography databases. Figure 12.1 shows an example of bibview editing a database entry.
[Figure 12.1. bibview editing a database entry]
Unlike the xbibtex program, bibview can handle optional and ignored fields in bibliography entries. It does not handle new entry types.
xbibtex
xbibtex is an X Window-based program for creating bibliography databases. Figure 12.2 shows an example of xbibtex creating a bibliography entry.
[Figure 12.2. xbibtex creating a bibliography entry]
It does not appear that xbibtex can edit existing entries. It also does not handle unexpected fields or new entry types.
bibdb
The bibdb program is an MS-DOS-based bibliography database editor. The screen capture in Figure 12.3 shows an example of bibdb editing a bibliography entry. bibdb displays only fields that have values.
[Figure 12.3. bibdb editing a bibliography entry]
Although it does not handle arbitrary fields, bibdb has a large selection of optional fields. It does not handle new entry types.
Tib
Tib is another tool for maintaining bibliography databases. The format of a Tib database is the same as the format for the troff refer processor.[118] An example entry for The TeXbook is shown in Example 12.2.
Example 12.2. A Tib style database entry
Unlike BibTeX, which relies on the citation macros to format citations correctly, Tib actually replaces the citations with the appropriate information. In other words, Tib-style citations are not control sequences; they are just text strings. The Tib processor creates an entirely new document file that should be passed to TeX.
The general format of a Tib citation is [.citation key(s).]. Tib databases do not contain a key field; instead, the citation keys can come from any fields in the database entry (you can exclude some fields if you wish). The punctuation around citations in square brackets is adjusted by Tib. An alternate form of citation using angle brackets inserts a citation without adjusting any of the surrounding punctuation.
The bibliography is inserted into your document wherever the string “.[]” occurs at the beginning of a line. Analogous to BibTeX, the format of the reference list is controlled by a Tib style. Styles for roughly fifteen technical journals are provided.
Making Indexes
Constructing an index for a TeX document is relatively straightforward. First, you must identify each occurrence of each word or concept that you want indexed. To do this, you must insert \index entries into your document.[119] When you format your document containing the index entries, an IDX file is created that contains all of the entries along with the page number where each entry occurred.
The MakeIndex program[120] reads the IDX file, sorts and collates the entries, and writes an IND file. The MakeIndex program can also load an index style file, discussed in the next section.
When LaTeX processes your document, it inserts the contents of the IND file into your document at the point where you use the \printindex control sequence.
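The complete round trip can be sketched with a very small document. This is an illustration under a few assumptions: a LaTeX2e installation (so the makeidx style option is loaded with \usepackage), the \makeindex command in the preamble (needed for LaTeX to write the IDX file, although it is not mentioned above), and an arbitrary file name, sample.tex.

```latex
% sample.tex -- a sketch; the file name is an arbitrary choice.
% Run: latex sample, makeindex sample, latex sample
\documentclass{article}
\usepackage{makeidx} % assumption: LaTeX2e form of the makeidx style option
\makeindex           % tells LaTeX to write the IDX file
\begin{document}
BibTeX\index{BibTeX} builds bibliographies from a database.
\newpage
MakeIndex\index{MakeIndex} sorts and collates index entries.

% The IND file written by MakeIndex is inserted here.
\printindex
\end{document}
```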
Index Entries
Index entries use special characters to identify different types of entries (simple entries, multiple-level entries, see-also's, etc.). The exact characters used are controlled by the index style file, and it may be convenient to select an alternate style. For example, MakeIndex uses the double-quote character by default to identify literal characters in the index entry, but the German Babel style makes the literal double quote character a shortcut for the umlaut accent; this use makes it unavailable as the quotation character in index entries.
Another point to consider when coding index entries is that the arguments to the index macros are expanded by TeX. This makes it difficult to insert some control sequences in an index entry. In LaTeX, you can combat this problem by using the \protect control sequence.
A complete list of the different kinds of index entries that can be created is included in MakeIndex: An Index Processor for LaTeX [ll:makeindex].
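As a quick orientation, the sketch below shows the most common entry forms using MakeIndex's default special characters (the exclamation mark for levels, the at sign for sort keys, the vertical bar for encapsulation, and the double quote for literal characters). The indexed terms are arbitrary, and the document assumes LaTeX2e with makeidx loaded.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
Some text.
% Simple entry.
\index{bibliography}
% Second-level and third-level entries.
\index{bibliography!styles}
\index{bibliography!styles!abbrv}
% Sort under "TeX", but print \TeX.
\index{TeX@\TeX}
% Cross reference instead of a page number.
\index{glossary|see{index}}
% A literal exclamation mark, protected by the quote character.
\index{exclamation mark ("!)}
% Start of a page range ...
\index{index|(}
\newpage
More text.
% ... and its end, one page later.
\index{index|)}
\printindex
\end{document}
```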
Index Format
The format of the index is controlled by both the index style (which specifies whether or not headings should be present, what the delimiters should be, etc.) and the definition of several control sequences:[121]
\theindex
Controls what happens just before the index is printed. This sequence should establish a new page, set up running headers and footers, select multiple columns, and do whatever other global setup is desired.
\item
Executed just before a first-level entry.
\subitem
Executed just before a second-level entry.
\subsubitem
Executed just before a third-level entry.
\indexspace
Executed between alphabetical sections.
\endtheindex
Controls what happens just after the index is printed. This sequence should undo anything that was started by \theindex.
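The sketch below shows how these control sequences appear in practice. It imitates, by hand, the kind of IND file that MakeIndex writes with its default style, wrapped in a small document so that it can be typeset directly. The entries and page numbers are invented, and the wrapper assumes the LaTeX2e article class, which supplies default definitions for all of these sequences.

```latex
\documentclass{article}
\begin{document}
% \begin{theindex} executes \theindex; \end{theindex} executes \endtheindex.
\begin{theindex}

  \item bibliography, 1
    \subitem styles, 1
      \subsubitem abbrv, 1

  \indexspace

  \item index, 1--2

\end{theindex}
\end{document}
```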
Special-purpose styles
A number of styles have been written to allow you to develop special-purpose indexes. multind and index are described here.
multind supports multiple indexes in a single document.
index is a reimplementation of the LaTeX indexing commands. It stores index entries in the AUX files so that portions of a document can be reformatted without losing the entire index. It also provides support for multiple indexes and replaces the makeidx style option.
Making Glossaries
The LaTeX command \glossary can be used to accumulate words and concepts for a document glossary. The output, stored in a GLO file, is very similar to an IDX file. Like an index, the glossary file can be processed by MakeIndex to produce a sorted list of terms. To produce a glossary, you must create an index style for the glossary. Here is a minimal glossary.ist:
keyword "\\glossaryentry"
preamble "\n\\begin{theglossary}\n"
postamble "\n\\end{theglossary}\n"
The output file that MakeIndex creates will have to be edited by hand, naturally, in order to incorporate the definitions of the entries. You will also have to define an appropriate glossary environment by defining the control sequences used to make the glossary:
\theglossary
Controls what happens just before the glossary is printed. This sequence should establish a new page, set up running headers and footers, and do whatever other global setup is desired.
\item
Executed just before a glossary item.
\endtheglossary
Controls what happens just after the glossary is printed. This sequence should undo anything that was started by \theglossary.
Finally, you will have to \input or \include the glossary in your document.
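The pieces fit together roughly as in the sketch below. Several things here are assumptions rather than part of the description above: the file names (gloss.tex, gloss.gls), the use of LaTeX2e, the \makeglossary command that enables the GLO file, and the particular definition of the theglossary environment, which is just one hypothetical way to satisfy the requirements listed.

```latex
% gloss.tex -- a sketch; the file names are arbitrary choices.
% Run: latex gloss
%      makeindex -s glossary.ist -o gloss.gls gloss.glo
%      (edit gloss.gls by hand to add the definitions)
%      latex gloss
\documentclass{article}
\makeglossary  % tells LaTeX to write the GLO file

% A hypothetical glossary environment built on the description list,
% which provides a suitable \item.
\newenvironment{theglossary}
  {\section*{Glossary}\begin{description}}
  {\end{description}}

\begin{document}
A BST file\glossary{BST file} holds a bibliography style.

% Include the sorted glossary once MakeIndex has produced it.
\IfFileExists{gloss.gls}{\input{gloss.gls}}{}
\end{document}
```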
[115] For Plain TeX and other formats derived from Plain, the btxmac.tex macros provide these commands.
[116] This example uses Plain TeX syntax rather than a LaTeX \newcommand, because BibTeX databases can be used from Plain TeX as well as LaTeX.
[117] A complete description of BibTeX's programming language can be found in Designing BibTeX Styles [op:btxhak].
[118] refer databases can reportedly be converted to and from BibTeX format.
[119] The format you use must also provide support for indexing. In LaTeX, most of the support is built in, but you must use the makeidx style option. In Plain TeX and other Plain-derived formats, input the idxmac.tex macros at the top of your document. The rest of this section assumes that you use LaTeX, although the principles hold for any macro package that supports indexing.
[120] Frequently called makeidx on MS-DOS systems.
[121] In LaTeX, these control sequences have default definitions that you may not need to change.