Chapter 4. Text Basics

Any successful presentation, even a thoughtful tome, should have its text organized into an attractive, effective document, and that kind of organization is HTML and XHTML's forte. The languages give you a number of tools that help you mold your text and get your message across. They also help structure your document so that your target audience has easy access to your words.

Always keep in mind while designing your documents (here we go again!) that the markup tags, particularly with regard to text, only advise—they do not dictate—how a browser will ultimately render the document. Rendering varies from browser to browser. Don't get too entangled with trying to get just the right look and layout. Your attempts may and probably will be thwarted by the browser.

Divisions and Paragraphs

Like most text processors, a browser wraps the words it finds to fit the horizontal width of its viewing window. Widen the browser's window, and words automatically flow upward to fill the wider lines. Squeeze the window, and words wrap downward.

Unlike most text processors, however, HTML and XHTML use explicit division (<div>), paragraph (<p>), and line-break (<br>) tags to control the alignment and flow of text. Return characters, although quite useful for readability of the source document, typically are ignored by the browser—authors must use the <br> tag to explicitly force a common text line break. The <p> tag, while also causing a line break, carries with it meaning and effects beyond a simple return.

The <div> tag is a little different. When originally codified in the HTML 3.2 standard, <div> was meant to be a simple organizational tool for dividing the document into discrete sections. With so vague a purpose, few authors used it. But recent innovations (alignment, styles, and the id attribute for document referencing and automation) now let you more distinctly label and thereby define individual sections of your documents, as well as control the alignment and appearance of those sections. These features breathe real life and meaning into the <div> tag.

By associating an id and a class name with the various sections of your document, each delimited by a <div id=name class=name> tag and attributes (you can do the same with other tags, like <p>, too), you not only label those divisions for later reference by a hyperlink and for automated processing and management (collecting all the bibliography divisions, for instance), but you may also define different, distinct display styles for those portions of your document. For instance, you might define one divisional class for your document's abstract (<div class=abstract>, for example), another for the body, a third for the conclusion, and a fourth divisional class for the bibliography (<div class=biblio>, for example).

Each class, then, might be given a different display definition in a document-level or externally related stylesheet: for example, the abstract indented and in an italic typeface (such as div.abstract {margin-left: 0.5in; font-style: italic}); the body in a left-justified roman typeface; the conclusion similar to the abstract; and the bibliography automatically numbered and formatted appropriately.
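
As a minimal sketch, the arrangement described above might be written as follows. The id values, the class names body and conclusion, and the specific style values are our own assumptions for illustration; only abstract and biblio come from the discussion above.

```html
<html>
<head>
<title>Sample report</title>
<style type="text/css">
  /* Abstract and conclusion: indented, italic text.
     margin-left is the CSS property that sets the left indent. */
  div.abstract, div.conclusion { margin-left: 0.5in; font-style: italic }
  /* Body: left-justified, upright (roman) text */
  div.body { text-align: left; font-style: normal }
</style>
</head>
<body>
<div id="abs" class="abstract">
<p>A short summary of the report.</p>
</div>
<div id="main" class="body">
<p>The main discussion goes here.</p>
</div>
<div id="end" class="conclusion">
<p>Closing remarks.</p>
</div>
<div id="refs" class="biblio">
<p>Reference entries go here.</p>
</div>
</body>
</html>
```

The biblio class has no rule in this sketch, so a browser would render that division with its default style until you define one.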

We provide a detailed description of stylesheets, classes, and their applications in Chapter 8.

The <div> Tag

As defined in the HTML 4.01 and XHTML 1.0 and 1.1 standards, a <div> tag divides your document into separate, distinct sections. It may be used strictly as an organizational tool, without any sort of formatting associated with it, but it becomes more effective if you add the id and class attributes to label the divisions. The <div> tag also may be combined with the align attribute to control the alignment of whole sections of your document's content in the display and with the many programmatic "on event" attributes for user interaction.

The align attribute

The align attribute for <div> positions the enclosed content to the left (default), center, or right of the display. In addition, you can specify justify to align both the left and the right margins of the text. The <div> tag may be nested, and the alignment of the nested <div> tag takes precedence over the containing <div> tag. Further, other nested alignment tags, such as <center>, aligned paragraphs (see <p> in section 4.1.2), or specially aligned table rows and cells override the effects of <div>. Like the align attribute for other tags, it is deprecated in the HTML and XHTML standards in favor of stylesheet-based layout controls.
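
The precedence rules above can be sketched with a pair of nested divisions. The text content is ours, chosen only to describe what each element should do:

```html
<div align="center">
  <p>This paragraph is centered by the enclosing division.</p>
  <div align="right">
    <!-- The nested division's alignment takes precedence -->
    <p>This paragraph is right-aligned.</p>
  </div>
  <!-- An aligned paragraph overrides the division's alignment -->
  <p align="left">This paragraph is left-aligned.</p>
</div>
```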

The nowrap attribute

Supported by Internet Explorer and Opera, but not Firefox or Netscape Navigator, the nowrap attribute suppresses automatic word wrapping of the text within the division. Line breaks will occur only where you have placed carriage returns in your source document.

While the nowrap attribute probably doesn't make much sense for large sections of text that would otherwise be flowed together on the page, it can make things a bit easier when creating blocks of text with many explicit line breaks: poetry, for example, or addresses. You don't have to insert all those explicit <br> tags in a text flow within a <div nowrap> tag. On the other hand, a large number of users with browsers that ignore the nowrap attribute will see your text flow merrily along. If you are targeting only Internet Explorer or Opera with your documents, consider using nowrap where needed, but otherwise, we can't recommend this attribute for general use.
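
For comparison, here is a sketch of the portable approach, which marks each line break explicitly with <br> and so does not depend on nowrap support. The class name and the address itself are hypothetical:

```html
<div class="address">
Example Company<br>
123 Main Street<br>
Anytown, USA
</div>
```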

The dir and lang attributes

The dir attribute lets you advise the browser in which direction the text should be displayed, and the lang attribute lets you specify the language used within the division. [The dir attribute, 3.6.1.1] [The lang attribute, 3.6.1.2]
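
A brief sketch of the two attributes together follows. The language codes (he for Hebrew, en for English) and the sample text are our choices for illustration:

```html
<!-- Hebrew text, displayed right to left -->
<div lang="he" dir="rtl">
<p>שלום עולם</p>
</div>

<!-- English text, displayed left to right (the usual default) -->
<div lang="en" dir="ltr">
<p>Hello, world</p>
</div>
```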

The id attribute

Use the id attribute to label the document division for later reference by a hyperlink, stylesheet, applet, or other automated process. In general, an acceptable id value is any quote-enclosed string that uniquely identifies the division and that later can be used to reference that document section unambiguously. Specifically, the value must begin with a letter, and can contain letters, numbers, hyphens, colons, underscores, and periods, but not spaces. Although we're introducing it within the context of the <div> tag, this attribute can be used with almost any tag.

When used as an element label, the value of the id attribute can be added to a URL to address the labeled element uniquely within the document. You can label both large portions of content (via a tag like <div>) and small snippets of text (using a tag like <i> or <span>). For example, you might label the abstract of a technical report using <div id="abstract">. A URL could jump right to that abstract by referencing report.html#abstract. When used in this manner, the value of the id attribute must be unique with respect to all other id attributes within the document and all the names defined by any <a> tags with the name attribute. [Linking Within a Document, 6.3.3]
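
The report.html#abstract reference mentioned above might be put to work like this. The link text and paragraph content are assumptions for illustration:

```html
<!-- In report.html: label the division -->
<div id="abstract">
<p>This report examines paragraph alignment.</p>
</div>

<!-- In any other document: jump straight to the labeled division -->
<a href="report.html#abstract">Read the abstract</a>

<!-- Within report.html itself, the fragment alone is enough -->
<a href="#abstract">Back to the abstract</a>
```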

When used as a stylesheet selector, the value of the id attribute is the name of a style rule that can be associated with the current tag. This provides a second set of definable style rules, similar to the various style classes you may create. A tag can use both the class and the id attributes to apply two different rules to a single tag. In this case, the name associated with the id attribute must be unique with respect to all other style IDs within the current document. You can find a more complete description of style classes and IDs in Chapter 8.
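
As a sketch of a tag that picks up one rule from its class and another from its id, consider the following. The id value summary and the style values are hypothetical:

```html
<style type="text/css">
  /* Class rule: applies to every division of class "abstract" */
  div.abstract { font-style: italic }
  /* ID rule: applies only to the one element whose id is "summary" */
  #summary { color: navy }
</style>

<div id="summary" class="abstract">
<p>This text is italic (from the class rule) and navy (from the ID rule).</p>
</div>
```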

The title attribute

Use the optional title attribute and quote-enclosed string value to associate a descriptive phrase with the division. Like the id attribute, the title attribute can be used with almost any tag and behaves similarly for all tags.

There is no standards-defined usage for the value of the title attribute, but current browsers display the title when the mouse pauses over that element—in this case, anywhere in the <div>-defined text area. For example, use the title attribute to provide helpful tips within your document.
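
A small example, with a title string of our own invention, shows the idea. Whether and how the phrase appears is up to the browser:

```html
<div title="Summary of the report's findings">
<p>Pause the mouse anywhere over this division to see its title.</p>
</div>
```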

The class and style attributes

Use the style attribute with the <div> tag to create an inline style for the content enclosed by the tag. The class attribute lets you apply the style of a predefined class of the <div> tag to the contents of this division. The value of the class attribute is the name of a style defined in some document-level or externally defined stylesheet. In addition, class-identified divisions lend themselves well to computer processing of your documents; for example, extracting all divisions with the class name "biblio," for the automated assembly of a master bibliography. [Inline Styles: The style Attribute, 8.1.1] [Style Classes, 8.3]
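
The difference between the two attributes can be sketched as follows. The biblio class name comes from the discussion above; the style values themselves are assumptions:

```html
<style type="text/css">
  /* Predefined class of the div tag */
  div.biblio { font-size: smaller }
</style>

<!-- Inline style: affects only this one division -->
<div style="color: gray; margin-left: 0.5in">
<p>An aside set off by an inline style.</p>
</div>

<!-- Class: picks up the predefined div.biblio rule -->
<div class="biblio">
<p>Reference entries go here.</p>
</div>
```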

Event attributes

Many user-related events may happen in and around a division, such as when a user clicks or double-clicks the mouse within its display space. The browser recognizes these events if it conforms to the current HTML or XHTML standard (all the popular ones do). With the respective "on" attribute (onclick or ondblclick, for instance) and its value, you may react to those events by displaying a user dialog box or activating some multimedia event. [JavaScript Event Handlers, 12.3.3]
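
As a minimal sketch, the following divisions each react to one mouse event by displaying a dialog box. The messages are ours, and the example assumes a browser with JavaScript enabled:

```html
<div onclick="alert('You clicked this division.')">
<p>Click anywhere in this division.</p>
</div>

<div ondblclick="alert('You double-clicked this division.')">
<p>Double-click anywhere in this division.</p>
</div>
```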

The <p> Tag

The <p> tag signals the start of a paragraph. That's not well known even by some veteran webmasters, because it runs counter to what experience has taught us. Most word processors we're familiar with use just one special character, typically the return character, to signal the end of a paragraph, not the beginning. By contrast, in HTML and XHTML, each paragraph should start with the paragraph tag <p> and end with the corresponding </p> end tag. There is a second difference: when an author hits the Enter key repeatedly in a text processor, each newline or return character creates an empty paragraph, whereas browsers typically ignore newline characters altogether and, in a run of consecutive <p> tags, all but the first.

In practice, with HTML you can omit the starting <p> tag at the beginning of the first paragraph and the </p> tags at the end of each paragraph: they can be implied from other tags that occur in the document and hence safely left out.[*] For example:

<body>
This is the first paragraph, at the very beginning of the body of
this document.
<p>
The tag above signals the start of this second paragraph. When rendered
by a browser, it will begin slightly below the end of the first paragraph,
with a bit of extra whitespace between the two paragraphs. 
 

<p>
This is the last paragraph in the example.
</body>

Notice that we haven't included the paragraph start tag (<p>) for the first paragraph or any end paragraph tags; they can be unambiguously inferred by the HTML browser and are therefore unnecessary.

In general, you'll find that human document authors tend to omit implied tags whenever possible, and automatic document generators tend to insert them. That may be because the software designers didn't want to run the risk of having their products chided by competitors as not adhering to the HTML standard, even though we're splitting letter-of-the-law hairs here. Go ahead and be defiant: omit that first paragraph's <p> tag and don't give a second thought to paragraph-ending </p> tags, provided, of course, that your document's structure and clarity are not compromised and that you remember XHTML does not tolerate such laxity.
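
For comparison, here is the same three-paragraph example written the way XHTML requires, with every paragraph explicitly opened and closed. The wording of the second paragraph is adjusted slightly to match:

```html
<body>
<p>
This is the first paragraph, at the very beginning of the body of
this document.
</p>
<p>
The tag above signals the start of this second paragraph. When rendered
by a browser, it will begin slightly below the end of the first paragraph,
with a bit of extra whitespace between the two paragraphs.
</p>
<p>
This is the last paragraph in the example.
</p>
</body>
```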

Paragraph rendering

When encountering a new paragraph (<p>) tag, the browser typically inserts one blank line plus some extra vertical space into the display before starting the new paragraph. The browser then collects all the words and, if present, inline images into the new paragraph, ignoring leading and trailing spaces (not spaces between words, of course) and return characters in the source text. The browser software then flows the resulting sequence of words and images into a paragraph that fits within the margins of its display window, automatically generating line breaks as needed to wrap the text within the window. For example, compare how a browser arranges the text into lines and paragraphs (Figure 4-1) to how the preceding example is printed on the page. The browser may also automatically hyphenate long words, and the paragraph may be full-justified to stretch the line of words out toward both margins.

Figure 4-1. Browsers ignore common return characters in the source HTML/XHTML document

The net result is that you do not have to worry about line length, word wrap, and line breaks when composing your documents. The browser will take any arbitrary sequence of words and images and display a nicely formatted paragraph.

If you want to control line length and breaks explicitly, consider using a preformatted text block with the <pre> tag. If you need to force a line break, use the <br> tag. [<pre>, 4.6.5] [<br>, 4.6.1]

The align attribute

Most browsers automatically left-justify a new paragraph. To change this behavior, HTML 4 and XHTML give you the align attribute for the <p> tag and provide four kinds of content justification: left, right, center, and justify.

Figure 4-2 shows the effect of various alignments as rendered from the following source:

<p align=right>
Right over here!
<br>
This is too.
<p align=left>
Slide back left.
<p align=center>
Smack in the middle.
</p>
Left is the default.

Figure 4-2. Effect of the align attribute on paragraph justification

Notice in the HTML example that the paragraph alignment remains in effect until the browser encounters another <p> tag or an ending </p> tag. We deliberately left out a final <p> tag in the example to illustrate the effects of the </p> end tag on paragraph justification. Other body elements—including forms, headers, tables, and most other body content-related tags—may also disrupt the current paragraph alignment and cause subsequent paragraphs to revert to the default left alignment.

Note that the align attribute is deprecated in HTML 4 and XHTML in favor of stylesheet-based alignments.
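
A stylesheet-based counterpart to the earlier alignment example might look like the following sketch, which uses the CSS text-align property in inline styles. It is our own rewrite, not the source used for Figure 4-2. Because each paragraph is explicitly closed here, the final line sits in its own paragraph:

```html
<p style="text-align: right">
Right over here!
<br>
This is too.
</p>
<p style="text-align: left">
Slide back left.
</p>
<p style="text-align: center">
Smack in the middle.
</p>
<p>
Left is the default.
</p>
```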

The dir and lang attributes

The dir attribute lets you advise the browser in which direction the text within the paragraph should be displayed, and the lang attribute lets you specify the language used within that paragraph. The dir and lang attributes are supported by the popular browsers, even though there are no behaviors defined for any specific language. [The dir attribute, 3.6.1.1] [The lang attribute, 3.6.1.2]

The class, id, style, and title attributes

Use the id attribute to create a label for the paragraph that can later be used to unambiguously reference that paragraph in a hyperlink target, for automated searches, as a stylesheet selector, and with a host of other applications. [The id attribute, 4.1.1.4]

Use the optional title attribute and quote-enclosed string value to provide a descriptive phrase for the paragraph. [The title attribute, 4.1.1.5]

Use the style attribute with the <p> tag to create an inline style for the paragraph's contents. The class attribute lets you label the paragraph with a name that refers to a predefined class of the <p> tag previously declared in some document-level or externally defined stylesheet. Class-identified paragraphs lend themselves well to computer processing of your documents—for example, extracting all paragraphs whose class name is "citation," for automated assembly of a master list of citations. [Inline Styles: The style Attribute, 8.1.1] [Style Classes, 8.3]

Event attributes

As with divisions, a browser recognizes many user-initiated events, such as when a user clicks or double-clicks within a tag's display space, if the browser conforms to the current HTML or XHTML standard. With the respective "on" attribute (onclick or ondblclick, for instance) and its value, you may react to those events by displaying a user dialog box or activating some multimedia event. [JavaScript Event Handlers, 12.3.3]

Allowed paragraph content

A paragraph may contain any element allowed in a text flow, including conventional words and punctuation, links (<a>), images (<img>), line breaks (<br>), font changes (<b>, <i>, <tt>, <u>, <strike>, <big>, <small>, <sup>, <sub>, and <font>), and content-based style changes (<acronym>, <cite>, <code>, <dfn>, <em>, <kbd>, <samp>, <strong>, and <var>). If any other element occurs within the paragraph, it implies that the paragraph has ended, and the browser assumes that the closing </p> tag was not specified.
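
The implied paragraph ending can be seen in a short HTML sketch. The link target notes.html is hypothetical:

```html
<p>
Words, <a href="notes.html">links</a>, <em>emphasis</em>, and
<code>code</code> may all appear inside a paragraph.
<ul>
  <li>A list is not text-flow content, so the browser ends the
      paragraph just before it, as if a closing tag had been
      written there.</li>
</ul>
This text follows the list and is not part of the paragraph above.
```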

Allowed paragraph usage

You may specify a paragraph only within a block, along with other paragraphs, lists, forms, and preformatted text. In general, this means that paragraphs can appear where a flow of text is appropriate, such as in the body of a document, in an element in a list, and so on. Technically, paragraphs cannot appear within a header, anchor, or other element whose content is strictly text-only. In practice, most browsers ignore this restriction and format the paragraph as a part of the containing element.



[*] XHTML, on the other hand, requires explicit starting and ending tags.
