Chapter 1. Strings
Introduction
A string is one of the fundamental building blocks of data that JavaScript works with. Any script that touches URLs or user entries in form text boxes works with strings. Most document object model properties are string values. Data that you read or write to a browser cookie is a string. Strings are everywhere!
The core JavaScript language has a repertoire of the common string manipulation properties and methods that you find in most programming languages. You can tear apart a string character by character if you like, change the case of all letters in the string, or work with subsections of a string. Most scriptable browsers now in circulation also benefit from the power of regular expressions, which greatly simplify numerous string manipulation tasks—once you surmount a fairly steep learning curve.
Your scripts will commonly be handed values that are already string data types. For instance, if you need to inspect the text that a user has entered into a form’s text box, the value property of that text box object returns a value already typed as a string. All properties and methods of any string object are immediately available for your scripts to operate on that text box value.
Creating a String
If you need to create a string, you have a couple of ways to accomplish it. The simplest is to assign a quoted string of characters to a variable (or object property):
var myString = "Fluffy is a pretty cat.";
Quotes around a JavaScript string can be either single or double quotes, but each pair must be of the same type. Therefore, both of the following statements are acceptable:
var myString = "Fluffy is a pretty cat.";
var myString = 'Fluffy is a pretty cat.';
But the following mismatched pair is illegal and throws a script error:
var myString = "Fluffy is a pretty cat.';
Having the two sets of quote symbols is handy when you need to embed one string within another. The following document.write( ) statement that would execute while a page loads into the browser has one outer string (the entire string being written by the method) and nested sets of quotes that surround a string value for an HTML element attribute:
document.write("<img src='img/logo.jpg' height='30' width='100' alt='Logo'>");
You are also free to reverse the order of double and single quotes as your style demands. Thus, the above statement would be interpreted the same way if it were written as follows:
document.write('<img src="img/logo.jpg" height="30" width="100" alt="Logo">');
Two more levels of nesting are also possible if you use escape characters with the quote symbols. See Recipe 1.8 for examples of escaped character usage in JavaScript strings.
Technically speaking, the strings described so far aren’t precisely string objects in the purest sense of JavaScript. They are string values, which, as it turns out, can use all of the properties and methods of the global String object that inhabits every scriptable browser window. Use string values for all of your JavaScript text manipulation. In a few rare instances, however, a JavaScript string value isn’t quite good enough. You may encounter this situation if you are using JavaScript to communicate with a Java applet, and one of the applet’s public methods requires an argument as a string data type. In this case, you might need to create a full-fledged instance of a String object and pass that object as the method argument. To create such an object, use the constructor function of the String object:
var myString = new String("Fluffy is a pretty cat.");
The data type of the myString variable after this statement executes is object rather than string. But this object inherits all of the same String object properties and methods that a string value has, and works fine with a Java applet.
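The difference between a string value and a String object can be checked with the typeof operator. The following is a minimal sketch; the variable names are illustrative only and are not part of the recipes in this chapter:

```javascript
// A string value and a String object holding the same characters.
var strValue = "Fluffy is a pretty cat.";
var strObject = new String("Fluffy is a pretty cat.");

// typeof distinguishes the two data types.
var valueType = typeof strValue;     // "string"
var objectType = typeof strObject;   // "object"

// Both expose the same String properties and methods.
var sameLength = (strValue.length === strObject.length);
var upperFromObject = strObject.toUpperCase();

// valueOf() recovers the string value wrapped by the object.
var unwrappedType = typeof strObject.valueOf();   // "string"
```

If a script ever receives a String object where a plain value is expected, valueOf( ) is one way to get back to the string value.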
Regular Expressions
For the uninitiated, regular expressions can be cryptic and confusing. This isn’t the forum to teach you regular expressions from scratch, but perhaps the recipes in this chapter that demonstrate them will pique your interest enough to pursue their study.
The purpose of a regular expression is to define a pattern of characters that you can then use to compare against an existing string. If the string contains characters that match the pattern, the regular expression tells you where the match is within the string, facilitating further manipulation (perhaps a search-and-replace operation). Regular expression patterns are powerful entities because they let you go much further than simply defining a pattern of fixed characters. For example, you can define a pattern to be a sequence of five numerals bounded on each side by whitespace. Another pattern can define the format for a typical email address, regardless of the length of the username or domain, but the full domain must include at least one period.
The cryptic part of regular expressions is the notation they use to specify the various conditions within the pattern. JavaScript regular expression notation is nearly identical to that found in languages such as Perl. The syntax is the same for all except some of the more esoteric uses. One definite difference is the way you create a regular expression object from a pattern. You can use either the formal constructor function or shortcut syntax. The following two syntax examples create the same regular expression object:
var re = /pattern/ [g | i | gi];                        // Shortcut syntax
var re = new RegExp(["pattern", ["g" | "i" | "gi"]]);   // Formal constructor
The optional trailing characters (g, i, and gi) indicate whether the pattern should be applied globally and whether the pattern is case-insensitive. Internet Explorer 5.5 or later for Windows and Netscape 6 or later also recognize the optional m modifier, which influences string boundary pattern matching within multiline strings.
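As a concrete illustration of the two creation styles and the modifiers, here is a small sketch. The pattern and the sample text are made up for the example:

```javascript
// Shortcut (literal) syntax with the global and case-insensitive modifiers.
var reLiteral = /cat/gi;

// Formal constructor: pattern and modifiers are passed as strings.
var reConstructed = new RegExp("cat", "gi");

var sample = "Cat, cat, and CAT.";

// Both objects describe the same pattern.
var samePattern = (reLiteral.source === reConstructed.source);

// With g and i, every instance is found regardless of case.
var literalMatches = sample.match(reLiteral);           // ["Cat", "cat", "CAT"]
var constructedMatches = sample.match(reConstructed);   // ["Cat", "cat", "CAT"]

// Without the i modifier, only the all-lowercase instance matches.
var caseSensitiveMatches = sample.match(/cat/g);        // ["cat"]
```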
If you have been exposed to regular expressions in the past, Table 1-1 lists the regular expression pattern notation available in browsers since NN 4 and IE 4.
See Recipe 1.5 through Recipe 1.7, as well as Recipe 8.2, to see how regular expressions can empower a variety of string examination operations with less overhead than more traditional string manipulations. For in-depth coverage of regular expressions, see Mastering Regular Expressions, by Jeffrey E. F. Friedl (O’Reilly).
Concatenating (Joining) Strings
NN 2, IE 3
Problem
You want to join together two strings or accumulate one long string from numerous sequential pieces.
Solution
Within a single statement, use the plus (+) operator to concatenate multiple string values:
var longString = "One piece " + "plus one more piece.";
To accumulate a string value across multiple statements, use the add-by-value (+=) operator:
var result = "";
result += "My name is " + document.myForm.myName.value;
result += " and my age is " + document.myForm.myAge.value;
The add-by-value operator is fully backward-compatible and is more compact than the less elegant approach:
result = result + "My name is " + document.myForm.myName.value;
Discussion
You can use multiple concatenation operators within a single statement as needed to assemble your larger string, but you must be cautious about word wrapping of your source code. Because JavaScript interpreters have a built-in feature that automatically inserts semicolons at the logical ends of source code lines, you cannot simply break a string with a carriage return character in the source code without putting the syntactically correct breaks in the code to indicate the continuation of a string value. For example, the following statement and format triggers a syntax error as the page loads:
var longString = "One piece " + "plus one
more piece.";
The interpreter treats the first line as if it were:
var longString = "One piece " + "plus one;
To the interpreter, this statement contains an unterminated string and invalidates both this statement and anything coming after it. To break the line correctly, you must terminate the trailing string, and place a plus operator as the final character of the physical source code line (do not put a semicolon there because the statement isn’t finished yet). Also, be sure to start the next line with a quote symbol:
var longString = "One piece " + "plus one " +
"more piece.";
Additionally, whitespace outside of the quoted string is ignored. Thus, if you wish to format the source code for improved readability, you can even indent the second line without affecting the content of the string value:
var longString = "One piece " + "plus one " +
    "more piece.";
Source code carriage returns do not influence string text. If you want to include a carriage return in a string, you need to include one of the special escaped characters (e.g., \n) in the string. For example, to format a string for a confirm dialog box so that it creates the illusion of two paragraphs, include a pair of the special newline characters in the string:
var confirmString = "You did not enter a response to the last " +
    "question.\n\nSubmit form anyway?";
Note that this kind of newline character is for string text that appears in dialog boxes or other string-only containers. It is not a newline character for text that is to be rendered as HTML content. For that kind of newline, you must explicitly include a <br> tag in the string:
var htmlString = "First line of string.<br>Second line of string.";
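The add-by-value operator is also convenient inside a loop, where the number of pieces is not known in advance. The sketch below assumes a hypothetical array of field labels; it is only meant to show accumulation combined with the \n escape:

```javascript
// Hypothetical list of field labels to report to the user.
var missingFields = ["Name", "Age", "Email"];

// Accumulate one long string across loop iterations with +=.
var message = "Please complete the following fields:\n";
for (var i = 0; i < missingFields.length; i++) {
    message += "  " + missingFields[i] + "\n";
}
message += "\nSubmit form anyway?";
```

The resulting string could be passed to a confirm( ) dialog box, where each \n starts a new line.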
See Also
Recipe 1.8 to see how to include special control characters (such as a carriage return) in a string value.
Accessing Substrings
NN 2, IE 3
Solution
Use the substring( ) method (in all scriptable browsers) to copy a segment starting at a particular location and ending either at the end of the string (omitting the second parameter does that) or at a fixed position within the string, counting from the start of the string:
var myString = "Every good boy does fine.";
var section = myString.substring(0, 10);   // section is now "Every good"
Use the slice( ) method (in NN 4 or later and IE 4 or later) to set the end position at a point measured from the end of the string, using a negative value as the second parameter:
var myString = "Every good boy does fine.";
var section = myString.slice(11, -6);   // section is now "boy does"
Use the nonstandard, but widely supported, variant called substr( ) to copy a segment starting at a particular location and running for a given length (the second parameter is an integer representing the length of the substring):
var myString = "Every good boy does fine.";
var section = myString.substr(6, 4);   // section is now "good"
If the sum of the two arguments exceeds the length of the string, the method returns a string from the start point to the end of the string.
Discussion
Parameters for the ECMA-compatible slice( ) and substring( ) methods are numbers that indicate the zero-based start and end positions within the string from which the extract comes. The first parameter, indicating the start position, is required. When you use two positive integer values for the slice( ) method arguments (and the first argument is smaller than the second), you receive the same string value as the substring( ) method with the same arguments.
Note that the integer values for substring( ) and slice( ) act as though they point to spaces between characters. Therefore, when a substring( ) method’s arguments are set to 0 and 4, it means that the substring starts to the right of the “zeroeth” position and ends to the left of the fourth position; the length of the string value returned is four characters, as shown in Figure 1-1.
If you supply argument values for the substring( ) method in an order that causes the first argument to be larger than the second, the JavaScript interpreter automatically reverses the order of arguments so that the end pointer value is always larger than the start pointer. (The second argument of substr( ) is a length rather than a position, so no such reversal applies to it.) The slice( ) method isn’t as forgiving and returns an empty string.
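The differences among the three methods are easiest to see side by side. This sketch reuses the sample string from the Solution; the variable names are illustrative:

```javascript
var myString = "Every good boy does fine.";

// substring() swaps its arguments when the first is larger than the second.
var swapped = myString.substring(10, 0);     // same as substring(0, 10)

// slice() does not swap; it returns an empty string instead.
var notSwapped = myString.slice(10, 0);

// With two ascending positive positions, slice() and substring() agree.
var viaSlice = myString.slice(6, 10);
var viaSubstring = myString.substring(6, 10);

// substr() takes a start position and a length, not an end position.
var viaSubstr = myString.substr(6, 4);

// Omitting the second argument copies through the end of the string.
var tail = myString.substring(20);
```

Here swapped is "Every good", notSwapped is an empty string, the next three variables all hold "good", and tail holds "fine.".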
None of the substring methods modifies the original string object or value in any way. This is why you must capture the returned value in a variable, or apply the returned value as an argument to some other function or method.
See Also
Recipe 1.5 for testing whether a string contains a substring.
Changing String Case
NN 2, IE 3
Solution
Use the two dedicated String object methods, toLowerCase( ) and toUpperCase( ), for case changes:
var myString = "New York";
var lcString = myString.toLowerCase( );
var ucString = myString.toUpperCase( );
Both methods return modified copies of the original string, leaving it intact. If you want to replace the value of a variable with a case-converted version of the original string (and thus eliminate the original string), reassign the results of the method to the same variable:
myString = myString.toLowerCase( );
Do not, however, redeclare the variable with a var keyword.
Discussion
Because JavaScript strings (like just about everything else in the language) are case-sensitive, it is common to use case conversion for tasks such as testing the equivalency of a string entered into a text box by a user against a known string in your code. Because the user might include a variety of case variations in the entry, you need to guard against unorthodox entries by converting the input text to all uppercase or all lowercase letters for comparison (see Recipe 1.4).
Another common need for case conversion is preparing user entries for submission to a database that prefers or requires all uppercase (or all lowercase) letters. You can accomplish this for a user either at time of entry or during batch validation prior to submission. For example, an onchange event handler in a text box can convert the text to all uppercase letters as follows:
<input type="text" name="firstName" id="firstName" size="20" maxlength="25" onchange="this.value = this.value.toUpperCase( )" />
Simply reassign a converted version of the element’s value to itself.
See Also
Recipe 1.4 for a practical example of case conversion simplifying an important string task.
Testing Equality of Two Strings
NN 2, IE 3
Solution
Convert the user input to either all uppercase or all lowercase characters, and then use the JavaScript equality operator to make the comparison:
if (document.myForm.myTextBox.value.toLowerCase( ) == "new york") {
    // process correct entry
}
By using the results of the case conversion method as one of the operands of the equality expression, you do not modify the original contents of the text box. (See Recipe 1.3 if you want to convert the text in the text box to all of one case.)
Discussion
JavaScript has two types of equality operators. The fully backward-compatible, standard equality operator (==) employs data type conversion in some cases when the operands on either side are not of the same data type. Consider the following variable assignments:
var stringA = "My dog has fleas.";
var stringB = new String("My dog has fleas.");
These two variables might contain the same series of characters but are different data types. The first is a string value, while the second is an instance of a String object. If you place these two values on either side of an equality (==) operator, JavaScript tries various evaluations of the values to see if there is a coincidence somewhere. In this case, the two variable values would show to be equal, and the following expression:
stringA == stringB
returns true.
But the other type of equality operator, the strict equality operator (===), performs no data type conversions. Given the variable definitions above, the following expression evaluates to false because the two data types differ, even though their payloads are the same:
stringA === stringB
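The behavior of the two operators can be confirmed with a short script. This is a sketch using the same two variables; the third variable, stringC, is an extra case added here for illustration and is not part of the original discussion:

```javascript
var stringA = "My dog has fleas.";
var stringB = new String("My dog has fleas.");

// Standard equality converts the String object before comparing.
var looseEqual = (stringA == stringB);       // true

// Strict equality performs no conversion, so the data types must match.
var strictEqual = (stringA === stringB);     // false

// Converting the object to a string value first lets strict equality succeed.
var strictAfterConversion = (stringA === stringB.valueOf());   // true

// Extra case: two separate String objects are compared by identity,
// so even the standard operator reports them as unequal.
var stringC = new String("My dog has fleas.");
var twoObjectsLoose = (stringB == stringC);  // false
```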
If the logic of your code requires you to test for the inequality of two strings, you can use the inequality (!=) and strict inequality (!==) operators. For example, if you want to process an incorrect entry, the branching flow of your function would be like the following:
if (document.myForm.myTextBox.value.toLowerCase( ) != "new york") {
    // process incorrect entry
}
The same data type conversion issues apply to the inequality and strict inequality operators as to their opposite partners.
Although the equality and inequality operators go to great lengths to find value matches, you may prefer to assist the process by performing obvious data type conversions in advance of the operators. For instance, if you want to see if an entry to a numeric text box (a string value) is a particular number, you could let the equality operator perform the conversion for you, as in:
if (document.myForm.myTextBox.value == someNumericVar) { ... }
Or you could act in advance by converting one of the operands so that both are the same data type:
if (parseInt(document.myForm.myTextBox.value, 10) == someNumericVar) { ... }
If you are accustomed to more strongly typed programming languages, you can continue the practice in JavaScript without penalty, while perhaps boosting your script’s readability.
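To see both approaches without a form on the page, the following sketch substitutes plain variables for the text box value and the numeric variable. Both stand-in values are assumptions made for the example:

```javascript
// Hypothetical stand-ins for a text box value and a numeric variable.
var textBoxValue = "42";
var someNumericVar = 42;

// Letting the equality operator perform the conversion.
var looseMatch = (textBoxValue == someNumericVar);            // true

// Strict equality without conversion fails: string versus number.
var strictNoConversion = (textBoxValue === someNumericVar);   // false

// Converting in advance makes both operands numbers.
var strictMatch = (parseInt(textBoxValue, 10) === someNumericVar);   // true

// Passing 10 as the second argument keeps a leading zero from being
// interpreted as anything other than a decimal digit.
var leadingZero = parseInt("08", 10);                         // 8
```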
See Also
Recipe 2.1 for converting between string and number values; Recipe 3.3 for converting between strings and arrays; Recipe 3.13 for converting a custom object to a string value.
Testing String Containment Without Regular Expressions
NN 2, IE 3
Solution
Use the JavaScript indexOf( ) string method on the longer string section, passing the shorter string as an argument. If the shorter string is inside the larger string, the method returns a zero-based index integer of the start position of the smaller string within the larger string. If the shorter string is not in the larger string, the method returns -1.
For logic that needs to branch if the smaller string is not contained by the larger string, use the following construction:
if (largeString.indexOf(shortString) == -1) {
    // process due to missing shortString
}
For logic that needs to branch if the smaller string is contained somewhere within the larger string, use the following construction:
if (largeString.indexOf(shortString) != -1) {
    // process due to found shortString
}
In either case, you are not interested in the precise position of the short string but simply whether it is anywhere within the large string.
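If the same test appears in several places, it can be wrapped in a small function. The helper name containsString( ) and the sample URL below are hypothetical; this is only a sketch of the idea:

```javascript
// Hypothetical helper: true when shortString occurs anywhere in largeString.
function containsString(largeString, shortString) {
    return largeString.indexOf(shortString) != -1;
}

var url = "http://www.example.com/products/index.html";

var hasProducts = containsString(url, "/products/");   // true
var hasOrders = containsString(url, "/orders/");       // false

// indexOf() is case-sensitive, so convert both strings to one case
// when the comparison should ignore case.
var hasExample = containsString(url.toLowerCase(), "EXAMPLE".toLowerCase());   // true

// The raw return value is the zero-based position of the first match.
var position = url.indexOf("www");                     // 7
```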
Discussion
You may also find the integer returned by the indexOf( ) method to be useful in a variety of situations. For example, an event handler function that gets invoked by all kinds of elements in the event-propagation (bubbling) chain wants to process events that come only from elements whose IDs begin with a particular sequence of characters. This is an excellent spot to look for the returned value of zero, pointing to the start of the larger string:
function handleClick(evt) {
    var evt = (evt) ? evt : ((window.event) ? window.event : null);
    if (evt) {
        var elem = (evt.target) ? evt.target : ((evt.srcElement) ? evt.srcElement : null);
        if (elem && elem.id.indexOf("menuImg") == 0) {
            // process events from elements whose IDs begin with "menuImg"
        }
    }
}
Be aware that if the larger string contains multiple instances of the shorter string, the indexOf( ) method returns a pointer only to the first instance. If you’re looking to count the number of instances, you can take advantage of the indexOf( ) method’s optional second parameter, which specifies the starting position for the search. A compact repeat loop can count up the instances quickly:
function countInstances(mainStr, srchStr) {
    var count = 0;
    var offset = 0;
    do {
        offset = mainStr.indexOf(srchStr, offset);
        count += (offset != -1) ? 1 : 0;
    } while (offset++ != -1);
    return count;
}
Counting instances is much easier, however, using regular expressions (see Recipe 1.6).
See Also
Recipe 1.6 for using regular expressions to test string containment.
Testing String Containment with Regular Expressions
NN 4, IE 4
Solution
Create a regular expression with the short string (or pattern) and the global (g) modifier. Then pass that regular expression as a parameter to the match( ) method of a string value or object:
var re = /a string literal/g;
var result = longString.match(re);
When a global modifier is attached to the regular expression pattern, the match( ) method returns an array if one or more matches are found in the longer string. If there are no matches, the method returns null.
Discussion
To work this regular expression mechanism into a practical function, you need some helpful surrounding code. If the string you are looking for is in the form of a string variable, you can’t use the literal syntax for creating a regular expression as just shown. Instead, use the constructor function:
var shortStr = "Framistan 2000";
var re = new RegExp(shortStr, "g");
var result = longString.match(re);
After you have called the match( ) method, you can inspect the contents of the array value returned by the method:
if (result) {
    alert("Found " + result.length + " instances of the text: " + result[0]);
} else {
    alert("Sorry, no matches.");
}
When matches exist, the array returned by match( ) contains the found strings. When you use a fixed string as the regular expression pattern, these returned values are redundant. That’s why it’s safe in the previous example to pull the first returned value from the array for display in the alert dialog box. But if you use a regular expression pattern involving the symbols of the regular expression language, each of the returned strings could be quite different, but equally valid because they adhere to the pattern.
As long as you specify the g modifier for the regular expression, you may get multiple matches (instead of just the first). The length of the array indicates the number of matches found in the longer string. For a simple containment test, you can omit the g modifier; as long as there is a match, the returned value will be an array of length 1 (provided the pattern contains no parenthesized subexpressions, each of which adds an entry to the array).
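One caution when building a pattern from a string variable: characters such as the period and parentheses are meaningful in the regular expression language, so a search string that contains them may not match literally. The sketch below shows one possible way to guard against that. Both escapeForRegExp( ) and countMatches( ) are hypothetical helpers written for this example, not built-in methods:

```javascript
// Hypothetical helper: puts a backslash in front of every character that
// has a special meaning in a regular expression pattern.
function escapeForRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: counts literal instances of shortStr in longString.
function countMatches(longString, shortStr) {
    var re = new RegExp(escapeForRegExp(shortStr), "g");
    var result = longString.match(re);
    return (result) ? result.length : 0;
}

var text = "Price (USD): 5.00. Sale price (USD): 4.50.";
var usdCount = countMatches(text, "(USD)");    // 2
var noneCount = countMatches(text, "(EUR)");   // 0

// Without escaping, the period matches any character.
var unescapedDot = "5x00".match(new RegExp("5.00", "g"));                   // ["5x00"]
var escapedDot = "5x00".match(new RegExp(escapeForRegExp("5.00"), "g"));    // null
```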
See Also
Section 1.0.2 in the introduction to this chapter; Recipe 8.2 for using regular expressions in form field validations.
Searching and Replacing Substrings
NN 4, IE 4
Solution
The most efficient way (for NN 4 or later and IE 4 or later) is to use a regular expression with the replace( ) method of the String object:
var re = /a string literal/g;
var result = mainString.replace(re, replacementString);
Invoking the replace( ) method on a string does not change the source string. Capture the changed string returned by the method, and apply the result where needed in your scripts or page. If no replacements are made, the original string is returned by the method. Be sure to specify the g modifier for the regular expression to force the replace( ) method to operate globally on the original string; otherwise, only the first instance is replaced.
Discussion
To work this regular expression mechanism into a practical function, you need some helpful surrounding code. If the string you are looking for is in the form of a string variable, you can’t use the literal syntax for creating a regular expression as just shown. Instead, use the constructor function:
var searchStr = "F2";
var replaceStr = "Framistan 2000";
var re = new RegExp(searchStr, "g");
var result = longString.replace(re, replaceStr);
In working with a text-based form control or an element’s text node, you can perform the replace( ) operation on the value of the existing text, and immediately assign the results back to the original container. For example, if a div element contains one text node with scattered place holders in the form of (ph), and the job of the replace( ) method is to insert a user’s entry from a text box (called myName), the sequence is as follows:
var searchStr = "\\(ph\\)";
var re = new RegExp(searchStr, "g");
var replaceStr = document.myForm.myName.value;
var div = document.getElementById("boilerplate");
div.firstChild.nodeValue = div.firstChild.nodeValue.replace(re, replaceStr);
The double backslashes are needed to escape the escape character before the parentheses characters, which are otherwise meaningful symbols in the regular expression pattern language.
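The effect of the double backslashes can be checked outside of a page. In this sketch the sample text and the replacement name are made up; the point is only to compare the escaped pattern with what happens when single backslashes are used in the string:

```javascript
// Pattern built from a string: each backslash must itself be escaped.
var fromString = new RegExp("\\(ph\\)", "g");

// The equivalent literal syntax needs only single backslashes.
var fromLiteral = /\(ph\)/g;

var sameSource = (fromString.source === fromLiteral.source);   // true

var text = "Hello, (ph). Goodbye, (ph).";
var filled = text.replace(fromString, "Fluffy");
// filled is "Hello, Fluffy. Goodbye, Fluffy."

// With single backslashes the string is just "(ph)", so the parentheses
// act as pattern symbols and only the letters "ph" are replaced.
var wrong = new RegExp("\(ph\)", "g");
var wrongResult = text.replace(wrong, "Fluffy");
// wrongResult is "Hello, (Fluffy). Goodbye, (Fluffy)."
```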
It is also possible to implement a search-and-replace feature without regular expressions, but it’s a cumbersome exercise. The technique involves substantial text parsing using the indexOf( ) method to find the starting location of text to be replaced. You need to copy preceding text into a variable and strip away that text from the original string; keep repeating this find-strip-accumulate tactic until the entire string is accounted for, and you have inserted the replacement string in place of each found search string. It was necessary in the early browsers, but regular expressions are implemented in almost all scriptable browsers that are now in use.
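For readers curious about what the find-strip-accumulate tactic might look like, here is one possible sketch. The function name replaceAllPlain( ) and the sample text are hypothetical, and this is not necessarily how scripts for the early browsers were written:

```javascript
// Hypothetical helper: replaces every instance of searchStr in mainStr
// without regular expressions.
function replaceAllPlain(mainStr, searchStr, replaceStr) {
    if (searchStr.length == 0) {
        return mainStr;   // nothing to search for; avoids an endless loop
    }
    var result = "";
    var remaining = mainStr;
    var pos = remaining.indexOf(searchStr);
    while (pos != -1) {
        // accumulate the text preceding the match, plus the replacement
        result += remaining.substring(0, pos) + replaceStr;
        // strip the processed text (including the match) from the working copy
        remaining = remaining.substring(pos + searchStr.length);
        // find the next instance in what is left
        pos = remaining.indexOf(searchStr);
    }
    // whatever remains contains no further matches
    return result + remaining;
}

var boilerplate = "Dear (ph), thank you, (ph).";
var filled = replaceAllPlain(boilerplate, "(ph)", "Fluffy");
// filled is "Dear Fluffy, thank you, Fluffy."
```

Because the search string is treated as plain text, no escaping of the parentheses is needed in this version.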
See Also
Section 1.0.2 in the introduction to this chapter; Recipe 14.14 for additional body text replacement techniques in modern browsers.
Using Special and Escaped Characters
NN 2, IE 3
Solution
Use the escape sequences shown in Table 1-2 to represent the desired character. For example, to include an apostrophe inside a literal string, use \', as in:
var msg = "Welcome to Joe\'s Diner.";
Discussion
The core JavaScript language includes a feature common to most programming languages that lets you designate special characters. A special character is not one of the plain alphanumeric characters or punctuation symbols, but has a particular meaning with respect to whitespace in text. Common characters used these days include the tab, newline, and carriage return.
A special character begins with a backslash, followed by the character representing the code, such as \t for tab and \n for newline. The backslash is called an escape character, instructing the interpreter to treat the next character as a special character. Table 1-2 shows the recognized escape sequence characters and their meanings. To include these characters in a string, include the backslash and special character inside the quoted string:
var confirmString = "You did not enter a response to the last " +
    "question.\n\nSubmit form anyway?";
If you want to use one of these symbols between variables that contain string values, be sure the special character is quoted in the concatenation statement:
var myStr = lineText1 + "\n" + lineText2;
Special characters can be used to influence formatting of text in basic dialog boxes (from the alert( ), confirm( ), and prompt( ) methods) and textarea form controls.
Table 1-2 shows the recognized escaped characters and their meanings.
Note that to include a visible backslash character in a string, you must use a double backslash because a single one is treated as the invisible escape character. Use the escaped quote symbols to include single or double quotes inside a string.
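A few quick checks make the point about backslashes and quotes concrete. The sample strings here are invented for illustration:

```javascript
// Each double backslash in the source becomes one visible backslash.
var path = "C:\\temp\\new";            // visible text: C:\temp\new
var pathLength = path.length;          // 11, not 13

// Escaped quotes let both kinds of quote appear inside one string.
var quote = 'She said, "It\'s here."';
var hasApostrophe = (quote.indexOf("'") != -1);   // true

// An escape sequence counts as a single character.
var twoLines = "First\nSecond";
var twoLinesLength = twoLines.length;  // 12
```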
While you can use an escaped character in tests for the existence of, say, line feed characters in a string, you have to exercise some care when doing so with the content of a textarea element. The problem arises from a variety of implementations of how user-entered carriage returns are coded in the textarea’s content. IE for Windows inserts two escaped characters (\r\n in that sequence) whenever a user presses the Enter key to make a newline in a textarea. But IE for Macintosh uses only the \r character. And Netscape 6 and later inserts \n for newlines. Navigator 4 is governed more by the operating system in which the browser runs: \r\n for Windows; \r for Macintosh; and \n for Unix. This wide variety in character combinations makes searches for user-typed line breaks difficult to perform accurately across browsers and operating systems.
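One way to cope with the variety is to convert every style of line break to a single form before searching. The following is a sketch of that idea using a regular expression; normalizeLineBreaks( ) and countLines( ) are hypothetical helper names, and the sample strings stand in for textarea content:

```javascript
// Hypothetical helper: converts \r\n, lone \r, and lone \n to "\n".
// The two-character combination is listed first so that it is treated
// as one line break rather than two.
function normalizeLineBreaks(text) {
    return text.replace(/\r\n|\r|\n/g, "\n");
}

// Hypothetical helper: counts lines regardless of the line break style.
function countLines(text) {
    return normalizeLineBreaks(text).split("\n").length;
}

var fromIEWin = "line one\r\nline two";
var fromIEMac = "line one\rline two";
var fromNetscape = "line one\nline two";

var allAlike = (normalizeLineBreaks(fromIEWin) == normalizeLineBreaks(fromIEMac)) &&
    (normalizeLineBreaks(fromIEMac) == normalizeLineBreaks(fromNetscape));   // true

var mixedCount = countLines("a\r\nb\rc\nd");   // 4
```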
Going the other way (creating a string for script insertion into a textarea value) is easier because modern browsers accommodate all symbols. Therefore, if you assign just \r or \n or the combination \r\n, all browsers interpret any one of them as a carriage return, and convert the escape character(s) to match their internal handling.
See Also
Recipe 1.1 for tips on concatenating strings—tips that apply equally to escaped string characters.
Reading and Writing Strings for Cookies
NN 2, IE 3
Solution
Use the cookies.js library shown in the Discussion as a utility for saving and retrieving cookies. To set a cookie via the library, invoke the setCookie( ) function, passing, at a minimum, the cookie’s name and string value as arguments:
setCookie("userID", document.entryForm.username.value);
To retrieve a cookie’s value, invoke the library’s getCookie( ) function, as in:
var user = getCookie("userID");
Discussion
Example 1-1 shows the code for the entire cookies.js library.
// utility function to retrieve an expiration date in proper
// format; pass three integer parameters for the number of days, hours,
// and minutes from now you want the cookie to expire (or negative
// values for a past date); all three parameters are required,
// so use zeros where appropriate
function getExpDate(days, hours, minutes) {
    var expDate = new Date( );
    if (typeof days == "number" && typeof hours == "number" &&
        typeof minutes == "number") {
        expDate.setDate(expDate.getDate( ) + parseInt(days));
        expDate.setHours(expDate.getHours( ) + parseInt(hours));
        expDate.setMinutes(expDate.getMinutes( ) + parseInt(minutes));
        return expDate.toGMTString( );
    }
}

// utility function called by getCookie( )
function getCookieVal(offset) {
    var endstr = document.cookie.indexOf(";", offset);
    if (endstr == -1) {
        endstr = document.cookie.length;
    }
    return unescape(document.cookie.substring(offset, endstr));
}

// primary function to retrieve cookie by name
function getCookie(name) {
    var arg = name + "=";
    var alen = arg.length;
    var clen = document.cookie.length;
    var i = 0;
    while (i < clen) {
        var j = i + alen;
        if (document.cookie.substring(i, j) == arg) {
            return getCookieVal(j);
        }
        i = document.cookie.indexOf(" ", i) + 1;
        if (i == 0) break;
    }
    return "";
}

// store cookie value with optional details as needed
function setCookie(name, value, expires, path, domain, secure) {
    document.cookie = name + "=" + escape(value) +
        ((expires) ? "; expires=" + expires : "") +
        ((path) ? "; path=" + path : "") +
        ((domain) ? "; domain=" + domain : "") +
        ((secure) ? "; secure" : "");
}

// remove the cookie by setting ancient expiration date
function deleteCookie(name, path, domain) {
    if (getCookie(name)) {
        document.cookie = name + "=" +
            ((path) ? "; path=" + path : "") +
            ((domain) ? "; domain=" + domain : "") +
            "; expires=Thu, 01-Jan-70 00:00:01 GMT";
    }
}
The library begins with a utility function (getExpDate( )) that your scripts use to assist in setting an expiration date for the cookie. A second utility function (getCookieVal( )) is invoked internally during the reading of a cookie.
Use the getCookie( ) function in your scripts to read the value of a named cookie previously saved. The name you pass to the function is a string. If no cookie by that name exists in the browser’s cookie filing system, the function returns an empty string.
To save a cookie, invoke the setCookie( ) function. Required parameters are the first one for the name of the cookie and the second, which contains the value to be preserved. If you intend the cookie to last beyond the user quitting the browser, be sure to set an expiration date as the third parameter. Filter the expiration time period through the getExpDate( ) function shown earlier so that the third parameter of setCookie( ) is in the correct format.
One last function, deleteCookie( ), lets you delete an existing cookie before its expiration date. The function is hardwired to set the expiration date to the start of the JavaScript date epoch.
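To get a feel for the string that ends up being assigned to document.cookie, the sketch below rebuilds the pieces outside of a browser. Both functions are hypothetical variants written for this illustration, not part of cookies.js: getExpDateFrom( ) accepts a base date and uses the UTC date methods so that its result does not depend on the local time zone, and buildCookieString( ) returns the string rather than assigning it:

```javascript
// Hypothetical variant of getExpDate() that starts from a supplied date
// and works in UTC so the result is the same in every time zone.
function getExpDateFrom(baseDate, days, hours, minutes) {
    var expDate = new Date(baseDate.getTime());
    expDate.setUTCDate(expDate.getUTCDate() + days);
    expDate.setUTCHours(expDate.getUTCHours() + hours);
    expDate.setUTCMinutes(expDate.getUTCMinutes() + minutes);
    return expDate.toUTCString();
}

// Hypothetical helper: assembles the kind of string that setCookie()
// assigns to document.cookie, but simply returns it.
function buildCookieString(name, value, expires, path) {
    return name + "=" + escape(value) +
        ((expires) ? "; expires=" + expires : "") +
        ((path) ? "; path=" + path : "");
}

// Assumed starting point: midnight UTC on January 1, 2024.
var base = new Date(Date.UTC(2024, 0, 1, 0, 0, 0));
var expires = getExpDateFrom(base, 180, 0, 0);
var cookieString = buildCookieString("fontSize", "large text", expires, "/");
```

The value portion comes out as large%20text because escape( ) encodes the space, which is why the library runs retrieved values through unescape( ).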
Load the library into your page in the head portion of the document:
<script type="text/javascript" src="cookies.js"></script>
All cookie values you save must be string values; all cookie values you retrieve are string values.
In the browsers covered here, a browser cookie is the only way to preserve a string value on the client between visits to your web site. Scripts on your page may read only cookies that were saved from your domain and server. If you have multiple servers in your domain, you can set the fifth parameter of setCookie( ) to share cookies between servers at the same domain.
Browsers typically limit capacity to 20 name/value pairs of cookies per server; a cookie should be no more than 4,000 characters, but more practically, the value of an individual named cookie should be less than 2,000 characters. In other words, cookies are not meant to act as high-volume data storage facilities on the client. Also, browsers automatically send domain-specific cookie data to the server as part of each page request. Keep the amount of data small to limit the impact on dial-up users.
When you save a cookie, the name/value pair resides in the browser’s memory. The data, if set to expire some time in the future, is written to the cookie filesystem only when the browser quits. Therefore, don’t be alarmed if you don’t see your latest entry in the cookie file while the browser is still running. Different browsers save their cookies differently (and in different places in each operating system). IE stores each domain’s cookies in its own text file, while Netscape gangs all cookies together in a single text file.
All of this cookie action is made possible through the
document.cookie property. The purpose of the
cookies.js library is to act as a friendlier
interface between your scripts and the
document.cookie property, which
isn’t as helpful as it could be in extracting cookie
information. Although you can save a cookie with several parameters,
only the value of a cookie is available for reading, not the
expiration date, path, or domain details.
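As an illustration of what reading involves, the following sketch splits a cookie string into its name/value pairs. The function name parseCookieString( ) and the sample string are hypothetical. The sketch assumes the usual "name=value; name=value" layout and values saved through escape( ), as setCookie( ) does.

```javascript
// Hypothetical helper: turn a document.cookie-style string
// ("name1=value1; name2=value2") into an object of decoded values.
function parseCookieString(cookieStr) {
    var result = {};
    if (!cookieStr) {
        return result;
    }
    var pairs = cookieStr.split("; ");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        if (eq == -1) {
            continue; // skip malformed entries that have no "=" sign
        }
        var name = pairs[i].substring(0, eq);
        result[name] = unescape(pairs[i].substring(eq + 1));
    }
    return result;
}

// Sample string standing in for document.cookie.
var sample = "fontSize=large%20text; lastVisit=2024-01-01";
var cookies = parseCookieString(sample);
```

In a browser you would pass document.cookie itself. Note that only names and values come back; nothing in the string reveals the expiration date, path, or domain.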
Cookies are commonly used to preserve user preference settings
between visits. A script near the top of the page reads the cookie to
see if it exists, and, if so, applies settings to various content or
layout attributes while the rest of the page loads. Recipe 12.4 shows
how this can work to let users select a relative font size and
preserve the settings between visits. For example, the function that
preserves the user’s font size choice saves the
value to a cookie named fontSize, which is set to
expire in 180 days if not updated before then:
setCookie("fontSize", styleID, getExpDate(180, 0, 0));
The next time the user visits, the cookie is read while the page loads:
var styleCookie = getCookie("fontSize");
With the information from the cookie, the script applies the previously selected style sheet to the page. If the cookie was not previously set, the script assigns a default style sheet to use in the interim.
Just because cookies can store only strings, don’t let that get in the way of preserving information normally stored in arrays or custom objects. See Recipe 3.12 and Recipe 8.14 for ways to convert more complex data types to strings for preservation, and then restore their original form after retrieval from the cookie on the next visit.
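One possible approach, sketched here with the built-in JSON object rather than the techniques of those recipes, is to serialize the data to a string before saving and parse it after reading. The names packForCookie( ) and unpackFromCookie( ) and the sample data are hypothetical, and the sketch assumes a scripting environment that provides JSON.

```javascript
// Hypothetical helper: convert an array or object to a string
// that can be handed to setCookie( ) as the cookie value.
function packForCookie(data) {
    return JSON.stringify(data);
}

// Hypothetical helper: restore the original structure from the
// string returned by getCookie( ); an empty string yields null.
function unpackFromCookie(str) {
    return str ? JSON.parse(str) : null;
}

var prefs = { fontSize: "large", recent: ["home", "products", "cart"] };
var asString = packForCookie(prefs);
var restored = unpackFromCookie(asString);
```

Because setCookie( ) escapes the value and getCookie( ) unescapes it, the serialized string should survive the round trip, although the size limits described earlier still apply.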
See Also
Recipe 10.4 for passing data between pages via cookies; Recipe 12.4 for an example of using cookies to preserve a user’s style preference; Recipe 3.12 and Recipe 8.14 for ways of converting arrays and objects to cookie string values.
Converting Between Unicode Values and String Characters
NN 2, IE 3
Solution
To obtain the Unicode value of a character of a string, use the
charCodeAt( ) method of the string value. A single
parameter is an integer pointing to the zero-based position of the
character within the string:
var code = myString.charCodeAt(3);
If the string consists of only one character, use the
0 argument to get the code for that one character:
var oneChar = myString.substring(12, 13);
var code = oneChar.charCodeAt(0);
The returned value is an integer.
To convert a Unicode code number to a character, use the
fromCharCode( ) method of the static
String object:
var ch = String.fromCharCode(66);
Unlike most string methods, this one must be invoked only from the
String object and not from a string value.
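The two methods are complementary, as this short sketch of a round trip suggests. The helper names toCharCodes( ) and fromCharCodes( ) are hypothetical.

```javascript
// Hypothetical helper: collect the code of every character in a string.
function toCharCodes(str) {
    var codes = [];
    for (var i = 0; i < str.length; i++) {
        codes.push(str.charCodeAt(i));
    }
    return codes;
}

// Hypothetical helper: rebuild a string from an array of codes.
// fromCharCode( ) accepts any number of code arguments.
function fromCharCodes(codes) {
    return String.fromCharCode.apply(String, codes);
}

var codes = toCharCodes("Cat");      // [67, 97, 116]
var word = fromCharCodes(codes);     // "Cat"

// Arithmetic on a code yields a neighboring character.
var nextLetter = String.fromCharCode("A".charCodeAt(0) + 1);   // "B"
```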
Discussion
ASCII values and Unicode values are the same for the basic Latin alphanumeric (low-ASCII) values. But even though Unicode encompasses characters from many written languages around the world, do not expect to see characters from other writing systems displayed in alert boxes, text boxes, or rendered pages simply because you know the Unicode values for those characters; the browser and operating system must be equipped for the language encompassed by the characters. If the character sets are not available, the characters generated by such codes will be question marks or other symbols. A typical North American computer won’t know how to produce a Chinese character on the screen unless the target writing system and font sets are installed for the OS and browser.
See Also
Recipe 1.2 for other ways to extract single-character substrings.
Encoding and Decoding URL Strings
NN 6, IE 5.5(Win)
Problem
You want to convert a string of plain text to a format suitable for use as a URL or URL search string, or vice versa.
Solution
To convert a string consisting of an entire URL to a URL-encoded
form, use the encodeURI( ) method, passing the string needing
conversion as an argument. For example:
document.myForm.action = encodeURI(myString);
If you are assembling content for values of search string name/value
pairs, apply the encodeURIComponent( ) method:
var srchString = "?name=" + encodeURIComponent(myString);
Both methods have complementary partners that perform conversions in the opposite direction:
decodeURI(encodedURIString)
decodeURIComponent(encodedURIComponentString)
In all cases, the original string is not altered when passed as an argument to these methods. Capture the results from the value returned by the methods.
Discussion
Although the escape( ) and unescape( ) methods have been available since the
first scriptable browsers, they have been deprecated in the formal
language specification (ECMA-262) in favor of a set of new methods.
The new methods are available in IE 5.5 or later for Windows and
Netscape 6 or later.
These new encoding methods work by slightly different rules than the
old escape( ) and unescape( )
methods. As a result, you must encode and decode using the same pairs
of methods at all times. In other words, if a URL is encoded with
encodeURI( ), the resulting string can be decoded
only with decodeURI( ).
The differences between encodeURI( ) and
encodeURIComponent( ) are defined by the range of
characters that the methods convert to the URI-friendly form of a
percent sign (%) followed by the hexadecimal
Unicode value of the symbol (e.g., a space becomes
%20). Regular alphanumeric characters are not
converted, but when it comes to punctuation and special characters,
the two methods diverge in their coverage. The encodeURI( )
method converts the following symbols from the characters
in the ASCII range of 32 through 126:
space " % < > [ \ ] ^ ` { | }
For example, if you are assembling a URL with a simple search string
on the end, pass the URL through encodeURI( )
before navigating to the URL to make sure the URL is well-formed:
var newURL = "http://www.megacorp.com?prod=Gizmo Deluxe";
location.href = encodeURI(newURL);
// encoded URL is: http://www.megacorp.com?prod=Gizmo%20Deluxe
In contrast, the encodeURIComponent( ) method
encodes far more characters that might find their way into value
strings of forms or script-generated search strings. The full set of
encodable characters follows; those unique to encodeURIComponent( ),
which encodeURI( ) leaves alone, are # $ & + , / : ; = ? @:
space " # $ % & + , / : ; < = > ? @ [ \ ] ^ ` { | }
You may recognize some of the encodeURIComponent( )
values as those frequently appearing within complex URLs,
especially the ?, &, and = symbols. For this reason, apply
encodeURIComponent( ) only to the values of name/value
pairs before those values are inserted or appended to a URL. But then
it gets dangerous to pass the composite URL through
encodeURI( ) again because the % symbols of the encoded
characters will, themselves, be encoded, probably causing problems
on the server end when parsing the input from the client.
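The following sketch shows both halves of that advice. The function name buildURL( ) and the sample URL path are hypothetical, and the sketch assumes the parameter names themselves contain only URL-safe characters, so only the values are encoded.

```javascript
// Hypothetical helper: append name/value pairs to a base URL,
// passing each value through encodeURIComponent( ).
function buildURL(base, params) {
    var pairs = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            pairs.push(name + "=" + encodeURIComponent(params[name]));
        }
    }
    return pairs.length ? base + "?" + pairs.join("&") : base;
}

var url = buildURL("http://www.megacorp.com/search",
                   { prod: "Gizmo & Gadget", qty: "2" });
// url is: http://www.megacorp.com/search?prod=Gizmo%20%26%20Gadget&qty=2

// Encoding the finished URL a second time re-encodes each % sign as %25.
var doubled = encodeURI(url);
// doubled is: http://www.megacorp.com/search?prod=Gizmo%2520%2526%2520Gadget&qty=2
```

The & inside the value is encoded as %26, so it cannot be mistaken for the separator between pairs.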
If, for backward-compatibility reasons, you need to use the
escape( ) method, be aware that this method uses a
heavy hand in choosing characters to encode. Encodable characters for
the escape( ) method are as follows:
space ! " # $ % & ' ( ) , : ; < = > ? @ [ \ ] ^ ` { | } ~
The @ symbol, however, is not converted in
Internet Explorer browsers via the escape( ) method.
You can see now why it is important to use the matching decoding
method if you need to return one of your encoded strings back into
plain language. If the encoded string you are trying to decode comes
from an external source (e.g., part of a URL search string returned
by the server), try to use the decodeURIComponent( )
method on only those parts of the search string that are
the value portion of a name/value pair. That’s
typically where the heart of your passed information is, as well as
where you want to obtain the most correct conversion.
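A minimal sketch of that approach follows. The function name parseSearch( ) and the sample search string are hypothetical; in a browser the argument would typically be location.search.

```javascript
// Hypothetical helper: split a search string into name/value pairs and
// decode only the value portion of each pair.
function parseSearch(search) {
    var result = {};
    var query = (search.charAt(0) == "?") ? search.substring(1) : search;
    if (!query) {
        return result;
    }
    var pairs = query.split("&");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        var name = (eq == -1) ? pairs[i] : pairs[i].substring(0, eq);
        var value = (eq == -1) ? "" : pairs[i].substring(eq + 1);
        result[name] = decodeURIComponent(value);
    }
    return result;
}

var data = parseSearch("?prod=Gizmo%20%26%20Gadget&qty=2");
// data.prod is "Gizmo & Gadget"; data.qty is "2"
```

Splitting on & and = before decoding matters: decoding the whole search string first would turn %26 back into &, and the value would then be cut in two. Be aware that decodeURIComponent( ) throws an error if the value contains a malformed % sequence.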
See Also
Recipe 10.6 for passing data to another page via URLs, during which value encoding is used.
Encoding and Decoding Base64 Strings
NN 2, IE 3
Solution
Use the functions of the base64.js library shown in the Discussion. Syntax for invoking the two functions is straightforward. To encode a string, invoke:
var encodedString = base64Encode("stringToEncode");
To decode a string, invoke:
var plainString = base64Decode("encodedString");
Discussion
Example 1-2 shows the entire base64.js library.
// Global lookup arrays for base64 conversions
var enc64List, dec64List;

// Load the lookup arrays once
function initBase64( ) {
    enc64List = new Array( );
    dec64List = new Array( );
    var i;
    for (i = 0; i < 26; i++) {
        enc64List[enc64List.length] = String.fromCharCode(65 + i);
    }
    for (i = 0; i < 26; i++) {
        enc64List[enc64List.length] = String.fromCharCode(97 + i);
    }
    for (i = 0; i < 10; i++) {
        enc64List[enc64List.length] = String.fromCharCode(48 + i);
    }
    enc64List[enc64List.length] = "+";
    enc64List[enc64List.length] = "/";
    for (i = 0; i < 128; i++) {
        dec64List[dec64List.length] = -1;
    }
    for (i = 0; i < 64; i++) {
        dec64List[enc64List[i].charCodeAt(0)] = i;
    }
}

// Encode a string
function base64Encode(str) {
    var c, d, e, end = 0;
    var u, v, w, x;
    var ptr = -1;
    var input = str.split("");
    var output = "";
    while (end == 0) {
        c = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end = 1) ? 0 : 0);
        d = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end += 1) ? 0 : 0);
        e = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end += 1) ? 0 : 0);
        u = enc64List[c >> 2];
        v = enc64List[(0x00000003 & c) << 4 | d >> 4];
        w = enc64List[(0x0000000F & d) << 2 | e >> 6];
        x = enc64List[e & 0x0000003F];
        // handle padding to even out unevenly divisible string lengths
        if (end >= 1) {x = "=";}
        if (end == 2) {w = "=";}
        if (end < 3) {output += u + v + w + x;}
    }
    // format for 76-character line lengths per RFC
    var formattedOutput = "";
    var lineLength = 76;
    while (output.length > lineLength) {
        formattedOutput += output.substring(0, lineLength) + "\n";
        output = output.substring(lineLength);
    }
    formattedOutput += output;
    return formattedOutput;
}

// Decode a string
function base64Decode(str) {
    var c=0, d=0, e=0, f=0, i=0, n=0;
    var input = str.split("");
    var output = "";
    var ptr = 0;
    do {
        f = input[ptr++].charCodeAt(0);
        i = dec64List[f];
        if ( f >= 0 && f < 128 && i != -1 ) {
            if ( n % 4 == 0 ) {
                c = i << 2;
            } else if ( n % 4 == 1 ) {
                c = c | ( i >> 4 );
                d = ( i & 0x0000000F ) << 4;
            } else if ( n % 4 == 2 ) {
                d = d | ( i >> 2 );
                e = ( i & 0x00000003 ) << 6;
            } else {
                e = e | i;
            }
            n++;
            if ( n % 4 == 0 ) {
                output += String.fromCharCode(c) +
                          String.fromCharCode(d) +
                          String.fromCharCode(e);
            }
        }
    } while (typeof input[ptr] != "undefined");
    output += (n % 4 == 3) ? String.fromCharCode(c) + String.fromCharCode(d) :
              ((n % 4 == 2) ? String.fromCharCode(c) : "");
    return output;
}

// Self-initialize the global variables
initBase64( );
The library begins with two global declarations and an initialization function that creates lookup tables for the character conversions. At the end of the library is a statement that invokes the initialization function.
Scripts may call the base64Encode( ) function directly to convert a
standard string to a Base64-encoded string. The value of the original
string is not changed, but the function returns an encoded copy. To
convert an encoded string to a standard string, use the
base64Decode( ) function, passing the encoded
string as an argument.
Netscape 6 and later include global methods that perform the same
conversions that the library carries out at length. The atob( )
method converts a Base64-encoded string to a plain string; the
btoa( ) method converts a plain string to a
Base64-encoded string. These methods are not part of the ECMAScript
standard used as the foundation for these browser versions, so
it’s unclear when or if they will find their way
into other browsers.
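Because availability varies, a script can test for these methods before relying on them. The following is only a sketch; the helper name hasNativeBase64( ) is hypothetical, and the sample text is limited to plain ASCII characters.

```javascript
// Hypothetical helper: report whether the environment supplies
// both of the global Base64 methods.
function hasNativeBase64() {
    return typeof btoa == "function" && typeof atob == "function";
}

var encoded = "";
var decoded = "";
if (hasNativeBase64()) {
    encoded = btoa("Hello, world");   // plain string to Base64
    decoded = atob(encoded);          // Base64 back to plain string
}
```

Where the test fails, a script could fall back to the base64Encode( ) and base64Decode( ) functions of the library.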
Frankly, there hasn’t been a big need for Base64
encoding in most scripted web pages, but that’s
perhaps because the facilities weren’t readily
available. A Base64-encoded string contains a very small character
set: a-z, A-Z, 0-9, +, /, and =. This lowest-common-denominator
scheme allows data of any type to be conveyed by
virtually any internet protocol. Binary attachments to your email are
encoded as Base64 strings for the journey. Your email
client decodes the simple string and generates the image, document,
or executable file that arrives with the message. You may find
additional ways to apply Base64-encoded data in your pages and
scripts. To learn more about Base64 encoding, visit http://www.ietf.org/rfc/rfc2045.txt.
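The small character set follows from the way Base64 regroups data: every three 8-bit values become four 6-bit values, and each 6-bit value (0 through 63) selects one character. The sketch below isolates that step. The names BASE64_CHARS and encodeTriplet( ) are hypothetical, and the sketch assumes each input code is in the range 0 through 255 and that no padding is needed.

```javascript
// The 64 characters, in index order: A-Z, a-z, 0-9, +, /
var BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
                   "abcdefghijklmnopqrstuvwxyz" +
                   "0123456789+/";

// Hypothetical helper: regroup three 8-bit codes into four 6-bit
// indexes and look up the character for each index.
function encodeTriplet(c, d, e) {
    return BASE64_CHARS.charAt(c >> 2) +                     // top 6 bits of c
           BASE64_CHARS.charAt((c & 0x03) << 4 | d >> 4) +   // low 2 of c, top 4 of d
           BASE64_CHARS.charAt((d & 0x0F) << 2 | e >> 6) +   // low 4 of d, top 2 of e
           BASE64_CHARS.charAt(e & 0x3F);                    // low 6 bits of e
}

var word = "Man";
var encoded = encodeTriplet(word.charCodeAt(0),
                            word.charCodeAt(1),
                            word.charCodeAt(2));
// encoded is "TWFu"
```

The = character comes into play only when the input length is not a multiple of three, which is the padding case that base64Encode( ) handles.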
See Also
Recipe 1.11 for URL-encoding techniques.