Because PHP programs often interact with HTML pages, web addresses
(URLs), and databases, there are functions to help you work with those
types of data. HTML, web page addresses, and database commands are all
strings, but they each require different characters to be escaped in
different ways. For instance, a space in a web address must be written as %20, while a literal less-than sign (<) in an HTML document must be written as &lt;. PHP has a number of built-in functions to convert to and from these encodings.
Special characters in HTML are represented by entities such as &amp; and &lt;. PHP has two functions that turn special characters in a string into their entities, one function that removes HTML tags, and one that extracts only meta tags.
The htmlentities() function changes all characters with HTML entity equivalents into those equivalents (with the exception of the space character). This includes the less-than sign (<), the greater-than sign (>), the ampersand (&), and accented characters.
For example:
$string = htmlentities("Einstürzende Neubauten");
echo $string;
Einst&uuml;rzende Neubauten
The entity-escaped version (&uuml;, seen by viewing the source) correctly displays as ü in the rendered web page. As you can see, the space has not been turned into &nbsp;.
The htmlentities() function actually takes up to three arguments:
$output = htmlentities(input, quote_style, charset);
The charset parameter, if given, identifies the character set. The default was ISO-8859-1 before PHP 5.4 and has been UTF-8 since then. The quote_style parameter controls whether single and double quotes are turned into their entity forms. ENT_COMPAT (the default before PHP 8.1) converts only double quotes, ENT_QUOTES converts both types of quotes, and ENT_NOQUOTES converts neither. There is no option to convert only single quotes. For example:
$input = <<< End
"Stop pulling my hair!" Jane's eyes flashed.<p>
End;
$double = htmlentities($input, ENT_COMPAT);
// &quot;Stop pulling my hair!&quot; Jane's eyes flashed.&lt;p&gt;
$both = htmlentities($input, ENT_QUOTES);
// &quot;Stop pulling my hair!&quot; Jane&#039;s eyes flashed.&lt;p&gt;
$neither = htmlentities($input, ENT_NOQUOTES);
// "Stop pulling my hair!" Jane's eyes flashed.&lt;p&gt;
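The htmlspecialchars() function converts the smallest set of entities possible to generate valid HTML. The following characters are converted:
Ampersands (&) are converted to &amp;
Double quotes (") are converted to &quot;
Single quotes (') are converted to &#039; (if ENT_QUOTES is on, as described for htmlentities())
Less-than signs (<) are converted to &lt;
Greater-than signs (>) are converted to &gt;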
If you have an application that displays data that a user has entered in a form, you need to run that data through htmlspecialchars() before displaying or saving it. If you don’t, and the user enters a string like "angle < 30" or "sturm & drang", the browser will think the special characters are HTML, resulting in a garbled page.
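As a minimal sketch of that advice, the following escapes a piece of form input before echoing it into a page. The $userInput variable and its contents are hypothetical stand-ins for real form data, and the quote style and character set are passed explicitly so that the result does not depend on the defaults of a particular PHP version:

```php
<?php
// Hypothetical form input; in a real page this might come from $_POST.
$userInput = 'angle < 30 & "sturm & drang"';

// Pass the quote style and character set explicitly rather than
// relying on version-specific defaults.
$safe = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

echo "<p>You entered: {$safe}</p>\n";
// <p>You entered: angle &lt; 30 &amp; &quot;sturm &amp; drang&quot;</p>
```

Viewed in a browser, the paragraph shows the text exactly as the user typed it.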
Like htmlentities(), htmlspecialchars() can take up to three arguments:
$output = htmlspecialchars(input, [quote_style, [charset]]);
The quote_style and charset arguments have the same meaning that they do for htmlentities().
PHP provides html_entity_decode() and htmlspecialchars_decode() for converting entities back to the original text. You can also perform the conversion by hand, which is useful when you want to customize it. Use the get_html_translation_table() function to fetch the translation table used by htmlentities() or htmlspecialchars() in a given quote style. For example, to get the translation table that htmlentities() uses, do this:
$table = get_html_translation_table(HTML_ENTITIES);
To get the table for htmlspecialchars() in ENT_NOQUOTES mode, use:
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
A nice trick is to use this translation table, flip it using array_flip(), and feed it to strtr() to apply it to a string, thereby effectively doing the reverse of htmlentities():
$str = htmlentities("Einstürzende Neubauten"); // now it is encoded
$table = get_html_translation_table(HTML_ENTITIES);
$revTrans = array_flip($table);
echo strtr($str, $revTrans); // back to normal
Einstürzende Neubauten
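PHP also ships a built-in decoder, html_entity_decode(), that performs the same reversal. The following sketch runs both approaches on the same string and checks that each recovers the original. The variable names are illustrative, and the example assumes the source file is saved as UTF-8:

```php
<?php
$original = "Einstürzende Neubauten";  // assumes this file is saved as UTF-8
$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');

// Reverse by hand with the flipped translation table.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
$viaTable = strtr($encoded, array_flip($table));

// Reverse with the built-in decoder.
$viaBuiltin = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// Both comparisons should report true.
var_dump($viaTable === $original, $viaBuiltin === $original);
```

The flipped table is mainly worth the extra steps when you want to add or remove translations of your own.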
You can, of course, also fetch the translation table, add whatever other translations you want to it, and then do the strtr(). For example, if you wanted htmlentities() to also encode spaces to &nbsp;s, you would do:
$table = get_html_translation_table(HTML_ENTITIES);
$table[' '] = '&nbsp;';
$encoded = strtr($original, $table);
The strip_tags() function removes HTML tags from a string:
$input = '<p>Howdy, "Cowboy"</p>';
$output = strip_tags($input);
// $output is 'Howdy, "Cowboy"'
The function may take a second argument that specifies a string of tags to leave in the string. List only the opening forms of the tags. The closing forms of tags listed in the second parameter are also preserved:
$input = 'The <b>bold</b> tags will <i>stay</i><p>';
$output = strip_tags($input, '<b>');
// $output is 'The <b>bold</b> tags will stay'
Attributes in preserved tags are not changed by strip_tags(). Because attributes such as style and onmouseover can affect the look and behavior of web pages, preserving some tags with strip_tags() won’t necessarily remove the potential for abuse.
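To illustrate, the sketch below keeps <b> tags in a hypothetical piece of untrusted input. The <i> tags are removed, but the onmouseover attribute on the preserved tag survives untouched:

```php
<?php
// Hypothetical untrusted input containing an event-handler attribute.
$input = '<b onmouseover="alert(1)">bold</b> and <i>italic</i>';

// Keep <b> (and its closing form); strip everything else.
$output = strip_tags($input, '<b>');

echo $output, "\n";
// <b onmouseover="alert(1)">bold</b> and italic
```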
The get_meta_tags() function returns an array of the meta tags for an HTML page, specified as a local filename or URL. The name of the meta tag (keywords, author, description, etc.) becomes the key in the array, and the content of the meta tag becomes the corresponding value:
$metaTags = get_meta_tags('http://www.example.com/');
echo "Web page made by {$metaTags['author']}";
Web page made by John Doe
The general form of the function is:
$array = get_meta_tags(filename [, use_include_path]);
Pass a true value for use_include_path to let PHP attempt to open the file using the standard include path.
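The function works the same way on a local file. The following sketch writes a small, made-up page to a temporary file so that it runs without network access; the page content and the variable names are assumptions for illustration only:

```php
<?php
// A hypothetical page with two meta tags.
$html = <<<HTML
<html><head>
<meta name="author" content="John Doe">
<meta name="keywords" content="php, strings">
</head><body></body></html>
HTML;

// Save it to a temporary file and read the meta tags back.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);
$metaTags = get_meta_tags($path);
unlink($path);

echo "Web page made by {$metaTags['author']}\n";
// Web page made by John Doe
```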
PHP provides functions to convert to and from URL encoding, which allows you to build and decode URLs. There are actually two types of URL encoding, which differ in how they treat spaces. The first (specified by RFC 3986) treats a space as just another illegal character in a URL and encodes it as %20. The second (implementing the application/x-www-form-urlencoded system) encodes a space as a + and is used in building query strings.
Note that you don’t want to use these functions on a complete URL, such as http://www.example.com/hello, as they will escape the colons and slashes to produce:
http%3A%2F%2Fwww.example.com%2Fhello
Only encode partial URLs (the bit after http://www.example.com/) and add the protocol and domain name later.
To encode a string according to the URL conventions, use rawurlencode():
$output = rawurlencode(input);
This function takes a string and returns a copy with illegal URL characters encoded in the %dd convention.
If you are dynamically generating hypertext references for links in a page, you need to convert them with rawurlencode():
$name = "Programming PHP";
$output = rawurlencode($name);
echo "http://localhost/{$output}";
http://localhost/Programming%20PHP
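The same function can build a longer path safely if each path segment is encoded on its own and the protocol and domain name are added afterward. A minimal sketch, with a hypothetical base URL and segment list:

```php
<?php
// Hypothetical base URL and path segments.
$base = 'http://www.example.com';
$segments = ['docs', 'Programming PHP', 'chapter 4/strings'];

// Encode each segment separately, then join them with literal slashes.
$path = implode('/', array_map('rawurlencode', $segments));
$url = $base . '/' . $path;

echo $url, "\n";
// http://www.example.com/docs/Programming%20PHP/chapter%204%2Fstrings
```

Encoding segment by segment keeps the separating slashes intact while still escaping a slash that appears inside a segment.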
The rawurldecode() function decodes URL-encoded strings:
$encoded = 'Programming%20PHP';
echo rawurldecode($encoded);
Programming PHP
The urlencode() and urldecode() functions differ from their raw counterparts only in that they encode spaces as plus signs (+) instead of as the sequence %20. This is the format for building query strings and cookie values. These functions can be useful in supplying form-like URLs in the HTML. PHP automatically decodes query strings and cookie values, so you don’t need to use these functions to process those values. The functions are useful for generating query strings:
$baseUrl = 'http://www.google.com/search?q=';
$query = 'PHP sessions -cookies';
$url = $baseUrl . urlencode($query);
echo $url;
http://www.google.com/search?q=PHP+sessions+-cookies
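The sketch below puts the two encodings side by side and then assembles a query string from several parameters by encoding each name and value separately. The parameter names and values are hypothetical:

```php
<?php
$text = 'PHP sessions -cookies';

$raw = rawurlencode($text);   // PHP%20sessions%20-cookies
$form = urlencode($text);     // PHP+sessions+-cookies

// Hypothetical query parameters.
$params = ['q' => 'PHP sessions -cookies', 'lang' => 'en'];

// Encode every name and value, then join the pairs with "&".
$pairs = [];
foreach ($params as $key => $value) {
    $pairs[] = urlencode($key) . '=' . urlencode($value);
}
$queryString = implode('&', $pairs);

echo $queryString, "\n";
// q=PHP+sessions+-cookies&lang=en
```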
Most database systems require that string literals in your SQL queries be escaped. SQL’s encoding scheme is pretty simple: single quotes, double quotes, NUL-bytes, and backslashes need to be preceded by a backslash. The addslashes() function adds these slashes, and the stripslashes() function removes them:
$string = <<< EOF
"It's never going to work," she cried,
as she hit the backslash (\) key.
EOF;
$string = addslashes($string);
echo $string, "\n";
echo stripslashes($string), "\n";
\"It\'s never going to work,\" she cried,
as she hit the backslash (\\) key.
"It's never going to work," she cried,
as she hit the backslash (\) key.
Note
Some databases (Sybase, for example) escape single quotes with another single quote instead of a backslash. For those databases, versions of PHP before 5.4 let you enable magic_quotes_sybase in your php.ini file; the setting was removed in PHP 5.4.
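The two escaping styles can be compared directly. The following is only an illustrative sketch: the sample value is made up, and doubling quotes with str_replace() is an assumption about how one might reproduce the second style by hand, not a function PHP provides for that purpose:

```php
<?php
// Hypothetical value containing a single quote.
$name = "O'Reilly";

// Backslash style, as produced by addslashes().
$backslashStyle = addslashes($name);

// Doubled-quote style, reproduced by hand for illustration.
$doubledStyle = str_replace("'", "''", $name);

echo $backslashStyle, "\n";  // O\'Reilly
echo $doubledStyle, "\n";    // O''Reilly
```

In application code, the quoting routine or parameter binding supplied by the database driver is usually a safer choice than escaping strings by hand.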
The addcslashes() function escapes arbitrary characters by placing backslashes before them. With the exception of the characters in Table 4-4, characters with ASCII values less than 32 or above 126 are encoded with their octal values (e.g., "\002"). The addcslashes() and stripcslashes() functions are used with nonstandard database systems that have their own ideas of which characters need to be escaped.
Table 4-4. Single-character escapes recognized by addcslashes() and stripcslashes()
ASCII value | Encoding |
---|---|
7 | \a |
8 | \b |
9 | \t |
10 | \n |
11 | \v |
12 | \f |
13 | \r |
Call addcslashes() with two arguments, the string to encode and the characters to escape:
$escaped = addcslashes(string, charset);
Specify a range of characters to escape with the ".." construct:
echo addcslashes("hello\tworld\n", "\x00..\x1fz..\xff");
hello\tworld\n
Beware of specifying '0', 'a', 'b', 'f', 'n', 'r', 't', or 'v' in the character set, as they will be turned into '\0', '\a', etc. These escapes are recognized by C and PHP and may cause confusion.
stripcslashes() takes a string and returns a copy with the escapes expanded:
$string = stripcslashes(escaped);
For example:
$string = stripcslashes('hello\tworld\n');
// $string is "hello\tworld\n"
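The two functions are inverses of each other, which the following round-trip sketch demonstrates. The sample string is hypothetical, and the character set passed to addcslashes() covers only the control characters below ASCII 32:

```php
<?php
// Hypothetical input containing a tab, a newline, and a bell (ASCII 7).
$original = "tab\there\nbell\x07 done";

// Escape every control character from ASCII 0 through 31.
$escaped = addcslashes($original, "\x00..\x1f");
echo $escaped, "\n";
// tab\there\nbell\a done

// Expand the escapes again and compare with the starting string.
$restored = stripcslashes($escaped);
var_dump($restored === $original);
// bool(true)
```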