11.11. Removing HTML and PHP Tags
Problem
You want to remove HTML and PHP tags from a string or file.
Solution
Use strip_tags() to remove HTML and PHP tags from a string:
$html = '<a href="http://www.oreilly.com">I <b>love computer books.</b></a>';
print strip_tags($html);
I love computer books.
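Use fgetss() to remove them from a file as you read in lines. Note that fgetss() was deprecated in PHP 7.3 and removed in PHP 8.0, so this applies only to earlier versions:
$fh = fopen('test.html','r') or die($php_errormsg);
while (($s = fgetss($fh,1024)) !== false) {
    print $s;
}
fclose($fh) or die($php_errormsg);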
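Where fgetss() is not available, a rough line-by-line substitute can be sketched with fgets() and strip_tags(). This is only an illustration under stated assumptions: the temporary file and its contents are made up so the example is self-contained, and the approach is not an exact replacement for fgetss().

```php
<?php
// Hypothetical sample file, created only so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'tags');
file_put_contents($path, "<p>I <b>love</b></p>\n<p>computer books.</p>\n");

$fh = fopen($path, 'r') or die("Can't open $path");
$text = '';
// fgets() returns false at end of file; compare strictly so a line
// containing only "0" does not end the loop early.
while (($line = fgets($fh)) !== false) {
    // Strip tags from each line independently of the others.
    $text .= strip_tags($line);
}
fclose($fh);
unlink($path);

print $text;
```

Because each line is handled on its own, a tag that is split across a line break is not stripped cleanly with this approach.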
Discussion
While fgetss() is convenient if you need to strip tags from a file as you read it in, it may get confused if tags span lines or if they straddle the end of the buffer that fgetss() reads from the file. At the price of increased memory usage, reading the entire file into a string provides better results:
$no_tags = strip_tags(join('',file('test.html')));
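To see why the whole-file approach copes with tags that span lines, here is a small sketch. It assumes a made-up temporary file whose <a> tag is split across two lines, and it uses file_get_contents(), which reads the file into a single string much as join('',file(...)) does.

```php
<?php
// Hypothetical sample file whose <a> tag is split across two lines.
$path = tempnam(sys_get_temp_dir(), 'tags');
file_put_contents(
    $path,
    "I love <a\nhref=\"http://www.oreilly.com\">computer books</a>.\n"
);

// file_get_contents() reads the whole file into one string, so
// strip_tags() sees the complete tag even though it spans a line break.
$no_tags = strip_tags(file_get_contents($path));
unlink($path);

print $no_tags;
```

The trade-off is the one noted above: the entire file is held in memory at once.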
Both strip_tags() and fgetss() can be told not to remove certain tags by specifying those tags as a last argument. The tag specification is case-insensitive, and for pairs of tags, you only have to specify the opening tag. For example, this removes all but <b></b> tags from $html:
$html = '<a href="http://www.oreilly.com">I <b>love</b> computer books.</a>';
print strip_tags($html,'<b>');
I <b>love</b> computer books.
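As a further illustration of the allowed-tags argument, the sketch below lists two tags and uses mixed-case markup. The sample string, including the onclick attribute, is invented for this example.

```php
<?php
// Invented sample markup with mixed-case tags and an attribute on <B>.
$html = '<P>I <B onclick="alert(1)">love</B> <i>computer</i> books.</P>';

// Allow <b> and <i>; the uppercase <B> pair still matches because the
// tag specification is case-insensitive. Only opening tags are listed.
$kept = strip_tags($html, '<b><i>');

print $kept;
```

The allowed tags come through with their attributes intact, so strip_tags() with an allow list should not be treated as a sanitizer for untrusted HTML.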
See Also
Documentation on strip_tags() at http://www.php.net/strip-tags and fgetss() at http://www.php.net/fgetss.