Chapter 13. Web Automation
Introduction
Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. However, PHP can also be useful in the role of a web client: retrieving URLs and then operating on their content. Most recipes in this chapter cover retrieving URLs and processing the results, although a few handle other tasks, such as cleaning up URLs and some JavaScript-related operations.
There are many ways to retrieve a remote URL in PHP. Choosing one method over another depends on your needs for simplicity, control, and portability. The three methods discussed in this chapter are standard file functions, the cURL extension, and the HTTP_Request class from PEAR. These three methods can generally do everything you need, and at least one of them should be available to you whatever your server configuration or ability to install custom extensions. Other ways to retrieve remote URLs include the pecl_http extension (http://pecl.php.net/package/pecl_http), which offers some promising features although it is still in development, and the fsockopen() function, which opens a socket over which you send an HTTP request that you construct piece by piece.
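To make the fsockopen() approach concrete, here is a minimal sketch. It is an illustration rather than one of the chapter's recipes: it assumes a plain HTTP server on port 80, and build_get_request() and fetch_with_socket() are hypothetical helper names. It sends an HTTP/1.0 request with a Connection: close header, so the server should not use chunked transfer encoding or keep the connection open, and reading until end of file collects the whole response.

```php
<?php
// Sketch of the fsockopen() approach: build an HTTP request by hand,
// send it over a socket, and split the raw response into headers and body.
// build_get_request() and fetch_with_socket() are hypothetical names.

function build_get_request(string $host, string $path = '/'): string
{
    // Each request line ends with CRLF; an empty line ends the headers.
    $lines = [
        "GET $path HTTP/1.0",
        "Host: $host",
        "Connection: close",
    ];
    return implode("\r\n", $lines) . "\r\n\r\n";
}

function fetch_with_socket(string $host, string $path = '/', int $port = 80)
{
    // Open a TCP connection with a 10-second connection timeout.
    $fp = fsockopen($host, $port, $errno, $errstr, 10);
    if ($fp === false) {
        return false; // $errno and $errstr describe the failure.
    }

    fwrite($fp, build_get_request($host, $path));

    // The server closes the socket after responding, so read until EOF.
    $response = stream_get_contents($fp);
    fclose($fp);
    if ($response === false) {
        return false;
    }

    // Headers and body are separated by the first blank line.
    $parts = explode("\r\n\r\n", $response, 2);
    return ['headers' => $parts[0], 'body' => $parts[1] ?? ''];
}
```

For example, fetch_with_socket('www.example.com', '/people') would return an array holding the raw header block and the body, or false if the connection fails. Nothing in this sketch follows redirects, decodes chunked bodies, or speaks HTTPS; that bookkeeping is what the higher-level methods take care of for you.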
Using a standard file function such as file_get_contents() is simple and convenient. It automatically follows redirects, so if you use this function to retrieve the directory http://www.example.com/people and the ...
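The redirect-following behavior of file_get_contents() can be adjusted through an HTTP stream context. The sketch below assumes allow_url_fopen is enabled (it must be for file functions to open http:// URLs). The context options follow_location, max_redirects, and timeout belong to PHP's HTTP stream wrapper, while make_http_context() is a hypothetical helper name.

```php
<?php
// Sketch: controlling how file_get_contents() handles redirects by passing
// an HTTP stream context. make_http_context() is a hypothetical helper.

function make_http_context(bool $followRedirects = true, int $maxRedirects = 20)
{
    return stream_context_create([
        'http' => [
            // 1 follows Location: headers (PHP's default);
            // 0 stops at the first response instead of following it.
            'follow_location' => $followRedirects ? 1 : 0,
            // Maximum number of redirects to follow (default 20).
            'max_redirects'   => $maxRedirects,
            // Read timeout, in seconds.
            'timeout'         => 10,
        ],
    ]);
}

// Usage (needs network access):
// $page = file_get_contents('http://www.example.com/people', false,
//                           make_http_context(true, 5));
// if ($page === false) {
//     // The request failed or exceeded the redirect limit.
// }
```

Passing false for the first argument keeps the redirect-following default, so the context mainly matters when you want to cap the number of hops or inspect a redirect response yourself.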