Chapter 3. Downloading Web Pages

The most important thing a webbot does is move web pages from the Internet to your computer. Once the web page is on your computer, your webbot can parse and manipulate it.

This chapter will show you how to write simple PHP scripts that download web pages. More importantly, you’ll learn PHP’s limitations and how to overcome them with PHP/CURL, a special binding of the cURL library that facilitates many advanced network features. cURL has bindings in many programming languages and is widely used to access network files over a variety of protocols, with many configurable options.
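As a rough illustration of the two approaches mentioned above, the following sketch downloads a page first with a built-in PHP function and then with PHP/CURL. It is not taken from the book: the function names, the user-agent string, and the timeout and redirect values are all assumptions chosen for the example. The built-in version also assumes that allow_url_fopen is enabled, and the second version assumes the curl extension is installed.

```php
<?php
// Sketch only: function names, option values, and the user-agent string
// are illustrative assumptions, not the book's own code.

// Built-in approach: one call, returns the page as a string or null on failure.
// Fetching URLs this way requires allow_url_fopen to be enabled.
function fetch_with_builtin(string $url): ?string
{
    $page = @file_get_contents($url);
    return $page === false ? null : $page;
}

// PHP/CURL approach: more setup, but explicit control over the transfer.
function fetch_with_curl(string $url, int $timeout = 30): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,      // assumed limit on redirects
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'example-webbot/0.1', // hypothetical agent name
    ]);
    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),      // empty string when no error occurred
    ];
}
```

A call such as fetch_with_curl('https://www.example.com/') would return the page body together with the HTTP status code and any error message, which gives a webbot more to work with when deciding whether a download succeeded.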

Note

While web pages are the most common targets for webbots and spiders, the Web is not the only source of information for your webbots. Later chapters will explore methods ...

Webbots, Spiders, and Screen Scrapers, 2nd Edition by Michael Schrenk
