Sebastopol, CA--The Swiss Army Knife of programming languages, Perl
turns up in diverse and sundry applications. Its flexibility makes it a
favorite of coders, and thanks to its multi-purpose modules (like the
tools and gadgets on a pocketknife), there are very few tasks to which
Perl has not been applied. One of Perl's handiest and most practical tools is LWP
(Library for WWW in Perl), the suite of modules for fetching and
processing web pages. There is a wealth of information on the Web:
news, weather, government info, shopping, discussion groups, product
info, reviews, games, and other entertainment, and LWP can help
automate access to all of it. In his book, Perl & LWP (O'Reilly, US
$34.95), author Sean Burke shows how to use the powerful LWP library and
its related HTML tools to build web client applications that automate
various tasks on the Web.
LWP is the most frequently downloaded Perl distribution in all of CPAN
(Comprehensive Perl Archive Network). It enables programmers to write
"spiders" to automatically fetch web pages, extract information from
HTML pages, submit forms, and write homegrown servers. With LWP,
programmers can dispense with graphical web browsers such as Netscape
Navigator and interact with web servers directly, making it ideal for
repetitive tasks that would be cumbersome to perform with a browser.
"As people deal more and more with the Web, there are more tasks that
we routinely carry out over the Web that could be automated using LWP
or the HTML-parsing modules," says Burke. "For example, I'm a fan of
C-SPAN2's weekend programming, Book TV, but sometimes they'll have an
interesting author on at 5 a.m. on Saturday, when I definitely
would not be awake and flipping channels. If I want to catch these
things, I have to program my VCR in advance. However, that means I have
to remember to look at Book TV's web site on Friday night, and
remembering is not one of my strong points. So, I wrote a simple LWP
program that emails me the web page from the Book TV web site, and then
I set up a crontab entry to run that program every Friday afternoon.
Now, instead of often missing really good programs, I have a convenient
routine: I get an email message every Friday night, skim it for
interesting authors or subjects, and program the VCR accordingly."
"Perl & LWP" includes many step-by-step examples that show readers
how to apply the various techniques to their own needs. Programs that
extract information from the web sites of BBC News, AltaVista,
ABEBooks.com, Weather Underground, and others are explained in detail.
The book also covers:
- Understanding LWP and its design
- Fetching and analyzing web pages
- Extracting information from HTML using regular expressions, tokens, and trees
- Setting and inspecting HTTP headers and response codes
- Accessing information that requires authentication or cookies
- Extracting links
- Cooperating with proxy caches
- Writing web spiders (a.k.a. robots) in a safe fashion
Says Burke, "Readers will realize that they can make their lives simpler
by using what they've learned in this book to write a few little LWP
programs to automate two or three of their most common tasks that
involve the Web. That needn't be something like getting TV listings off
the Web; it could be a program that checks the server status page on a
dozen different servers and shows them all on a single page, for the
convenience of the server administrator."
Perl programmers who want to automate and mine the Web can pick up this
book and be immediately productive. Written by a contributor to LWP,
with a foreword by one of LWP's creators, "Perl & LWP" is the
authoritative guide to this powerful and popular toolkit.
Additional resources:
Perl & LWP
By Sean M. Burke
ISBN 0-596-00178-9, 242 pages, $34.95 (US), $54.95 (CAN)
order@oreilly.com
1-800-998-9938; 1-707-827-7000