Press Release
July 29, 2002
Fetching Web Pages, Parsing HTML, Writing Spiders, and More: O'Reilly Releases "Perl & LWP"
Sebastopol, CA--The Swiss Army Knife of programming languages, Perl
turns up in diverse and sundry applications. Its flexibility makes it a
favorite of coders, and with its multi-purpose modules--like the tools
and gadgets on a pocketknife--there are very few tasks that Perl is not
applied to. One of Perl's handiest and most practical tools is LWP
(Library for WWW in Perl), the suite of modules for fetching and
processing web pages. There is a wealth of information on the Web:
news, weather, government info, shopping, discussion groups, product
info, reviews, games, and other entertainment, and LWP can help
automate access to all of it. In his book, "Perl & LWP" (O'Reilly, US $34.95),
author Sean Burke shows how to use the powerful LWP library and its
related HTML tools to build useful web client applications to automate
various tasks on the Web.
LWP is the most frequently downloaded Perl distribution in all of CPAN
(Comprehensive Perl Archive Network). It enables programmers to write
"spiders" to automatically fetch web pages, extract information from
HTML pages, submit forms, and write homegrown servers. With LWP,
programmers can dispense with graphical web browsers such as Netscape
Navigator and interact with web servers directly, making it ideal for
repetitive tasks that would be cumbersome to perform with a browser.
"As people deal more and more with the Web, there are more tasks that
we routinely carry out over the Web that could be automated using LWP
or the HTML-parsing modules," says Burke. "For example, I'm a fan of
CSPAN2's weekend programming, Book TV, but sometimes they'll have an
interesting author on at 5 a.m. on Saturday morning, when I definitely
would not be awake and flipping channels. If I want to catch these
things, I have to program my VCR in advance. However, that means I have
to remember to look at Book TV's web site on Friday night, and
remembering is not one of my strong points. So, I wrote a simple LWP
program that emails me the web page from the Book TV web site, and then
I scheduled crontab to run that program every Friday afternoon. So,
what used to be a matter of often missing really good programs is now
convenient: I get an email message every Friday night, skim it for
interesting authors or subjects, and program the VCR accordingly."
"Perl & LWP" includes many step-by-step examples that show readers
how to apply the various techniques to their own needs. Programs to
extract information from the web sites of BBC News, AltaVista,
ABEBooks.com, and Weather Underground, as well as others, are explained
in detail. The book also covers:
- Understanding LWP and its design
- Fetching and analyzing web pages
- Extracting information from HTML using regular expressions, tokens,
and trees
- Setting and inspecting HTTP headers and response codes
- Accessing information that requires authentication or cookies
- Extracting links
- Cooperating with proxy caches
- Writing web spiders (a.k.a. robots) in a safe fashion
Says Burke, "Readers will realize that they can make their life simpler
by using what they've learned in this book to write a few little LWP
programs to automate two or three of their most common tasks that
involve the Web. That needn't be something like getting TV listings off
the Web; it could be a program that checks the server status page on a
dozen different servers and shows them all on a single page, for the
convenience of the server administrator."
Perl programmers who want to automate and mine the Web can pick up this
book and be immediately productive. Written by a contributor to LWP,
with a foreword by one of LWP's creators, "Perl & LWP" is the
authoritative guide to this powerful and popular toolkit.
Additional resources:
Perl & LWP
By Sean M. Burke
ISBN 0-596-00178-9, 242 pages, $34.95 (US), $54.95 (CAN)
order@oreilly.com
1-800-998-9938; 1-707-827-7000
About O'Reilly
O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption by amplifying "faint signals" from the alpha geeks who are creating the future. An active participant in the technology community, the company has a long history of advocacy, meme-making, and evangelism.