The LWP (Library for WWW in Perl) suite of modules lets your programs download and extract information from the Web. Perl & LWP shows how to make web requests, submit forms, and even provide authentication information, and it demonstrates using regular expressions, tokens, and trees to parse HTML. This book is a must-have for Perl programmers who want to automate and mine the Web.

Chapter 1 Introduction to Web Automation
The Web as Data Source
History of LWP
Installing LWP
Words of Caution
LWP in Action

Chapter 2 Web Basics
URLs
An HTTP Transaction
LWP::Simple
Fetching Documents Without LWP::Simple
Example: AltaVista
HTTP POST
Example: Babelfish

Chapter 3 The LWP Class Model
The Basic Classes
Programming with LWP Classes
Inside the do_GET and do_POST Functions
User Agents
HTTP::Response Objects
LWP Classes: Behind the Scenes

Chapter 4 URLs
Parsing URLs
Relative URLs
Converting Absolute URLs to Relative
Converting Relative URLs to Absolute

Chapter 5 Forms
Elements of an HTML Form
LWP and GET Requests
Automating Form Analysis
Idiosyncrasies of HTML Forms
POST Example: License Plates
POST Example: ABEBooks.com
File Uploads
Limits on Forms

Chapter 6 Simple HTML Processing with Regular Expressions
Automating Data Extraction
Regular Expression Techniques
Troubleshooting
When Regular Expressions Aren't Enough
Example: Extracting Links from a Bookmark File
Example: Extracting Links from Arbitrary HTML
Example: Extracting Temperatures from Weather Underground

Chapter 7 HTML Processing with Tokens
HTML as Tokens
Basic HTML::TokeParser Use
Individual Tokens
Token Sequences
More HTML::TokeParser Methods
Using Extracted Text

Chapter 8 Tokenizing Walkthrough
The Problem
Getting the Data
Inspecting the HTML
First Code
Narrowing In
Rewrite for Features
Alternatives

Chapter 9 HTML Processing with Trees
Introduction to Trees
HTML::TreeBuilder
Processing
Example: BBC News
Example: Fresh Air

Chapter 10 Modifying HTML with Trees
Changing Attributes
Deleting Images
Detaching and Reattaching
Attaching in Another Tree
Creating New Elements

Chapter 11 Cookies, Authentication, and Advanced Requests
Cookies
Adding Extra Request Header Lines
Authentication
An HTTP Authentication Example: The Unicode Mailing Archive

Chapter 12 Spiders
Types of Web-Querying Programs
A User Agent for Robots
Example: A Link-Checking Spider
Ideas for Further Expansion

Appendix A LWP Modules

Appendix B HTTP Status Codes
100s: Informational
200s: Successful
300s: Redirection
400s: Client Errors
500s: Server Errors

Appendix C Common MIME Types

Appendix D Language Tags

Appendix E Common Content Encodings

Appendix F ASCII Table

Appendix G User's View of Object-Oriented Modules
A User's View of Object-Oriented Modules
Modules and Their Functional Interfaces
Modules with Object-Oriented Interfaces
What Can You Do with Objects?
What's in an Object?
What Is an Object Value?
So Why Do Some Modules Use Objects?
The Gory Details

Colophon
Title: Perl & LWP
By: Sean M. Burke
Publisher: O'Reilly Media
Formats: Ebook, Safari Books Online
Print Release: June 2002
Ebook Release: February 2009
Pages: 264
Print ISBN: 978-0-596-00178-0 (ISBN-10: 0-596-00178-9)
Ebook ISBN: 978-0-596-10373-6 (ISBN-10: 0-596-10373-5)
Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects. The animals on the cover of Perl and LWP are blesbok. Blesbok are African antelopes related to the hartebeest. These grazing animals, native to Africa's grasslands, are extinct in the wild but preserved in farms and parks.
Blesbok have slender, horselike bodies that are shorter than four feet at the shoulder. They are deep red, with white patches on their faces and rumps. A white blaze extends from between a blesbok's horns to the end of its nose, broken only by a brown band above the eyes. The blesbok's horns sweep back, up, and inward. Both male and female blesbok have horns, though the males' are thicker.
Blesbok are diurnal, most active in the morning and evening. They sleep in the shade during the hottest part of the day, as they are very susceptible to the heat. They travel from place to place in long single-file lines, leaving distinct paths. Their life span is about 13 years.
Linley Dolby was the production editor and copyeditor for Perl and LWP, and Sarah Sherman was the proofreader. Rachel Wheeler and Claire Cloutier provided quality control. Johnna VanHoose Dinse wrote the index. Emily Quill provided production support.
Emma Colby designed the cover of this book, based on a series design by Edie Freedman. The cover image is a 19th-century engraving from the Dover Pictorial Archive. Emma Colby produced the cover layout with QuarkXPress 4.1 using Adobe's ITC Garamond font.
Melanie Wang designed the interior layout, based on a series design by David Futato. This book was converted to FrameMaker 5.5.6 with a format conversion tool created by Erik Ray, Jason McIntosh, Neil Walls, and Mike Sierra that uses Perl and XML technologies. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont's TheSans Mono Condensed. The illustrations that appear in the book were produced by Robert Romano and Jessamyn Read using Macromedia FreeHand 9 and Adobe Photoshop 6. This colophon was written by Linley Dolby.
