Chapter 8. Programming Google
When search engines first appeared on the scene, they were more open to being spidered, scraped, and aggregated. Sites such as Excite and AltaVista didn’t worry too much about the odd surfer using Perl to grab a slice of a page or meta-search engines including their results in aggregated search results. Sure, egregious data suckers might get shut out, but the search engines weren’t worried about sharing their information on a smaller scale.
Google never took that stance. Instead, it has regularly prohibited meta-search engines from using its content without a license, and it tries its best to block unidentified web agents such as Perl’s LWP::Simple module or even wget on the command line. Google has even been known to block IP address ranges for running automated queries.
Google had every right to do this; after all, it was its search technology, database, and computer power. Unfortunately, however, these policies meant that casual researchers and Google nuts, like you and me, couldn’t play with its rich dataset in any automated way.
Google changed all that with the release of the Google Web API (http://api.google.com) in the spring of 2002. The Google Web API doesn’t allow you to do every kind of search possible (for example, it doesn’t support the phonebook: syntax), but it does make available Google’s eight-billion-page web database so that developers can create their own interfaces and use Google search results to their liking.
Tip
API stands for “Application ...