The Code
Save the following code (see "How to Run the
Hacks" in the Preface) to a file called
googletech.cgi.
TIP
You'll need the XML::Simple and
SOAP::Lite Perl modules to run this hack; the script
also loads CGI, HTML::Entities, and LWP::Simple.
#!/usr/bin/perl -w
# googletech.cgi
# Getting Google results
# without getting weblog results.
use strict;
use SOAP::Lite;
use XML::Simple;
use CGI qw(:standard);
use HTML::Entities ( );
use LWP::Simple qw(!head);
my $technoratikey = "insert technorati key here";
my $googlekey = "insert google key here";
# Set up the query term
# from the CGI input.
my $query = param("q");
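# Initialize the SOAP interface and run the Google search.
my $google_wdsl = "http://api.google.com/GoogleSearch.wsdl";
my $service = SOAP::Lite->service($google_wdsl);
# Arguments: key, query, start, maxResults, filter, restrict,
# safeSearch, lr, ie, oe. Only maxResults (10) is discussed below;
# the other values are typical defaults.
my $result = $service->doGoogleSearch($googlekey, $query, 0, 10,
    "false", "", "false", "", "latin1", "latin1");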
# Start returning the results page;
# do this now to prevent timeouts.
my $cgi = new CGI;
print $cgi->header( );
print $cgi->start_html(-title=>'Blog Free Google Results');
print $cgi->h1('Blog Free Results for '. "$query");
print $cgi->start_ul( );
# Go through each of the results.
foreach my $element (@{$result->{'resultElements'}}) {
    my $url = HTML::Entities::encode($element->{'URL'});
    # Request the Technorati information for each result.
    my $technorati_result = get("http://api.technorati.com/bloginfo?".
        "url=$url&key=$technoratikey");
    # Parse this information.
    my $parser = new XML::Simple;
    my $parsed_feed = $parser->XMLin($technorati_result);
    # If Technorati considers this site to be a weblog,
    # go onto the next result. If not, display it, and then go on.
    if ($parsed_feed->{document}{result}{weblog}{name}) { next; }
    else {
        print $cgi->li('<a href="'.$url.'">'.$element->{title}.'</a>');
        print $cgi->li("$element->{snippet}");
    }
}
print $cgi -> end_ul( );
print $cgi->end_html;
Let's step through the meaningful bits of this code.
First comes pulling the query in from the CGI input and passing it
to Google. Notice the 10 in the doGoogleSearch call; this
is the number of search results requested from Google. You should
set this as high as Google will allow whenever you run the script;
otherwise, a search for terms that are extremely popular in the
weblogging world might return no results at all, because every
result was rejected as originating from a blog.
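Because doGoogleSearch takes its arguments by position, it is easy to lose track of which one is the result count. The following minimal sketch pairs each position with its parameter name. The key, the query, and every value other than the 10 are illustrative placeholders, and nothing is sent to Google.

```perl
use strict;
use warnings;

# Parameter names of doGoogleSearch, in positional order.
my @param_names = qw(key q start maxResults filter restrict safeSearch lr ie oe);

# Placeholder values; the key and query are stand-ins, not real credentials.
my @args = ("insert google key here", "perl", 0, 10,
            "false", "", "false", "", "latin1", "latin1");

# Pair each name with its value so the role of each position is visible.
my %call;
@call{@param_names} = @args;

printf "%-11s => '%s'\n", $_, $call{$_} for @param_names;
```

The fourth position, maxResults, is the 10 discussed above.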
Since we're about to make a web services call for
every one of the returned results, which might take a while, we want
to start returning the results page now; this helps prevent
connection timeouts. As such, we spit out a header using the
CGI module, and then jump into our loop.
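Printing the header early helps only if it actually leaves the process before the slow work begins. The sketch below shows one way to arrange that, assuming output buffering is what delays it: it turns buffering off with $| and writes the header by hand. The slow_lookup and page_start routines are hypothetical stand-ins, not part of the hack, and a web server may still apply buffering of its own.

```perl
use strict;
use warnings;

# Disable output buffering so each print leaves the process at once.
$| = 1;

# Build the HTTP header and the opening HTML by hand (no CGI module needed).
sub page_start {
    my ($title) = @_;
    return "Content-Type: text/html\n\n"
         . "<html><head><title>$title</title></head><body><ul>\n";
}

# Hypothetical stand-in for the slow per-result web services call.
sub slow_lookup {
    my ($n) = @_;
    return "result $n";
}

print page_start("Blog Free Google Results");   # sent before any slow work
print "<li>", slow_lookup($_), "</li>\n" for 1 .. 3;
print "</ul></body></html>\n";
```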
We then get to the final part of our code: actually looping through
the search results returned by Google and passing the HTML-encoded
URL to the Technorati API as a get request.
Technorati will then return its results as an XML document.
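The script HTML-encodes the URL before placing it in the query string. Values in a query string are more commonly percent-encoded, so that characters such as & and ? inside the result URL cannot be mistaken for separators. The sketch below swaps in that technique using core Perl only. The percent_encode and technorati_url routines are hypothetical helpers, and the input is assumed to be a byte string (ASCII or already UTF-8 encoded).

```perl
use strict;
use warnings;

# Percent-encode every byte except the RFC 3986 unreserved characters.
sub percent_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build the bloginfo request URL in the same shape the script uses.
sub technorati_url {
    my ($site, $key) = @_;
    return "http://api.technorati.com/bloginfo?url=" . percent_encode($site)
         . "&key=" . percent_encode($key);
}

print technorati_url("http://example.com/a b?x=1&y=2",
                     "insert technorati key here"), "\n";
```

The resulting string could be handed to get in place of the concatenation used in the script.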
TIP
Be careful that you do not run out of Technorati requests. As I write
this, Technorati is offering 500 free requests a day, which, with
this script, is around 50 searches. If you make this script available
to your web site audience, you will soon run out of Technorati
requests. One possible workaround is forcing the user to enter her
own Technorati key. You can get the user's key from
the same form that accepts the query. See the
"Hacking the Hack" section for a
means of doing this.
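The figure of around 50 searches follows from one Technorati request per Google result. A quick sketch of the arithmetic, using the numbers quoted in this tip:

```perl
use strict;
use warnings;

my $daily_requests     = 500;  # Technorati's free daily allowance quoted above
my $results_per_search = 10;   # one Technorati request per Google result

my $searches_per_day = int($daily_requests / $results_per_search);
print "About $searches_per_day searches per day\n";
```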
Parsing this result is a matter of passing it through
XML::Simple. Since Technorati returns an XML construct
containing name only when the site is thought to be a
weblog, we can use the presence of this construct as
a marker. If the program sees the construct, it skips to the next
result. If it doesn't, Technorati does not consider the site
a weblog, and we display a link to it, along with the
title and snippet (when available) returned by Google.
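The marker test can be sketched in isolation. The example below uses hand-built hashes as stand-ins for what XML::Simple might return, so the exact shape of the data is an assumption; only the document, result, weblog, name path comes from the script. The is_weblog routine is a hypothetical helper that walks the path one level at a time, so a missing level is reported as "not a weblog" without creating empty hashes along the way.

```perl
use strict;
use warnings;

# Return 1 if the parsed response carries a weblog name, 0 otherwise.
# Each level is checked in turn so missing keys are not autovivified.
sub is_weblog {
    my ($parsed) = @_;
    my $node = $parsed;
    for my $key (qw(document result weblog)) {
        return 0 unless ref($node) eq 'HASH' && ref($node->{$key}) eq 'HASH';
        $node = $node->{$key};
    }
    return $node->{name} ? 1 : 0;
}

# Hypothetical parsed responses: one for a weblog, one for an ordinary site.
my $blog    = { document => { result => { weblog => { name => "Example Blog" } } } };
my $notblog = { document => { result => { url => "http://example.com/" } } };

for my $case ([blog => $blog], [other => $notblog]) {
    my ($label, $data) = @$case;
    print "$label: ", (is_weblog($data) ? "skip" : "display"), "\n";
}
```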