O'Reilly Hacks



HACK #43
Scattersearch with Yahoo! and Google
Sometimes, illuminating results can be found by scraping one site and feeding the results into the API of another. With scattersearching, you can narrow down the most popular related results, as suggested by Yahoo! and Google.
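In outline, the technique takes the related terms that one site suggests, looks each one up through another site's API, and compares the counts. Below is a minimal, offline Perl sketch of that idea. It rests on a few assumptions. `rank_terms` is a hypothetical helper. The code reference passed to it stands in for the real lookup, which in this hack is a Google API `allintitle:` search. The counts are invented placeholders. Sorting by count is also an addition for illustration, since the hack itself lists terms in the order Yahoo! gives them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a lookup function that returns a hit count
# for a query, and a list of suggested terms, return [term, count]
# pairs sorted from most to fewest hits (ties broken alphabetically).
sub rank_terms {
    my ($lookup, @terms) = @_;
    my @ranked = map { [ $_, $lookup->("allintitle:$_") ] } @terms;
    return sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] } @ranked;
}

# Invented counts standing in for live API results (illustration only).
my %fake_counts = (
    'allintitle:perl hacks'    => 120,
    'allintitle:perl tutorial' => 450,
    'allintitle:perl cgi'      => 300,
);
my $lookup = sub { $fake_counts{ $_[0] } // 0 };

for my $pair (rank_terms($lookup, 'perl hacks', 'perl tutorial', 'perl cgi')) {
    printf "%-15s %d\n", @$pair;
}
```

With these placeholder counts the terms print in the order perl tutorial, perl cgi, perl hacks. Breaking ties alphabetically keeps the order stable when two suggestions report the same count.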

The Code

Save the following code to a file called scattersearch.pl.

TIP

Bear in mind that this hack, while using the Google API for the Google portion, scrapes Yahoo!'s search results pages and thus is rather brittle. If it stops working at any point, take a gander at the regular expressions that pick out Yahoo!'s "Also try:" suggestions, for they're almost sure to be the breakage point.
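To find that breakage point quickly, you can pull the scraping step out into one function and check it against a saved copy of a results page. What follows is a separate, hypothetical test script, not part of scattersearch.pl. `extract_suggestions` reuses the hack's own patterns. The sample markup is invented to fit those patterns and is not copied from Yahoo!.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the "Also try:" suggestions out of a
# results page using the same patterns as scattersearch.pl.
# Returns an empty list when the page layout no longer matches.
sub extract_suggestions {
    my ($html) = @_;
    $html =~ s/[\f\t\n\r]//g;                       # flatten the page
    my ($block) = $html =~ m!Also try:(.*?)  !is;   # suggestion block
    return () unless defined $block;
    my @terms;
    while ($block =~ m!<a href=".*?">(.*?)</a>!gis) {
        (my $term = $1) =~ s/nobr|<[^>]*>|\///g;    # strip tags, "nobr", slashes
        push @terms, $term;
    }
    return @terms;
}

# Invented sample markup, shaped like what the hack's regexes expect.
my $sample = qq{<div>Also try: <a href="/s?p=perl+hacks"><nobr>perl hacks</nobr></a>, }
           . qq{<a href="/s?p=perl+cgi">perl <b>cgi</b></a>  </div>};

my @found = extract_suggestions($sample);
warn "Layout changed? No suggestions found.\n" unless @found;
print "$_\n" for @found;
```

To use it, replace `$sample` with the contents of a page saved from Yahoo!. An empty result signals that the patterns need updating.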

#!/usr/bin/perl -w
#
# Scattersearch -- Use the search suggestions from
# Yahoo! to build a series of intitle: searches at Google. 
     
use strict;
     
use LWP;
use SOAP::Lite;
use CGI qw/:standard/;
     
# Get our query, else die miserably.
my $query = shift @ARGV or die "Usage: $0 <query>\n";
     
# Your Google API developer's key.
my $google_key = 'insert key here';
     
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
     
# Search Yahoo! for the query.
my $ua  = LWP::UserAgent->new;
my $url = URI->new('http://search.yahoo.com/search');
$url->query_form(rs => "more", p => $query);
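my $response = $ua->get($url);
die "Yahoo! request failed: ", $response->status_line, "\n"
    unless $response->is_success;
my $yahoosearch = $response->content;
# Flatten the page onto one line for the regular expressions below.
$yahoosearch =~ s/[\f\t\n\r]//g;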
     
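# And determine if there were any results. Capture straight into a
# variable rather than relying on $1, which keeps the value of the
# last *successful* match.
my ($recommended) = $yahoosearch =~ m!Also try:(.*?)  !is;
die "Sorry, there were no results!\n" unless $recommended;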

# Now, add all our results into
# an array for Google processing.
my @googlequeries;
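while ($recommended =~ m!<a href=".*?">(.*?)</a>!gis) {
    my $searchitem = $1;
    # Strip leftover "nobr" text, any HTML tags, and slashes.
    $searchitem =~ s/nobr|<[^>]*>|\///g;
    # Progress goes to STDERR so it stays out of the HTML on STDOUT.
    print STDERR "$searchitem\n";
    push @googlequeries, $searchitem;
}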

# Print our header for the results page. start_html must be followed
# by a comma, not a semicolon, or the rest of the list is never printed.
print join "\n",
     start_html("ScatterSearch"),
     h1("Your Scattersearch Results"),
     p("Your original search term was '" . CGI::escapeHTML($query) . "'"),
     p("That search had " . scalar(@googlequeries) . " recommended terms."),
     p("Here are result numbers from a Google search"),
     CGI::start_ol();
     
# Create our Google object for API searches.
my $gsrch = SOAP::Lite->service("file:$google_wdsl");
     
# Running the actual Google queries.
foreach my $googlesearch (@googlequeries) {
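    my $titlesearch = "allintitle:$googlesearch";
    # doGoogleSearch arguments: key, query, start, maxResults, filter,
    # restrict, safeSearch, lr (language), ie and oe (encodings).
    my $count = $gsrch->doGoogleSearch($google_key, $titlesearch,
                                       0, 1, "false", "", "false",
                                       "", "", "");
    # URL-escape the term for the link, and HTML-escape the link text.
    my $q = CGI::escape($googlesearch);
    print li("There were $count->{estimatedTotalResultsCount} " .
             "results for the recommended search <a href=\"http://www." .
             "google.com/search?q=$q&amp;num=100\">" .
             CGI::escapeHTML($googlesearch) . "</a>");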
}
     
print CGI::end_ol( ), end_html;
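You can also build and test the link behind each list item on its own. This sketch assumes core Perl only, so the hypothetical `url_escape` and `html_escape` helpers are hand-rolled. In practice, URI::Escape's `uri_escape` or CGI.pm's `escapeHTML` would do the same job. The hypothetical `google_link` reproduces the `search?q=...&num=100` URL format that the script prints. It treats the term as a byte string, so characters above 255 are not encoded correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except RFC 3986
# unreserved characters. Assumes a byte string (no wide characters).
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord $1)/ge;
    return $s;
}

# Hypothetical helper: escape the characters that are special in HTML.
sub html_escape {
    my ($s) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$entity{$1}/g;
    return $s;
}

# Build the anchor tag pointing back to a plain Google search for one
# recommended term, in the same URL format the hack uses.
sub google_link {
    my ($term) = @_;
    my $href = 'http://www.google.com/search?q=' . url_escape($term) . '&amp;num=100';
    return '<a href="' . $href . '">' . html_escape($term) . '</a>';
}

print google_link('"perl hacks" & cgi'), "\n";
```

For the sample term, the query becomes `q=%22perl%20hacks%22%20%26%20cgi`. Quotes and ampersands in a suggestion then cannot break the URL or the surrounding HTML.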



© 2007 O'Reilly Media, Inc.