O'Reilly Hacks

HACK #41
Scrape Yahoo! Buzz for a Google Search
A proof-of-concept hack that scrapes the top item from the Yahoo! Buzz Index and submits it to a Google search restricted to the last three days.

The Code

Save the following code to a plain text file named buzzgle.pl:

#!/usr/local/bin/perl
# buzzgle.pl
# Pull the top item from the Yahoo! Buzz Index and query the last
# three days' worth of Google's index for it.
# Usage: perl buzzgle.pl
     
# Your Google API developer's key.
my $google_key='insert key here';
     
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
     
# Number of days back to go in the Google index.
my $days_back = 3;
     
use strict;
     
use SOAP::Lite;
use LWP::Simple;
use Time::JulianDay;
     
# Scrape the top item from the Yahoo! Buzz Index.
     
# Grab a copy of http://buzz.yahoo.com.
     
# LWP::Simple's get() returns undef on failure and does not set $!.
my $buzz_content = get("http://buzz.yahoo.com/")
  or die "Couldn't grab the Yahoo! Buzz page\n";
     
# Find the first item on the Buzz Index list.
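# [^"]+ keeps the match inside a single href, so the first link wins
# even when several links share one line of HTML.
my($buzziest) = $buzz_content =~ m!http://search\.yahoo\.com/search\?p=[^"]+">(.+?)</a>!i;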
die "Couldn't figure out the Yahoo! buzz\n" unless $buzziest;
     
# Figure out today's Julian date.
my $today = int local_julian_day(time);
     
# Build the Google query.
my $query = "\"$buzziest\" daterange:" . ($today - $days_back) . "-$today"; 
     
print 
  "The buzziest item on Yahoo Buzz today is: $buzziest\n",
  "Querying Google for: $query\n",
  "Results:\n\n";
     
# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
     
# Query Google.
my $results = $google_search -> 
    doGoogleSearch(
      $google_key, $query, 0, 10, "false", "",  "false",
      "", "latin1", "latin1"
    );
     
# No results?
@{$results->{resultElements}} or die "No results";
     
# Loop through the results.
foreach my $result (@{$results->{'resultElements'}}) {
    my $output =
        join "\n",
        $result->{title} || "no title",
        $result->{URL},
        $result->{snippet} || 'no snippet',
        "\n";
    $output =~ s!<.+?>!!g; # drop all HTML tags
    print $output;
}
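
The query-building step in the script relies on Google's daterange: operator, which takes a pair of Julian day numbers. The following is a minimal sketch of that step in isolation, assuming you want to try it without installing Time::JulianDay. The helper names local_jdn and build_query are hypothetical and do not appear in the original script. The sketch swaps in the core modules POSIX and Time::Local, and is intended to mirror what int local_julian_day(time) produces rather than to replace it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);
use Time::Local qw(timegm);

# Julian day number of the calendar date 1970-01-01 (the Unix epoch date).
use constant EPOCH_JDN => 2440588;

# Hypothetical helper: Julian day number of the local calendar date
# that contains the given epoch time.
sub local_jdn {
    my ($epoch) = @_;
    my ($mday, $mon, $year) = (localtime $epoch)[3, 4, 5];
    # Treat the local calendar date as midnight UTC so that dividing
    # by 86400 counts whole days since 1970-01-01.
    my $days = floor(timegm(0, 0, 0, $mday, $mon, $year + 1900) / 86400);
    return EPOCH_JDN + $days;
}

# Hypothetical helper: quote the phrase and append a daterange:
# restriction covering the last $days_back days up to $today.
sub build_query {
    my ($phrase, $today, $days_back) = @_;
    return sprintf '"%s" daterange:%d-%d',
        $phrase, $today - $days_back, $today;
}

# Build a query for an arbitrary phrase over the last three days.
print build_query('example phrase', local_jdn(time), 3), "\n";
```

With a fixed day number, build_query('example phrase', 2453000, 3) yields "example phrase" daterange:2452997-2453000, which is the same shape as the $query string the script sends to doGoogleSearch.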



© 2007 O'Reilly Media, Inc.