O'Reilly Hacks

HACK #32
Dig Deeper into Sites
Dig deeper into the hierarchies of web sites matching your search criteria
The Code

Save this code as a CGI script ["How to Run the Hacks" in the Preface] named deep_blue_g.cgi on your web server. As you type it in, replace the placeholder 'insert key here' with your Google API key.

#!/usr/local/bin/perl
# deep_blue_g.cgi
# Limiting search results to a particular depth in a web 
# site's hierarchy.
# deep_blue_g.cgi is called as a CGI with form input.
     
# Your Google API developer's key.
my $google_key='insert key here';
     
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
     
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
     
use SOAP::Lite;
use CGI qw/:standard *table/;
     
print
  header( ),
  start_html("Fishing in the Deep Blue G"),
  h1("Fishing in the Deep Blue G"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br( ),
  'Depth: ', textfield(-name=>'depth', -default=>4),
  br( ),
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );
     
# Make sure a query and a purely numeric depth are provided.
if (param('query') and param('depth') =~ /^\d+$/) {
     
  # Create a new SOAP object.
  my $google_search  = SOAP::Lite->service("file:$google_wdsl");
     
  # Offsets run 0, 10, ..., ($loops-1)*10, giving exactly $loops requests.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search -> 
      doGoogleSearch(
        $google_key, param('query'), $offset, 10, "false", "",  "false",
        "", "latin1", "latin1"
      );
     
    last unless @{$results->{resultElements}};
     
    foreach my $result (@{$results->{'resultElements'}}) {
     
      # Determine depth: strip the scheme (e.g. http://) and any
      # trailing slash, leaving host/segment/segment...
      my $url = $result->{URL};
      $url =~ s!^\w+://|/$!!g;
     
      # Output only those deep enough. split returns the host plus one
      # field per path segment, so subtract 1 to count segments only.
      ( split(/\//, $url) - 1) >= param('depth') and 
        print 
          p(
            b(a({href=>$result->{URL}},$result->{title}||'no title')), br( ),
            $result->{URL}, br( ),
            i($result->{snippet}||'no snippet')
          );
    }
  }
     
}
     
# Close the page whether or not a search was run.
print end_html;
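
The depth test at the heart of the script can be tried on its own, without a Google API key or a web server. The following is a minimal sketch, not part of the original hack: the helper names url_depth and valid_depth and the example.com URLs are assumptions made for illustration. It applies the same substitution and split the script uses, and it checks the depth parameter with an anchored pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count how many
# path segments follow the host name in a URL.
sub url_depth {
    my ($url) = @_;
    # Strip the scheme ("http://") and a single trailing slash,
    # using the same substitution as the hack.
    (my $path = $url) =~ s!^\w+://|/$!!g;
    # split yields the host plus one field per path segment,
    # so subtract 1 to leave only the segments.
    my @parts = split m{/}, $path;
    return @parts - 1;
}

# Hypothetical helper: accept only a whole, non-negative integer.
sub valid_depth {
    my ($depth) = @_;
    return defined $depth && $depth =~ /^\d+$/;
}

# Illustrative URLs only; example.com is a placeholder domain.
for my $url (
    'http://www.example.com/',
    'http://www.example.com/a/b/',
    'http://www.example.com/a/b/c/page.html',
) {
    printf "%-45s depth %d\n", $url, url_depth($url);
}
```

Under these assumptions, the three sample URLs report depths of 0, 2, and 4, so a Depth setting of 4 in the form would keep only the last one.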


© 2007 O'Reilly Media, Inc.