O'Reilly Hacks



HACK #28
Permute a Query
Run all permutations of query keywords and phrases to squeeze the last drop of results from the Google index

The Code

Save the following code as a CGI script ["How to Run the Hacks" in the Preface] named order_matters.cgi in your web site's cgi-bin directory. As you type in the script, be sure to replace "insert key here" with your Google API key. The script also expects the GoogleSearch.wsdl file at ./GoogleSearch.wsdl, alongside the script.

TIP

You'll need to have the Algorithm::Permute Perl module for this program to work correctly (http://search.cpan.org/search?query=algorithm%3A%3Apermute&mode=all).
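To see what the script gets from Algorithm::Permute, here is a minimal pure-Perl sketch (an illustration under assumptions, not the hack's own code). It splits a query with the same regular expression the script uses, so "quoted phrases" stay whole, then builds every ordering of the resulting terms. The subroutine names parse_terms, permutations, and permuted_queries are hypothetical, and the orderings may come out in a different sequence than Algorithm::Permute produces.

```perl
#!/usr/bin/perl
# Sketch only: hypothetical helpers that mimic the query list order_matters.cgi
# builds, without needing Algorithm::Permute.
use strict;
use warnings;

# Split on whitespace, but keep [+-]"quoted phrases" together as one term
# (same split regex as the hack); drop undef and blank fields.
sub parse_terms {
    my ($query) = @_;
    return grep { defined($_) && /\S/ } split /([+-]?".+?")|\s+/, $query;
}

# Recursively generate all orderings of a list: n terms yield n! orderings.
sub permutations {
    my @items = @_;
    return ([]) unless @items;    # one empty ordering ends the recursion
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice(@rest, $i, 1);    # choose the first term
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

# Join each ordering with spaces, giving one Google query per permutation.
sub permuted_queries {
    my ($query) = @_;
    return map { join ' ', @$_ } permutations(parse_terms($query));
}

print "$_\n" for permuted_queries('perl "hacks book" -python');
```

Run as shown, it prints the six (3!) orderings of perl, "hacks book", and -python, with the quoted phrase kept as a single term. Four terms would already mean 24 separate queries, which is why the script caps input at four.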

#!/usr/local/bin/perl
# order_matters.cgi
# Queries Google for every possible permutation of up to 4 query keywords,
# returning result counts by permutation and top results across permutations.
# order_matters.cgi is called as a CGI with form input
     
# Your Google API developer's key.
my $google_key='insert key here';
     
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
     
use strict;
     
use SOAP::Lite;
use CGI qw/:standard *table/;
use Algorithm::Permute;
     
print
  header( ),
  start_html("Order Matters"),
  h1("Order Matters"),
  start_form(-method=>'GET'),
  'Query:   ', textfield(-name=>'query'),
  '   ',
  submit(-name=>'submit', -value=>'Search'), br( ),
  '<font size="-2" color="green">Enter up to 4 query keywords or "quoted phrases"</font>',
  end_form( ), p( );
     
if (param('query')) {
     
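 # Glean keywords: split on whitespace, but keep "quoted phrases"
 # (optionally prefixed with + or -) together as single terms.
 my @keywords = grep !/^\s*$/,  split /([+-]?".+?")|\s+/, param('query');
     
 # Every permutation is a separate query (4 terms = 24 queries), so cap the input.
 # Too many terms: show an error, close the page, and stop.
 if (@keywords > 4) {
  print '<font color="red">Only 4 query keywords or phrases allowed.</font>', end_html( );
  exit;
 }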
     
 my $google_search = SOAP::Lite->service("file:$google_wdsl");
     
 print 
  start_table({-cellpadding=>'10', -border=>'1'}),
  Tr([th({-colspan=>'2'}, ['Result Counts by Permutation' ])]),
  Tr([th({-align=>'left'}, ['Query', 'Count'])]);
 
 my $results = {}; # keep track of what we've seen across queries
 
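 # Iterate over every possible permutation. Once the permutations run out,
 # $p->next returns an empty list, join( ) yields '', and the loop ends.
 my $p = Algorithm::Permute->new( \@keywords );
 while (my $query = join(' ', $p->next)) {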
     
  # Query Google.
  my $r = $google_search -> 
   doGoogleSearch(
    $google_key, 
    $query,
    0, 10, "false", "",  "false", "", "latin1", "latin1"
   );
  print Tr([td({-align=>'left'}, [$query, $r->{'estimatedTotalResultsCount'}] )]);
  @{$r->{'resultElements'}} or next;
   
  # Assign a rank: the first result earns 10 points, the tenth earns 1,
  # and a URL's points accumulate across permutations.
  my $rank = 10;
  foreach (@{$r->{'resultElements'}}) {
   $results->{$_->{URL}} = {
    title => $_->{title},
    snippet => $_->{snippet},
    seen => ($results->{$_->{URL}}->{seen} || 0) + $rank
   };
   $rank--;
  }
 }
     
 print 
  end_table( ), p( ),
  start_table({-cellpadding=>'10', -border=>'1'}),
  Tr([th({-colspan=>'2'}, ['Top Results across Permutations' ])]),
  Tr([th({-align=>'left'}, ['Score', 'Result'])]);
     
 # List every URL seen, highest cumulative score first.
 foreach ( sort { $results->{$b}->{seen} <=> $results->{$a}->{seen} } keys %$results ) {
  print Tr(td([
    $results->{$_}->{seen},
    b($results->{$_}->{title}||'no title') . br( ) .
    a({href=>$_}, $_) . br( ) .
    i($results->{$_}->{snippet}||'no snippet')
  ]));
 }
     
 print end_table( );
}
print end_html( );
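The "Top Results across Permutations" table rests on a simple scoring rule: within each permutation's top 10, the first result earns 10 points, the second 9, and so on down to 1, and a URL's points are summed across all permutations. The sketch below isolates that rule in plain Perl. It runs on made-up URL lists rather than live Google results; score_results and the .example URLs are hypothetical.

```perl
#!/usr/bin/perl
# Sketch only: the hack's cross-permutation scoring applied to hypothetical
# result lists (no Google API call is made).
use strict;
use warnings;

# Each argument is an array ref of URLs in rank order for one permutation.
# The hack asks Google for at most 10 results per query, so ranks run 10..1.
sub score_results {
    my @result_lists = @_;
    my %score;
    for my $urls (@result_lists) {
        my $rank = 10;
        for my $url (@$urls) {
            $score{$url} += $rank;    # sum points for the same URL
            $rank--;
        }
    }
    # Return [URL, score] pairs, highest cumulative score first.
    return map { [ $_, $score{$_} ] }
           sort { $score{$b} <=> $score{$a} } keys %score;
}

my @ranked = score_results(
    [ 'http://a.example/', 'http://b.example/' ],    # hypothetical permutation 1
    [ 'http://b.example/', 'http://c.example/' ],    # hypothetical permutation 2
);
printf "%3d  %s\n", $_->[1], $_->[0] for @ranked;
```

Here b.example appears in both lists (9 + 10 = 19 points), so it outranks a.example (10) and c.example (9). Rewarding consistency across orderings is what lets a page that never tops any single permutation still lead the combined table.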


© 2007 O'Reilly Media, Inc.