
What's Your Visitor's Weather Like?
You have a web site, as most people do, and
you're interested in getting a general idea of what
your visitors' weather is like.
Want to know if you get more comments when it's
raining or sunny? With the groundwork laid in this hack, that and
other nonsense will be readily available.
When you're spidering,
don't consider only data available on the Web.
Sometimes, the data is right under your nose, perhaps on your own
server or even on your own hard drive. This hack
demonstrates the large amount of information available, even when you
have only a small amount of your own data to start with. In this
case, we're looking at a web
server's log file, taking the addresses of the last
few visitors, using one database to look up
the geographical location of each address, and then using another
to find the weather there. It's a trivial example,
perhaps, but it's also quite nifty. For example, you
could easily modify this code to greet visitors to your site with
commiserations about the rain.
For the geographical data, we're going to use the
Perl interface to the CAIDA project (http://www.caida.org/tools/utilities/netgeo/NGAPI/index.xml);
for the weather data, we're using the
Weather::Underground module, which utilizes the information
at http://www.wunderground.com.
Running the Hack
Here's a typical run of the script, invoked on the
command line:
% perl weather.pl
<h2>Where my last few visitors came from:</h2>
<ul>
<li>London, UK, where it is cloudy</li>
<li>New York, NY, where it is sunny</li>
</ul>
Using and Hacking the Hack
I have this script installed on my weblog using an Apache server-side
include. This is probably a bad idea, given the potential for slow
responses from
CAIDA and Weather Underground, but it does
allow for completely up-to-date information. A more sensible approach
might be to change the script to produce a static file and run it
from cron every few minutes.
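One way to produce that static file is sketched here, assuming the script's output has first been collected into a string rather than printed directly. The HTML is written to a temporary file and then renamed over the file the web server includes, so a reader never sees a half-written page. The write_static_file helper and the visitors.html filename are hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: write $html to $path without ever leaving a
# partially written file where the web server can see it.
sub write_static_file {
    my ($path, $html) = @_;
    my $tmp = "$path.tmp.$$";    # temporary file in the same directory
    open(my $out, ">", $tmp) or die "Cannot write $tmp: $!";
    print $out $html;
    close($out) or die "Cannot close $tmp: $!";
    # rename is atomic when both names are on the same filesystem.
    rename($tmp, $path) or die "Cannot rename $tmp to $path: $!";
    return $path;
}

# In the real script, $html would hold the generated visitor list.
my $html = "<h2>Where my last few visitors came from:</h2>\n<ul>\n</ul>\n";
write_static_file("visitors.html", $html);
```

The server-side include would then point at the static file instead of the script, and slow lookups would delay only the cron job, not the page.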
If you're sure of fast responses, and if you have a
dynamically created page, it would be fun to customize that page
based on the weather at the reader's location. Pithy
comments about the rain are always appreciated. Tweaking the Weather
Underground response to give you the temperature instead of a
descriptive string creates the possibility of dynamically selecting
CSS stylesheets, so that colors change based on the temperature.
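As a rough sketch of the stylesheet idea, assume you have already extracted a temperature in degrees Celsius from the weather lookup (how to get that value out of the Weather Underground response is not shown here). The function name, the thresholds, and the stylesheet filenames are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping from a temperature in degrees Celsius to a
# stylesheet name. Thresholds and filenames are made up for this sketch.
sub stylesheet_for_temperature {
    my ($celsius) = @_;
    return "default.css" unless defined $celsius;    # lookup failed
    return "cold.css"    if $celsius < 5;
    return "mild.css"    if $celsius < 20;
    return "hot.css";
}

my $css = stylesheet_for_temperature(23);
print qq{<link rel="stylesheet" type="text/css" href="$css" />\n};
```

Falling back to a default stylesheet when no temperature is available keeps the page usable when either lookup fails.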
Storing the weather data over a period of time gives you the
possibility of calculating an "average readership
temperature" or the amount of rain that has fallen
on your audience this week. These would be fun statistics for some
and perhaps extremely useful for others.
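A minimal sketch of the "average readership temperature" calculation follows, assuming you have saved one temperature reading per visitor somewhere and loaded the readings back into a list. The sample readings and the average helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stored readings, one per visitor: the temperature
# (in degrees Celsius) reported for that visitor's location.
my @readings = (12, 18, 7, 23);

# Return the mean of a list of numbers, or nothing for an empty list.
sub average {
    my @values = @_;
    return unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

my $avg = average(@readings);
printf "Average readership temperature: %.1f C\n", $avg if defined $avg;
```

Returning nothing for an empty list avoids a division by zero on a day when every lookup failed.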
The code loads up the access_log, reverses it to
put the last accesses at the top, and then goes through the resulting
list, line by line. First, it runs the line through a regular
expression:
my ($domain,$rfc931,$authuser,$TimeDate,$Request,$Status,$Bytes,$Referrer,$Agent) =
$line =~ /^(\S+) (\S+) (\S+) \[([^\]\[]+)\] \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/o;
This splits the line into its different sections and is based on
Apache's combined log format. We'll be
using only the first variable (the domain itself) from these results,
but, because this regular expression is so useful, I include it for
your cannibalistic pleasure.
Anyhow, we take the domain and pass it to the
CAIDA module, retrieving a result and checking
whether that result is useful. If it's not useful,
we go to the next line in the access_log. This
highlights an important point when using third-party databases: you
must always check for a failed query. Indeed, it might even be a good
idea to treat a successful query as the exception rather than the
rule.
Assuming we have a good result, we need to detect whether the country is
the U.S. If it is, we set $region to the
U.S. state; otherwise, we use the name of the
country. We use the country function from the
Geography::Countries module to convert the two-letter
country code in the record into the full name of the country.
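To try the regular expression on its own, here is a small sketch that uses only core Perl. It also applies the "always check for failure" habit to the parsing step: a line that does not match is skipped rather than passed along. The host_from_log_line helper and both sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Parse one line in Apache's combined log format. Returns the
# requesting host, or undef if the line does not match.
sub host_from_log_line {
    my ($line) = @_;
    my ($domain) = $line =~
        /^(\S+) (\S+) (\S+) \[([^\]\[]+)\] \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/;
    return $domain;
}

# Made-up sample input: one valid entry and one line of junk.
my @lines = (
    '192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98)"',
    'garbage that is not a log entry',
);

foreach my $line (@lines) {
    my $host = host_from_log_line($line);
    unless (defined $host) {
        print "skipping unparsable line\n";
        next;
    }
    print "visitor: $host\n";
}
```

The same pattern of testing the result and moving on applies to the geographical and weather lookups, where failures are likely to be more common.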
—Ben Hammersley
The Code
Copy this code, changing the $apachelogfile line to reflect the path to
your Apache installation's
access_log. Here, mine is in the same directory
as the script:
#!/usr/bin/perl -w
#
# Ben Hammersley ben@benhammersley.com
# Looks up the real-world location of visiting IPs
# and then finds out the weather at those places
#
use strict;
use CAIDA::NetGeoClient;
use Weather::Underground;
use Geography::Countries;
my $apachelogfile = "access_log";
my $numberoflines = 10;
my $lastdomain = "";
# Open up the logfile.
open (LOG, "<$apachelogfile") or die $!;
# Place all the lines of the logfile
# into an array, but in reverse order.
my @lines = reverse <LOG>;
# Start our HTML document.
print "<h2>Where my last few visitors came from:</h2>\n<ul>\n";
# Go through each line one
# by one, setting the variables.
my $i = 0;
foreach my $line (@lines) {
    my ($domain,$rfc931,$authuser,$TimeDate,
        $Request,$Status,$Bytes,$Referrer,$Agent) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]\[]+)\] \"([^"]*)\" (\S+) (\S+) \"?([^"]*)\"? \"([^"]*)\"/o;
    # Skip any line that doesn't match the combined log format.
    next unless defined $domain;
    # If this record is one we saw
    # the last time around, move on.
    next if ($domain eq $lastdomain);
    # Record the last domain name for the repeat prevention
    # check, whether or not the lookups below succeed.
    $lastdomain = $domain;
    # And now get the geographical info.
    my $geo = CAIDA::NetGeoClient->new( );
    my $record = $geo->getRecord($domain);
    # Check to see if there is a record returned at all.
    next unless ($record && $record->{COUNTRY});
    my $city = ucfirst(lc($record->{CITY}));
    my $region = "";
    # If city is in the U.S., use the state as the "region".
    # Otherwise, use Geography::Countries to munge the two letter
    # code for the country into its actual name. (Thanks to
    # Aaron Straup Cope for this tip.)
    if ($record->{COUNTRY} eq "US") {
        $region = ucfirst(lc($record->{STATE}));
    } else { $region = country($record->{COUNTRY}); }
    # Now get the weather information.
    my $place = "$city, $region";
    my $weather = Weather::Underground->new(place => $place);
    my $data = $weather->getweather( );
    next unless $data; $data = $data->[0];
    # And print it for our HTML.
    print " <li>$city, $region where it is $data->{conditions}.</li>\n";
    # Finish once we've printed the requested number of lines.
    if ($i++ >= $numberoflines-1) { last; }
}
print "</ul>";
© 2007 O'Reilly Media, Inc.