Content syndication via RSS and XML, along with blogging, is an extremely hot topic, but few tools are available to track who reads and interacts with your content. With a little bit of Perl knowledge, you can use our “build your own” hack to write a bare-bones RSS traffic analyzer.
If you’re willing to roll up your sleeves and dig into some Perl, you can track your syndicated content far more thoroughly than web measurement tools alone allow [Hack #47]. Using the following scripts to track your own RSS feeds and posts will tell you:
What articles and posts people read
Who refers people to your work
Where readers click out to from your posts (which links are clicked)
For syndicated content, this is pretty much it: the information you need to determine the reach of, and response to, your blogging activities. The approach requires a bit more code, and it won’t work on every blogging platform or with every RSS reader. Still, the results are very satisfying, because there is really no better source for this data.
The data collection code for this hack is relatively simple and is broken into three parts:
The code that goes into each RSS feed or article you want to track
The code that the RSS feed will call (track_rss.js)
The code that processes the request generated by the first two blocks of code (write_rss_tag.cgi) and writes a log of your RSS activity (rss.log)
This code functions in nearly the same way as a client-side page tag [Hack #28] by leveraging a “round trip” call to an external JavaScript file.
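Here is how the round trip works in practice. Each tag asks the browser to load a tiny image whose URL carries the measurement data in its query string. The CGI program on your server records those parameters before returning the image.

The JavaScript sketch below builds such a URL using the same parameter names as track_rss.js (n, t, u, r, and rn). The helper name buildTrackingUrl and the example post URL are hypothetical. The sketch also uses encodeURIComponent() rather than the older escape() function that track_rss.js calls.

```javascript
// Hypothetical helper: build the tracking-image URL for one event.
// Parameter names match those read by write_rss_tag.cgi:
// n = article name, t = event type, u = URL, r = referrer, rn = cache buster.
function buildTrackingUrl(base, params) {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  return base + "?" + query;
}

// Example: a "view" event for one article. In a browser, the tag would then
// request it with: new Image().src = url;
const url = buildTrackingUrl("http://www.yourserverlocation.com/cgi-bin/write_rss_tag.cgi", {
  n: "Firefox is so super cool!",
  t: "v",
  u: "http://myblog.example/posts/firefox",
  r: "",
  rn: 123456,
});
console.log(url);
```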
In order to enable measurement, you need to add the following code to each post you want tracked.
```html
<DIV ID="NAME OF ARTICLE">
<!-- YOUR ARTICLE OR CONTENT WOULD GO HERE -->
</DIV>
<SCRIPT LANGUAGE="JavaScript">n="NAME OF ARTICLE";</SCRIPT>
<SCRIPT LANGUAGE="JavaScript" SRC="http://www.yourserverlocation.com/scripts/track_rss.js"></SCRIPT>
```
Remember to change NAME OF ARTICLE to the actual name of the article as you’d like it tracked. Also change the location http://www.yourserverlocation.com/scripts/track_rss.js to the actual location where that file is kept.
Warning
The NAME OF ARTICLE must be identical in the DIV and JavaScript definitions for this code to work, because track_rss.js finds the article by passing n to document.getElementById().
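Because the name has to be typed twice, one way to keep the two copies identical is to generate the markup from a single value. The sketch below is one hypothetical way to do that in JavaScript, for example in a small publishing script. buildTrackingSnippet and escapeHtmlAttr are illustrative names and are not part of this hack's code. The sketch also uses type="text/javascript" in place of the older LANGUAGE attribute.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeHtmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: emit the tracking markup for one post, using a single
// article name for both the DIV id and the n variable so they cannot drift apart.
function buildTrackingSnippet(articleName, scriptUrl, articleHtml) {
  // JSON.stringify yields a valid JavaScript string literal; escaping "<" keeps
  // a name containing "</script>" from closing the inline script early.
  const jsName = JSON.stringify(String(articleName)).replace(/</g, "\\u003c");
  return (
    '<div id="' + escapeHtmlAttr(articleName) + '">' + articleHtml + "</div>\n" +
    '<script type="text/javascript">n=' + jsName + ";</script>\n" +
    '<script type="text/javascript" src="' + escapeHtmlAttr(scriptUrl) + '"></script>'
  );
}

const snippet = buildTrackingSnippet(
  "My favorite RSS readers",
  "http://www.yourserverlocation.com/scripts/track_rss.js",
  "<p>Here are the readers I use every day.</p>"
);
console.log(snippet);
```

The tracking SCRIPT elements always come after the DIV, matching the placement the hack requires.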
For example, if you had written a weblog post about how great Firefox is, the whole code might look like this:
```html
<DIV ID="Firefox is so super cool!">
I love Firefox, it is so cool. <A HREF="mailto:me@mysite.com">Mail me</A> if you love Firefox as much as I do.
</DIV>
<SCRIPT LANGUAGE="JavaScript">n="Firefox is so super cool!";</SCRIPT>
<SCRIPT LANGUAGE="JavaScript" SRC="http://www.yourserverlocation.com/scripts/track_rss.js"></SCRIPT>
```
Be sure to include the SCRIPT portion of the code after the text of the article. The click-tracking JavaScript runs as soon as it is reached, and it can only find the article’s links if the DIV containing them has already been loaded. Assuming you’ve done everything correctly, once you deploy the article or feed via XML, you’ll end up with the JavaScript code embedded in the appropriate XML container.
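In an RSS 2.0 feed, the post’s HTML, tracking tags included, usually travels entity-encoded inside the item’s description element. The reader decodes it before rendering.

The sketch below shows that encoding step, assuming a simple RSS 2.0 item with only title, link, and description. buildRssItem and escapeXml are hypothetical names, and most blogging tools perform this step for you.

```javascript
// Escape text for inclusion as XML character data ("&" must be replaced first).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap a post (HTML plus tracking tags) in an RSS 2.0 <item>.
function buildRssItem(title, link, postHtml) {
  return [
    "<item>",
    "  <title>" + escapeXml(title) + "</title>",
    "  <link>" + escapeXml(link) + "</link>",
    "  <description>" + escapeXml(postHtml) + "</description>",
    "</item>",
  ].join("\n");
}

const postHtml =
  '<div id="Firefox is so super cool!">I love Firefox.</div>' +
  '<script type="text/javascript">n="Firefox is so super cool!";</script>';
console.log(buildRssItem("Firefox is so super cool!", "http://myblog.example/firefox", postHtml));
```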
The following code is the track_rss.js file referred to in the JavaScript you’re placing in the article itself. It is kept in an external file to minimize the amount of code that needs to be placed in each article. Save the file in a publicly available directory on your web site (for example, /scripts/).
```javascript
// track_rss.js
// n (the article name) is set by the inline SCRIPT placed after the article.
var rssClickImage; // global so the click image isn't discarded mid-request

// Declare and call the tracking image, passing name, location, referrer, and
// a random number in the query string.
var rssViewImage = new Image();
rssViewImage.src = "http://www.yourserverlocation.com/cgi-bin/write_rss_tag.cgi" +
  "?n=" + escape(n) + "&t=v&u=" + escape(document.location) +
  "&r=" + escape(document.referrer) + "&rn=" + RSSRandomNum();

// Get the article container by id and the links within, and iterate through them.
var articlecontainer = document.getElementById(n);
var articlelinks = articlecontainer.getElementsByTagName('a');
for (var j = 0; j < articlelinks.length; j++) {
  var link = articlelinks[j];
  // Build the new function to add. escape() output contains no quotes,
  // so it is safe inside the single-quoted string literals.
  var addfunc = "RSSClickTrack('" + escape(link.href) + "','" + escape(n) + "');";
  // Test if the link already has an onclick event defined.
  if (link.onclick) {
    // Get the body of the existing onclick function.
    var previoussource = link.onclick.toString();
    var previousstart = previoussource.indexOf('{') + 1;
    var previousend = previoussource.lastIndexOf('}');
    var previousfunc = previoussource.substring(previousstart, previousend);
    // Test if the existing onclick already has the RSSClickTrack call.
    if (previousfunc.indexOf('RSSClickTrack') < 0) {
      // Define and assign a new onclick with both the new and the existing code.
      link.onclick = new Function(addfunc + previousfunc);
    }
  } else {
    // No existing onclick, so create one with the new code.
    link.onclick = new Function(addfunc);
  }
}

function RSSClickTrack(link, name) {
  // Declare and call the click-tracking image, passing link, name, location,
  // and a random number in the query string. The current location is passed
  // as the referrer to the click. link and name arrive already escaped.
  rssClickImage = new Image();
  rssClickImage.src = "http://www.yourserverlocation.com/cgi-bin/write_rss_tag.cgi" +
    "?n=" + name + "&t=c&u=" + link + "&r=" + escape(document.location) +
    "&rn=" + RSSRandomNum();
}

function RSSRandomNum() {
  // Get a random number to break caching.
  return Math.round(Math.random() * 1000000);
}
```
Warning
Use this code at your own risk! Because content syndication is still an emerging field, it is difficult to know how all RSS readers and applications will deal with JavaScript.
For track_rss.js to function properly, you need to change the location http://www.yourserverlocation.com/cgi-bin/write_rss_tag.cgi (it appears twice in the file) to the location of the write_rss_tag.cgi file (see below). Note that the query parameter t is set differently depending on whether the article is viewed (t=v) or a link is clicked (t=c).
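track_rss.js chains click tracking onto existing onclick handlers in a fragile way. It converts each handler to source text and splices in a call to RSSClickTrack(), which depends on how the browser formats function source.

If you are willing to modify track_rss.js, a common alternative is to keep the existing handler as a function and wrap it in a closure, as in the sketch below. wrapClickHandler and the rssTracked marker property are hypothetical names. Plain objects stand in for DOM links so the sketch runs outside a browser; in a real page, the track callback would fire the t=c tracking image.

```javascript
// Wrap a link's click handler so the tracking callback runs first and any
// existing onclick handler still runs afterward, with its return value kept
// (so an existing handler that returns false still cancels navigation).
function wrapClickHandler(link, articleName, track) {
  if (link.rssTracked) return; // already wrapped: don't count clicks twice
  const previous = link.onclick;
  link.onclick = function (event) {
    track(link.href, articleName);
    return previous ? previous.call(this, event) : undefined;
  };
  link.rssTracked = true;
}

// Stand-ins for document.getElementById(n).getElementsByTagName("a").
const clicks = [];
const links = [
  { href: "mailto:me@mysite.com", onclick: null },
  { href: "http://www.example.com/", onclick: function () { return false; } },
];
for (const link of links) {
  wrapClickHandler(link, "Firefox is so super cool!", (href, name) => clicks.push(href + " from " + name));
  // A second pass is ignored thanks to the rssTracked marker.
  wrapClickHandler(link, "Firefox is so super cool!", () => clicks.push("duplicate"));
}

const results = links.map((link) => link.onclick()); // simulate one click per link
console.log(results);
console.log(clicks);
```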
The following code is very similar to the “page tag” code in the “Build Your Own Web Measurement Application” hacks [Hack #12]. It is written to accept input from the JavaScript tag above. Save this code on your web server in a location where it can be executed as a CGI program (for example, your /cgi-bin/ directory). The #!perl line may need to be adjusted to point to the location of Perl on your machine, for example, #!/usr/bin/perl.
```perl
#!perl -w
# The #!perl line may need to be adjusted to point to the location of Perl
# on your machine, for example #!/usr/bin/perl
use strict;
use CGI;                       # core until Perl 5.22; install from CPAN on newer Perls
use Fcntl qw(:flock :seek);    # LOCK_EX and SEEK_END

# Declare the location of the logfile. The CGI program needs to be given
# permission to write to this file. Exactly how to do that is
# system-dependent.
my $logfile = '/var/log/apache/rss.log';

# The name of the cookie, if any.
# 'Apache' is the default for mod_usertrack cookies.
my $cookie_name = 'Apache';

# The standard CGI module does all the work of extracting the parameters
# from the query string and unescaping them.
my $cgi = CGI->new;

my $name      = $cgi->param('n');    # RSS story name
my $type      = $cgi->param('t');    # event type: v (view) or c (click)
my $param_url = $cgi->param('u');    # URL the event is about
my $env_url   = $cgi->referer();     # referrer of the image request itself
my $ref       = $cgi->param('r');    # referrer to the event (JavaScript calls only)

# Use the referrer of the image request as the URL of the page carrying the
# tag if it exists and no u= parameter was sent. If neither exists, use
# "UNKNOWN", which covers requests from RSS readers that don't execute
# JavaScript and/or don't send a referrer with an image request.
my $url = "UNKNOWN";
$url = $env_url   if ($env_url);
$url = $param_url if ($param_url);

# The referrer is not always sent with image tracking calls; default to blank.
$ref = "" unless (defined($ref));

# As long as we've got a non-empty NAME and a non-empty TYPE,
# write a line in the logfile.
if ($name && $type) {
    # Look up the current time, the client name, and the cookie. The cookie
    # may be absent for requests from some RSS readers, or it might not be
    # set yet when some events occur.
    my $time       = time();
    my $client     = $cgi->remote_host();
    my $cookie_val = $cookie_name ? $cgi->cookie($cookie_name) : "";
    $cookie_val = "" unless defined($cookie_val);

    # Build the log line. Tabs or newlines inside a value would break the
    # log's columns, so replace them with spaces.
    my @fields = map { my $f = $_; $f =~ s/[\t\r\n]/ /g; $f }
                 ($type, $time, $client, $name, $url, $ref, $cookie_val);
    my $logout = join("\t", @fields);

    # Open the logfile and lock it, so that two requests can't write at the
    # same time. If either step fails, write a diagnostic message to STDERR,
    # which is the server's error log.
    my $lf;
    unless (open($lf, '>>', $logfile) && flock($lf, LOCK_EX)) {
        my $lt       = localtime;
        my $progname = $0 || 'write_rss_tag.cgi';
        print STDERR "[$lt] $progname: Can't open logfile: $!\n";
    }
    else {
        # Everything worked, so jump to the end of the logfile (in case
        # something was written between the open and the lock) and write.
        seek($lf, 0, SEEK_END);
        print $lf "$logout\n";
        close($lf);
    }
}

# Finally, send a 1x1 pixel transparent GIF back to the browser.
# (The long list of numbers just happens to be that GIF, byte by byte.)
binmode(STDOUT);
print "Content-Type: image/gif\n\n";
print 'GIF89a';
print v1.0.1.0.145.0.0.0.0.0.255.255.255.255.255.255.0.0.0.33.249.4.1.0.0.2.0.44.0.0.0.0.1.0.1.0.0.2.2.84.1.0.59;
```
Assuming that you’ve copied the code correctly and set the appropriate permissions for write_rss_tag.cgi on your web server, you should be all set. Again, the most important things to double check are that:
The ID in the <DIV> tag in your post matches the value of n exactly.
The reference http://www.yourserverlocation.com/scripts/track_rss.js in the JavaScript has been changed to the location of the file on your server (likely in your /scripts/ directory).
The http://www.yourserverlocation.com/cgi-bin/write_rss_tag.cgi reference in the track_rss.js file has been changed to the location of the file on your server (likely in your /cgi-bin/ directory).
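If you publish many posts, the first two checks can be automated. The sketch below is a rough, regex-based check, not a real HTML parser. It assumes the article name contains no double quotes; checkTrackingMarkup is a hypothetical name.

```javascript
// Hypothetical pre-publication check for one post's markup: confirms the DIV id
// matches the n value and that the placeholder host has been replaced.
function checkTrackingMarkup(html) {
  const problems = [];
  const idMatch = /<div\s+id="([^"]*)"/i.exec(html);
  const nMatch = /\bn\s*=\s*"([^"]*)"\s*;/.exec(html);
  if (!idMatch) problems.push("no <DIV ID=...> found");
  if (!nMatch) problems.push('no n="..." assignment found');
  if (idMatch && nMatch && idMatch[1] !== nMatch[1]) {
    problems.push('DIV id "' + idMatch[1] + '" does not match n "' + nMatch[1] + '"');
  }
  if (/www\.yourserverlocation\.com/i.test(html)) {
    problems.push("placeholder www.yourserverlocation.com is still present");
  }
  return problems;
}

const good =
  '<DIV ID="Firefox is so super cool!">I love Firefox.</DIV>' +
  '<SCRIPT LANGUAGE="JavaScript">n="Firefox is so super cool!";</SCRIPT>' +
  '<SCRIPT LANGUAGE="JavaScript" SRC="http://blog.example/scripts/track_rss.js"></SCRIPT>';
const bad =
  '<div id="Firefox rocks">I love Firefox.</div>' +
  '<script>n="Firefox is so super cool!";</script>' +
  '<script src="http://www.yourserverlocation.com/scripts/track_rss.js"></script>';
console.log(checkTrackingMarkup(good));
console.log(checkTrackingMarkup(bad));
```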
Because some applications for deploying content via RSS (most notably, blogging tools) insert HTML tags automatically (usually the <BR> tag), you should double check that the JavaScript still runs correctly when the post is viewed.
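If your blogging tool does insert line-break tags, a tag that lands inside the inline SCRIPT element is a JavaScript syntax error, and the tag silently stops reporting. Below is a sketch of a cleanup pass, assuming you can post-process the post’s HTML before it is published. stripBreaksInScripts is a hypothetical name, and its regular expressions are a rough approximation rather than a real HTML parser.

```javascript
// Remove <br>, <br/>, <br />, and </br> tags that appear inside <script>
// elements, leaving line breaks elsewhere in the post untouched.
function stripBreaksInScripts(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, (block) =>
    block.replace(/<\/?br\s*\/?>/gi, "")
  );
}

const mangled =
  '<p>I love Firefox.<br /></p>' +
  '<SCRIPT LANGUAGE="JAVASCRIPT">n="Firefox is so super cool!<BR>";</SCRIPT>';
console.log(stripBreaksInScripts(mangled));
```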
Once you’ve successfully deployed the data collection code, you’ll generate a logfile similar to the one in Figure 1-13.
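Each line of that log is one event, written by write_rss_tag.cgi as seven tab-separated fields: type, time, client, name, URL, referrer, and cookie. The JavaScript sketch below restates the CGI’s mapping from request to log line so it can be tested in isolation.

buildLogLine and the shape of its env argument are hypothetical. The tab and newline cleanup is an added safeguard, since a parameter containing either character would otherwise shift the columns.

```javascript
// Restates the CGI's logic: read n, t, u, r from the query string, fall back to
// the image request's Referer header (then "UNKNOWN") for the page URL, and
// return the tab-separated log line, or null when n or t is missing.
function buildLogLine(queryString, env) {
  const params = new URLSearchParams(queryString);
  const name = params.get("n");
  const type = params.get("t");
  if (!name || !type) return null; // the CGI writes nothing in this case

  const url = params.get("u") || env.referer || "UNKNOWN";
  const ref = params.get("r") || "";
  // Keep each field on one line and free of tabs so the columns stay aligned.
  const clean = (value) => String(value).replace(/[\t\r\n]/g, " ");
  return [type, env.time, env.client, name, url, ref, env.cookie || ""]
    .map(clean)
    .join("\t");
}

const line = buildLogLine(
  "n=Firefox%20is%20so%20super%20cool!&t=c&u=mailto%3Ame%40mysite.com" +
    "&r=http%3A//myblog.example/firefox&rn=52341",
  { time: 1104537600, client: "192.0.2.10", referer: "", cookie: "" }
);
console.log(JSON.stringify(line));
```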
All that’s left is to parse this log and generate reports [Hack #36]. We’ll do this using a series of Perl objects, a strategy similar to the “build your own” hacks in this book, and one that allows greater flexibility if you want to modify this code for your own purposes.
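The hack’s reports are built from Perl objects. As a much smaller illustration of the kind of counting involved, the JavaScript sketch below reads lines in the seven-column rss.log format and tallies two things per article: views (t=v) with their referrers, and clicks (t=c) with their click-out destinations.

summarizeRssLog and the sample log lines are hypothetical. Note that for click events, track_rss.js sends the clicked link as u and the post’s location as r. As a result, the URL column holds the destination and the referrer column holds the post.

```javascript
// Tally views (t=v) and clicks (t=c) per article from rss.log text.
// Columns: type, time, client, name, url, referrer, cookie (tab-separated).
function summarizeRssLog(text) {
  const articles = new Map();
  const bump = (counts, key) => counts.set(key, (counts.get(key) || 0) + 1);

  for (const line of text.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const [type, , , name, url, ref] = line.split("\t");
    if (!articles.has(name)) {
      articles.set(name, { views: 0, clicks: 0, referrers: new Map(), clickOuts: new Map() });
    }
    const stats = articles.get(name);
    if (type === "v") {
      stats.views += 1;
      bump(stats.referrers, ref || "(none)"); // who sent the reader to the post
    } else if (type === "c") {
      stats.clicks += 1;
      bump(stats.clickOuts, url); // where the reader went from the post
    }
  }
  return articles;
}

const sampleLog = [
  "v\t1104537600\t192.0.2.10\tFirefox is so super cool!\thttp://myblog.example/firefox\thttp://news.example/\t",
  "v\t1104537660\t192.0.2.11\tFirefox is so super cool!\tUNKNOWN\t\t",
  "c\t1104537700\t192.0.2.10\tFirefox is so super cool!\tmailto:me@mysite.com\thttp://myblog.example/firefox\t",
].join("\n");

for (const [name, stats] of summarizeRssLog(sampleLog)) {
  console.log(name, "views:", stats.views, "clicks:", stats.clicks);
  console.log("  referrers:", Object.fromEntries(stats.referrers));
  console.log("  click-outs:", Object.fromEntries(stats.clickOuts));
}
```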
—Ian Houston and Eric T. Peterson