O'Reilly Hacks
Spidering Hacks
By Morbus Iff, Tara Calishain
October 2003

HACK
#18
Adding Progress Bars to Your Scripts
Give a visual indication that a download is progressing smoothly

The Code

The first progress bar is the simplest, providing only a visual heartbeat so that you can be sure things are progressing and not just hanging. Save the following code to a file called progress_bar.pl and run it from the command line as perl progress_bar.pl URL, where URL is the online location of your appropriately large piece of sample data:

#!/usr/bin/perl -w
#
# Progress Bar: Dots - Simple example of an LWP progress bar.
# http://disobey.com/d/code/ or contact morbus@disobey.com.
#
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#

use strict; $|++;
my $VERSION = "1.0";

# make sure we have the modules we need, else die peacefully.
eval("use LWP 5.6.9;");  die "[err] LWP 5.6.9 or greater required.\n" if $@;

# now, check for passed URLs for downloading.
die "[err] No URLs were passed for processing.\n" unless @ARGV;

# our downloaded data.
my $final_data = undef;

# loop through each URL.
foreach my $url (@ARGV) {
   print "Downloading URL at ", substr($url, 0, 40), "... ";

   # create a new useragent and download the actual URL.
   # all the data gets thrown into $final_data, which
   # the callback subroutine appends to.
   my $ua = LWP::UserAgent->new(  );
   my $response = $ua->get($url, ':content_cb' => \&callback, );
   print "\n"; # after the final dot from downloading.
}

# per chunk.
sub callback {
   my ($data, $response, $protocol) = @_;
   $final_data .= $data;
   print ".";
}

None of this code is particularly new, save the addition of our primitive progress bar. We use LWP's standard get method, but add the special :content_cb field, whose value is a reference to a subroutine that LWP calls each time a chunk of content arrives. Fields that begin with a colon are instructions to LWP itself and are not sent to the server as HTTP headers. The chunk size can be suggested with an optional :read_size_hint, which is the number of bytes you'd like received before they're passed to the callback.

In this example, we've defined that the data should be sent to a subroutine named callback. You'll notice that the routine receives the actual content, $data, that has been downloaded. Since we're overriding LWP's normal $response->content or :content_file features, we now have to take full control of the data. In this hack, we store all our results in $final_data, but we don't actually do anything with them.
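Because the callback now owns the data, it can just as easily send each chunk somewhere other than memory. The following is a minimal sketch rather than part of the original hack: make_file_callback and the progress_demo.out filename are hypothetical names, and the chunks are fed in by hand so that the example runs without a network connection. With LWP, you would pass the returned code reference as the value of :content_cb.

```perl
#!/usr/bin/perl -w
use strict; $|++;

# hypothetical helper: returns a callback that appends each chunk
# to $path and prints a dot, a reference to the running byte
# count, and the open filehandle so the caller can close it.
sub make_file_callback {
   my ($path) = @_;
   open(my $fh, '>', $path) or die "[err] Can't write $path: $!\n";
   binmode($fh);
   my $bytes = 0;
   my $callback = sub {
      my ($data, $response, $protocol) = @_;
      print {$fh} $data;
      $bytes += length($data);
      print ".";
   };
   return ($callback, \$bytes, $fh);
}

# simulate three chunks arriving, as LWP would deliver them.
my ($callback, $bytes_ref, $fh) = make_file_callback("progress_demo.out");
$callback->($_, undef, undef) for ("first ", "second ", "third");
close($fh) or die "[err] Can't close file: $!\n";
print "\nWrote $$bytes_ref bytes.\n";
```

Writing chunks out as they arrive keeps memory use flat for very large downloads, at the cost of having to reopen the file if you want to process the data afterward.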

Most relevant, however, is the print statement within the callback routine. This is our first pathetic attempt at visual feedback during the downloading process: every time a chunk of data gets sent our way, we spit out a dot. If the total data size is sufficiently large, our screen will be filled with dots, dots, and more dots:

Downloading URL at http://disobey.com/large_file.mov...
..........................................................................
..........................................................................
..........................................................................
..........................................................................
..........................................................................
.....................................................................

While useful, it's certainly not very pretty, and it can be especially disruptive for large files (the previous example is the output of downloading just 700 KB). Instead, how about we use a little primitive animation?

If you've worked in the shell or installed various programs (or even a retail version of Linux), you may have seen rotating cursors built from ASCII characters. These cursors could start at \, erase that character, draw a |, erase, /, erase, -, and then \ to restart the loop. Individually, and without the benefit of a flipbook, these look pretty boring. Onscreen, however, they create a decent equivalent to an hourglass or spinning ball.

Modify the previous script, adding the animation and counter declaration between the two existing lines shown here:

...
# our downloaded data.
my $final_data = undef;

# your animation and counter.
my $counter; my @animation = qw( \ | / - );

# loop through each URL.
foreach my $url (@ARGV) 
...

This declares a counter and creates an array that contains the frames of our animation. As you can see, we use the same frames we discussed earlier. If you don't like 'em, customize your own (perhaps . i l i). The last change we need to make is in our callback routine. Swap out the existing print "." with:

print "$animation[$counter++]\b";
$counter = 0 if $counter == scalar(@animation);

And that's it. For each chunk of data we receive, the next frame of the animation will play. When our counter is the same as the number of frames, we reset and begin anew. Obviously, we can't show a readily apparent example of what this looks like, so try it at your leisure.
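If you'd like to see the spinner without downloading anything, here is a small standalone sketch. It is an illustration rather than part of the original script: next_frame is a hypothetical helper that wraps around with the modulus operator instead of an explicit reset, and a short pause stands in for the arrival of each chunk.

```perl
#!/usr/bin/perl -w
use strict; $|++;

# the same frames as in the hack.
my @animation = qw( \ | / - );
my $counter   = 0;

# return the next frame, wrapping around with the modulus
# operator instead of resetting the counter by hand.
sub next_frame {
   my $frame = $animation[ $counter % @animation ];
   $counter++;
   return $frame;
}

# simulate ten chunks arriving a tenth of a second apart.
for my $chunk (1 .. 10) {
   print next_frame(), "\b";
   select(undef, undef, undef, 0.1);
}
print " \n"; # overwrite the last frame and end the line.
```

The backspace (\b) moves the cursor back over the frame just printed, so the next frame overwrites it. This only looks right on a terminal; redirected to a file, the output is a run of frames and backspace characters.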

We can still do better, though. We've certainly removed the distracting dot distortion, but we're still left with only simple output; we don't have raw information on how far we've gone and how far still to go. The following code provides a progress meter with a visual percentage bar, as well as a numerical reading:

#!/usr/bin/perl -w
#
# Progress Bar: Wget - Wget style progress bar with LWP.
# http://disobey.com/d/code/ or contact morbus@disobey.com.
# Original routine by tachyon at http://tachyon.perlmonk.org/
#
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#

use strict; $|++;
my $VERSION = "1.0";

# make sure we have the modules we need, else die peacefully.
eval("use LWP 5.6.9;");  die "[err] LWP 5.6.9 or greater required.\n" if $@;

# now, check for passed URLs for downloading.
die "[err] No URLs were passed for processing.\n" unless @ARGV;

# happy-go-lucky variables.
my $final_data;  # our downloaded data.
my $total_size;  # total size of the URL.

# loop through each URL.
foreach my $url (@ARGV) {
   print "Downloading URL at ", substr($url, 0, 40), "...\n";

   # create a new useragent and download the actual URL.
   # all the data gets thrown into $final_data, which
   # the callback subroutine appends to. before that,
   # though, get the total size of the URL in question.
   my $ua = LWP::UserAgent->new(  );
   my $result = $ua->head($url);
   my $remote_headers = $result->headers;
   $total_size = $remote_headers->content_length;

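   # now do the downloading. start each URL with an empty buffer
   # so the byte count is measured against this URL's size alone.
   $final_data = '';
   my $response = $ua->get($url, ':content_cb' => \&callback );
   print "\n"; # move past the finished bar, which ends in \r.
}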

# per chunk.
sub callback {
   my ($data, $response, $protocol) = @_;
   $final_data .= $data;
   print progress_bar( length($final_data), $total_size, 25, '=' );
}

# wget-style. routine by tachyon
# at http://tachyon.perlmonk.org/
sub progress_bar {
    my ( $got, $total, $width, $char ) = @_;
    $width ||= 25; $char ||= '=';
    my $num_width = length $total;
    sprintf "|%-${width}s| Got %${num_width}s bytes of %s (%.2f%%)\r", 
        $char x (($width-1)*$got/$total). '>', 
        $got, $total, 100*$got/$total;
}

You'll notice right off the bat that we've added another subroutine at the bottom of our code. Before we get into that, check out our actual LWP request. Instead of just asking for the data, we first check the HTTP headers to see the size of the file we'll be downloading. We store this size in a $total_size variable. It plays an important part in our new subroutine, best demonstrated with a sample:

Downloading URL at http://disobey.com/large_file.mov...
|=============>          | Got 422452 bytes of 689368 (61.28%)

This is sprintf at work, courtesy of a routine by tachyon over at Perl Monks (http://www.perlmonks.org/index.pl?node_id=80749). As each chunk of data gets sent to our callback, the display is updated both as a bar and as a byte count and percentage. It's a wonderful piece of work and my preferred progress bar as of this writing. As you can see in the progress_bar call within the callback, you can modify the width of the bar as well as the character used to draw it.
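One caveat: a server is not obliged to send a Content-Length header (dynamically generated pages often omit it), in which case $total_size is undefined and the division inside progress_bar fails. The sketch below shows one possible guard, assuming you would rather fall back to a plain byte count than stop. safe_progress_bar is a hypothetical wrapper, not part of tachyon's routine.

```perl
#!/usr/bin/perl -w
use strict; $|++;

# wget-style bar, as in the hack (original routine by tachyon).
sub progress_bar {
    my ( $got, $total, $width, $char ) = @_;
    $width ||= 25; $char ||= '=';
    my $num_width = length $total;
    sprintf "|%-${width}s| Got %${num_width}s bytes of %s (%.2f%%)\r",
        $char x (($width-1)*$got/$total). '>',
        $got, $total, 100*$got/$total;
}

# hypothetical wrapper: draw the bar when the size is known,
# otherwise fall back to a plain running byte count.
sub safe_progress_bar {
    my ( $got, $total, $width, $char ) = @_;
    return sprintf("Got %d bytes (total size unknown)\r", $got)
        unless defined $total && $total > 0;
    $got = $total if $got > $total; # never draw past 100%.
    return progress_bar( $got, $total, $width, $char );
}

# size known, then size unknown.
print safe_progress_bar( 422452, 689368, 25, '=' ), "\n";
print safe_progress_bar( 422452, undef ), "\n";
```

In the callback, you would call safe_progress_bar in place of progress_bar with the same arguments.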

So far, we've rolled our own, but there is a module on CPAN, Term::ProgressBar (http://search.cpan.org/author/FLUFFY/Term-ProgressBar), that takes care of the lion's share of the work for us. It has a bit more functionality than our sprintf routine, such as titling the progress bar, including an ETA, and growing to the width of the user's terminal. Here it is in action:

#!/usr/bin/perl -w
#
# Progress Bar: Term::ProgressBar - progress bar with LWP.
# http://disobey.com/d/code/ or contact morbus@disobey.com.
# Original routine by tachyon at http://tachyon.perlmonk.org/
#
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#

use strict; $|++;
my $VERSION = "1.0";

# make sure we have the modules we need, else die peacefully.
eval("use LWP 5.6.9;"); 
die "[err] LWP is not the required version.\n" if $@;
eval("use Term::ProgressBar;"); # prevent word-wrapping.
die "[err] Term::ProgressBar not installed.\n" if $@;

# now, check for passed URLs for downloading.
die "[err] No URLs were passed for processing.\n" unless @ARGV;


# happy-go-lucky variables.
my $final_data = ''; # our downloaded data.
my $total_size;      # total size of the URL.
my $progress;        # progress bar object.
my $next_update = 0; # reduce ProgressBar use.

# loop through each URL.
foreach my $url (@ARGV) {
   print "Downloading URL at ", substr($url, 0, 40), "...\n";

   # create a new useragent and download the actual URL.
   # all the data gets thrown into $final_data, which
   # the callback subroutine appends to. before that,
   # though, get the total size of the URL in question.
   my $ua = LWP::UserAgent->new(  );
   my $result = $ua->head($url);
   my $remote_headers = $result->headers;
   $total_size = $remote_headers->content_length;

   # initialize our progress bar.
   $progress = Term::ProgressBar->new({count => $total_size, ETA => 'linear'});
   $progress->minor(0);           # turns off the floating asterisks.
   $progress->max_update_rate(1); # only relevant when ETA is used.

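   # now do the downloading. reset the per-URL state first so
   # the progress bar starts from zero for every URL.
   $final_data = ''; $next_update = 0;
   my $response = $ua->get($url, ':content_cb' => \&callback );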

   # top off the progress bar.
   $progress->update($total_size);
}

# per chunk.
sub callback {
   my ($data, $response, $protocol) = @_;
   $final_data .= $data;

   # reduce usage, as per example 3 in POD.
   $next_update = $progress->update(length($final_data))
      if length($final_data) >= $next_update;
}

And here's its output:

Downloading URL at http://disobey.com/large_file.mov...
 13% [========                                        ]9m57s Left

More examples are available in the Term::ProgressBar documentation.


© 2007 O'Reilly Media, Inc.