Before I can do anything to improve my programs, I have to decide what I’m going to fix, and before I spend the time to do that, I want to figure out what I should focus on. How do I get the most improvement for the least amount of fiddling? What should I work on first? Through the process of profiling, by which I record and summarize what a program is doing, I can make those decisions. Luckily, Perl already offers several tools to do this.
I want to compute a factorial. It’s the old saw of performance discussions, and I’ll get to something more interesting in a moment. When I Googled for “factorial subroutines,” almost every implementation (aside from those in assembly language) was a recursive algorithm, meaning that a subroutine figures out part of the problem, calls itself with a subproblem, and keeps doing that until there are no more subproblems, eventually working its way back up to the original call. Here’s how I’d write that in Perl:
#!/usr/bin/perl
# factorial-recurse.pl

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

print factorial( $ARGV[0] ), "\n";
Now I want to figure out how to improve this toy program. It’s
already pretty fast because Perl can’t really count that high. With
anything over 170, my program on my machine returns Inf
(more on that in a moment). Despite that,
I’ll profile it anyway. I use the Devel::SmallProf
module to get a quick
summary. I invoke it with the -d
switch, which already assumes the Devel
portion of the name (see Chapter 4):
% perl -d:SmallProf factorial.pl 170
The Devel::SmallProf
module leaves behind a
human-readable text file named smallprof.out
. In its columnar output, it shows
each line of the program, how many times I executed that line, and the
real and CPU times for each line:
================ SmallProf version 1.15 =========================
                   Profile of factorial.pl               Page 1
=================================================================
    count wall tm  cpu time line
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     2:
      170 0.000000 0.000000     3:sub factorial {
      170 0.001451 0.000000     4:    return unless int( $_[0] ) == $_[0];
      170 0.004367 0.000000     5:    return 1 if $_[0] == 1;
      169 0.004371 0.000000     6:    return $_[0] * factorial( $_[0] - 1 );
        0 0.000000 0.000000     7:    }
        0 0.000000 0.000000     8:
        1 0.000009 0.000000     9:print factorial( $ARGV[0] ), "\n";
To compute the factorial of 170, I had to call the subroutine 170
times. Each time (save for one!) I called that subroutine, I had to
execute the lines in the subroutine. I had to check that the argument was
an integer each time, I had to check if the argument was 1
each time, and in almost every case, I had to
call the subroutine again. That’s a lot of work. By profiling my program,
I can see what is taking up all the time, and then concentrate on
improving those areas.
The best way to fix these problems is to come up with a better way
to get the answer. Better algorithms get you better performance than
almost any other method. Instead of using a recursive solution, I changed
it to an iterative one. I can easily get the range of integers using the range operator (in languages without one, a C-style for loop can stand in):
#!/usr/bin/perl
# factorial-iterate.pl

sub factorial {
    return unless int( $_[0] ) == $_[0];
    my $f = 1;
    foreach ( 2 .. $_[0] ) { $f *= $_ };
    $f;
}

print factorial( $ARGV[0] ), "\n";
When I profile this program, I see that I did not have to do as much work because I didn’t have as much code to run: I only had to check the argument once, I didn’t have to check whether the argument was 1, and I didn’t have to make repeated calls to a subroutine:
================ SmallProf version 1.15 =========================
                   Profile of factorial2.pl              Page 1
=================================================================
    count wall tm  cpu time line
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     2:
        1 0.000000 0.000000     3:sub factorial {
        1 0.000021 0.000000     4:    return unless int( $_[0] ) == $_[0];
        1 0.000000 0.000000     5:    my $f = 1;
      170 0.001632 0.000000     6:    foreach ( 2 .. $_[0] ) { $f *= $_ };
        1 0.002697 0.000000     7:    $f;
        0 0.000000 0.000000     8:    }
        0 0.000000 0.000000     9:
        1 0.000006 0.000000    10:print factorial( $ARGV[0] ), "\n";
Earlier I said that my program topped out at 170. I can get past
that limit by telling Perl to use the bignum
pragma:
#!/usr/bin/perl
# factorial-recurse-bignum.pl

use bignum;

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

print factorial( $ARGV[0] ), "\n";
Now I can see some real performance differences by comparing the factorials of really big numbers. As I was finishing this book, I switched to a MacBook Pro, and its dual-core architecture had no trouble with either approach. Only with really large numbers did the recursive approach slow down noticeably.
That’s not the whole story, though. I’ve shown a really simple
program that calculates a single number. In a real program I would most
likely use the factorial
routine many,
many times with several different values. When I profile the application,
I’ll see the number of times I run the lines of the subroutine throughout
the entire process.
Either approach can benefit from caching its results. Here’s a program that repeatedly prompts me for a number. It computes the factorial and caches the results along the way, trading memory for speed. The first time I ask it to compute the factorial for 10,000, it takes several seconds. After that, when I ask it for the factorial for any number less than 10,000, it’s just a very fast lookup:
#!/usr/bin/perl
# factorial-iterate-bignum-memo.pl

use bignum;

{
    my @Memo = (1);

    sub factorial {
        my $number = shift;

        return unless int( $number ) == $number;

        return $Memo[$number] if $Memo[$number];

        foreach ( @Memo .. $number ) {
            $Memo[$_] = $Memo[$_ - 1] * $_;
        }

        $Memo[ $number ];
    }
}

{
    print "Enter a number> ";
    chomp( my $number = <STDIN> );
    exit unless defined $number;
    print factorial( $number ), "\n";
    redo;
}
I can do the same with the recursive solution, although
the Memoize
module does the extra work for
me:
#!/usr/bin/perl
# factorial-recurse-bignum-memo.pl

use bignum;
use Memoize;

memoize( factorial );

sub factorial {
    return unless int( $_[0] ) == $_[0];
    return 1 if $_[0] == 1;
    return $_[0] * factorial( $_[0] - 1 );
}

{
    print "Enter a number> ";
    chomp( my $number = <STDIN> );
    exit unless defined $number;
    print factorial( $number ), "\n";
    redo;
}
While profiling, I must remember that some things in isolation don’t tell me the whole story. The profile can help me make decisions, but I’m the one who has to do the thinking, not the computer.
Profiling means counting, and to count something, I need to make the statements do something so I can count them. I might, for instance, use some of the features from Chapter 4 to add accounting code to my subroutines. That’s much too tedious, though. Instead of trying to account for things in individual subroutines, I try to make everything flow through a single control subroutine. This is probably too much for a small program, but in a large system, the extra computing pays off in saved developer time when I work on optimizing the program.
The most common place I do this is in database code. In the database case, I want to track which queries I make, usually so I can get an idea of which queries take a long time or which ones I run most frequently. From that, I can figure out what I should optimize.
Here’s an example of a nexus for all queries that allows me to
profile my database code. I’ve simplified this example, but this is close
to some actual code I’ve used, minus some stuff that doesn’t apply to
profiling. I have a package-scoped lexical variable %Queries
that will hold my profile data. The
simple_query
method is essentially a
wrapper around prepare
and execute
with some accounting overhead:
package My::Database;

my %Queries;

sub simple_query {
    my( $self, @args ) = @_;

    my $sql_statement = shift @args;

    $Queries{$sql_statement}++;   # <--- Profiling hook

    my $sth = $self->dbh->prepare( $sql_statement );
    unless( ref $sth ) { warn $@; return }

    my $rc = $sth->execute( @args );

    wantarray ? ( $sth, $rc ) : $rc;
}
In the rest of my database code, I have functions that use simple_query
instead of using the
DBI
interface directly. My get_postage_rates_by_country
grabs the amount of
postage I need to send mail overseas. It passes the SQL statement and a
bind parameter to simple_query
. As
before, this is real code, although I’ve cut out some bits to only show
the relevant parts:
sub get_postage_rates_by_country {
    my( $self, $country ) = @_;

    my( $sth ) = $self->simple_query( <<"SQL", $country );
    SELECT PostageRates.ounces, PostageRates.rate, PostageServices.name
    FROM PostageRates, Countries, PostageServices
    WHERE
        Countries.pk = ?
        AND Countries.usps_zone = PostageRates.usps_zone
        AND PostageRates.service = PostageServices.pk
    ORDER BY PostageRates.ounces
SQL

    return $sth->fetchall_arrayref;
}
As my program does its work, the queries flow through simple_query
, which counts and records what
happens. To get the profile data, I use an END
block to create the
report. The particular format depends on what I collected during the run.
In this example I just counted statements, but I could use that %Queries
hash to store anything I wanted,
including the bind parameters, the function that called simple_query
, and so on:
END {
    foreach my $statement ( sort { $Queries{$b} <=> $Queries{$a} } keys %Queries ) {
        printf "%5d %s\n\n", $Queries{$statement}, $statement;
    }
}
I might find, for instance, that in a long report I repeatedly fetch the postage data for each country, even though it’s not going to change. When I realize I’m doing this after looking at the profile data, I can optimize my code to cache some of the data in memory rather than asking the database for the same answer again.
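For example, I might wrap the earlier get_postage_rates_by_country in a small cache inside My::Database. This is only a sketch (the %postage_cache hash and the _cached wrapper name are mine, not part of the real code), but it shows the trade of a little memory for far fewer queries:

{
    my %postage_cache;   # country => cached fetchall_arrayref result

    sub get_postage_rates_by_country_cached {
        my( $self, $country ) = @_;

        # only ask the database the first time I see a country
        $postage_cache{$country} = $self->get_postage_rates_by_country( $country )
            unless exists $postage_cache{$country};

        return $postage_cache{$country};
    }
}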
I’ve actually been coding my Perl database stuff through this sort of central query subroutine for quite a while, and I recently found out that Tim Bunce added these features directly to DBI. He did the same sort of thing by making everything flow through a central function, which was easy because DBI already does that for queries. The DBI::Profile module does much of the same work as my hand-rolled accounting, but for anything that goes through Perl’s database interface module, DBI.
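I don’t even have to wait for the command line to turn it on; I can enable DBI::Profile from inside the program by setting the Profile attribute on the handle. Here’s a minimal sketch; I expect DBI to dump the report to STDERR on its own when the handle goes away at the end of the program, just as it does with the DBI_PROFILE environment variable I use below:

use DBI;
use DBI::Profile;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

# same effect as DBI_PROFILE='!Statement', but only for this handle
$dbh->{Profile} = '!Statement';

# ...prepare and execute queries as usual...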
Database interactions are often the biggest performance drain on my
programs, and that’s a place I usually start to look for improvements.
Instead of calling subroutines unnecessarily, as in my last example, I
might be making unnecessary database queries.
Here’s a short program that takes quite a bit of time because it
makes almost 2,000 database queries. I want to build a table of number
names, so given a digit I can get the name (e.g., 9 has the name “Nine”),
or go from the name to the digit. I should probably use a Lingua::* module, but I don’t want to start off with something that smart. In this example, I
use the DBD::CSV
module to use a comma-separated value file as my database store. I
create a table to hold the pairs, and then start to populate the table. I
bootstrap the data by getting the first 19 names into the table, then
looking up the names I already have to create further names:
#!/usr/bin/perl
# dbi-number-inserter.pl
use strict;

use DBI;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $sth = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

my $id = 1;
foreach my $name (
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    ) {
    $sth->execute( $id++, $name );
}

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    $sth->execute( $id++, $name );

    foreach my $ones_digit ( 1 .. 9 ) {
        my( $ones_name ) = map { lc }
            $dbh->selectrow_array( "SELECT name FROM names WHERE id = $ones_digit" );
        $sth->execute( $id++, "$name $ones_name" );
    }
}

foreach my $digit ( 1 .. 9 ) {
    my( $hundreds ) = $dbh->selectrow_array( "SELECT name FROM names WHERE id = $digit" );
    $sth->execute( $id++, "$hundreds hundred" );

    foreach my $tens_digit ( 1 .. 99 ) {
        my( $tens_name ) = map { lc }
            $dbh->selectrow_array( "SELECT name FROM names WHERE id = $tens_digit" );
        $sth->execute( $id++, "$hundreds hundred $tens_name" );
    }
}
I run this from the command line, and it takes almost two minutes on
my Powerbook G4. That’s okay; I need a nice, slow example. Now I want to
profile this program to see where I can improve it, pretending I was just
handed it without knowing how it works. I set the DBI_PROFILE environment variable to turn on database profiling.[23] To get a report ordered by statements, I set DBI_PROFILE='!Statement'. The sort key has an exclamation point, !, prepended to it.
At the end of the run, I get a long report. Here are the first several
lines:
$ env DBI_PROFILE='!Statement' perl dbi-profile.pl
DBI::Profile: 109.671362s 99.70% (1986 calls) dbi-profile.pl @ 2006-10-10 02:18:40
'' =>
    0.000784s / 10 = 0.000078s avg (first 0.000023s, min 0.000001s, max 0.000618s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.004258s
'DROP TABLE names' =>
    0.008017s
'INSERT INTO names VALUES ( ?, ? )' =>
    3.229462s / 1002 = 0.003223s avg (first 0.001767s, min 0.000037s, max 0.108636s)
'SELECT name FROM names WHERE id = 1' =>
    1.204614s / 18 = 0.066923s avg (first 0.012831s, min 0.010301s, max 0.274951s)
'SELECT name FROM names WHERE id = 10' =>
    1.118565s / 9 = 0.124285s avg (first 0.027711s, min 0.027711s, max 0.341782s)
'SELECT name FROM names WHERE id = 11' =>
    1.136748s / 9 = 0.126305s avg (first 0.032328s, min 0.032328s, max 0.378916s)
The top line gives me the wallclock time and the total number of
DBI
method calls; that’s the number of method calls to
DBI
, not the number of queries. After that, I get a report for each query. Just because the first few entries look like they’re sorted by total time or number of queries, don’t forget to look at the rest of the report; it’s actually sorted in lexical (alphabetical) order of the query text.
For each query, DBI::Profile
reports the total
wallclock time and the number of method calls for that statement. It
doesn’t report the CPU time because it isn’t very interesting; the
database server might be another machine, and even if it is local, it’s
often a separate process. It gives an average time for that query, and
then the times for the first call, the call that took the least amount of
time, and the call that took the most. This isn’t as simple as timing a
program. The database server might perform differently given the same
input because it might be doing something else, the data size might be different, or for any number of other reasons.
From the full report, I see that most calls took about the same amount of time since they are all running pretty quickly, so I can’t make a big speedup by optimizing a query so it performs better on the database. No indexing or rearrangement of joins will likely help here.
What I really need to reduce is the number of queries so I interact
with the database less. I can’t get away from the INSERT
s since I still have to make each row, but
I don’t need to make all of those select statements. I should cache the
result so I don’t fetch the same data twice (or even at all):
#!/usr/bin/perl
# dbi-number-inserter-cached.pl
use strict;

use DBI;

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

my @array = (
    qw( Zero ),
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    );

my $id = 0;
foreach my $name ( @array ) {
    $insert->execute( $id++, $name );
}

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    $array[ $id ] = $name;
    $insert->execute( $id++, $name );

    foreach my $ones_digit ( 1 .. 9 ) {
        my $full_name = $array[ $id ] = "$name $array[$ones_digit]";
        $insert->execute( $id++, $full_name );
    }
}

foreach my $digit ( 1 .. 9 ) {
    my( $hundreds ) = $array[ $digit ];
    my $name = $array[$id] = "$hundreds hundred";
    $insert->execute( $id++, $name );

    foreach my $tens_digit ( 1 .. 99 ) {
        my( $tens_name ) = lc $array[ $tens_digit ];
        $array[$id] = "$hundreds hundred $tens_name";
        $insert->execute( $id++, "$name $tens_name" );
    }
}
In my first pass at improvement, I don’t have any select statements at all because I cache the results. That cuts out most of the runtime in this program. The times for each program are remarkably different. Remember, however, that I’ve made a trade-off between speed and memory. The second program is faster, but it takes up more memory:
$ time perl dbi-profile.pl
real    1m48.676s
user    1m21.136s
sys     0m1.698s

$ time perl dbi-profile2.pl
real    0m2.638s
user    0m1.736s
sys     0m0.307s
Here’s the entire profile report for my new program, which now runs
in two percent of the original runtime. Most of the calls are INSERT
s:
$ env DBI_PROFILE='!Statement' perl dbi-profile2.pl
DBI::Profile: 2.118577s 105.93% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:31:10
'' =>
    0.000757s / 10 = 0.000076s avg (first 0.000021s, min 0.000001s, max 0.000584s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.004216s
'DROP TABLE names' =>
    0.006906s
'INSERT INTO names VALUES ( ?, ? )' =>
    2.106698s / 1003 = 0.002100s avg (first 0.001713s, min 0.000037s, max 0.005587s)
By looking at the profile, I was able to target part of the program for improvement. It didn’t tell me how to improve it, but at least I know where I should spend my time.
The runtime report isn’t the only one I can get. With DBI_PROFILE='!MethodName'
,
DBI
orders the report according to the name of the
DBI
function. It’s in ASCII-betical order with the
uppercase letters sorting before the lowercase ones (and I’ve redacted
part of these reports since they show all of the
methods, including the ones I didn’t even know I was using):
$ env DBI_PROFILE='!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.168271s 72.28% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:37:16
'DESTROY' =>
    0.000141s / 2 = 0.000070s avg (first 0.000040s, min 0.000040s, max 0.000101s)
'FETCH' =>
    0.000001s
'STORE' =>
    0.000067s / 5 = 0.000013s avg (first 0.000022s, min 0.000006s, max 0.000022s)
'do' =>
    0.010498s / 2 = 0.005249s avg (first 0.006602s, min 0.003896s, max 0.006602s)
'execute' =>
    2.155318s / 1000 = 0.002155s avg (first 0.002481s, min 0.001777s, max 0.007023s)
'prepare' =>
    0.001570s
I can even combine the two since DBI::Profile
can deal with multiple sort keys if I join them with a colon. With
DBI_PROFILE='!Statement:!MethodName'
,
DBI
gives me a double-layer report. Under each SQL statement, it breaks the time down by the particular method it used. I might, for instance, want to compare the time my database query spends in the DBI guts versus actually fetching the data:
$ env DBI_PROFILE='!Statement:!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.123325s 106.17% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:38:22
'' =>
    'FETCH' =>
        0.000001s
    'STORE' =>
        0.000069s / 5 = 0.000014s avg (first 0.000024s, min 0.000005s, max 0.000024s)
    'connect' =>
        0.000644s
    'default_user' =>
        0.000030s
    'disconnect' =>
        0.000050s
    'disconnect_all' =>
        0.000024s
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    'do' =>
        0.004616s
'DROP TABLE names' =>
    'do' =>
        0.007191s
'INSERT INTO names VALUES ( ?, ? )' =>
    'DESTROY' =>
        0.000149s / 2 = 0.000075s avg (first 0.000050s, min 0.000050s, max 0.000099s)
    'execute' =>
        2.108945s / 1000 = 0.002109s avg (first 0.002713s, min 0.001796s, max 0.005454s)
    'prepare' =>
        0.001606s
I can flip that last report around by using DBI_PROFILE='!MethodName:!Statement'
. The
first layer lists the DBI
method and then breaks it
down by SQL statements after that:
$ env DBI_PROFILE='!MethodName:!Statement' perl dbi-profile2.pl
DBI::Profile: 2.431843s 81.06% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:40:40
'DESTROY' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        0.000142s / 2 = 0.000071s avg (first 0.000039s, min 0.000039s, max 0.000103s)
'FETCH' =>
    '' =>
        0.000001s
'STORE' =>
    '' =>
        0.000065s / 5 = 0.000013s avg (first 0.000022s, min 0.000005s, max 0.000022s)
'connect' =>
    '' =>
        0.000685s
'default_user' =>
    '' =>
        0.000024s
'disconnect' =>
    '' =>
        0.000050s
'disconnect_all' =>
    '' =>
        0.000023s
'do' =>
    'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
        0.004287s
    'DROP TABLE names' =>
        0.006389s
'execute' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        2.418587s / 1000 = 0.002419s avg (first 0.002549s, min 0.001819s, max 0.013104s)
'prepare' =>
    'INSERT INTO names VALUES ( ?, ? )' =>
        0.001589s
Sam Tregar’s DBI::ProfileDumper
module does the
same thing as DBI::Profile
, but it saves its result
in a file instead of dumping it to standard output.[24] By default, this file is named dbi.prof, but I can use any name I like. For anything but a small
application, I’ll probably have to do quite a bit of custom slicing and
dicing to extract the information I need.
First, I tell DBI which profiling class it should use by including
it in the DBI_PROFILE
value. I
join the class name to the profiling sort keys with a /
:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper ./program.pl
Once that command completes, dbi.prof has all of the profiling data. If I
want to change the filename, I just add that to DBI_PROFILE
by appending it after the class
name:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper/File:dbi.prof ./program.pl
Once I have the data, I can analyze them with dbiprof
, which has several options to select
the data to display, sort it in the way that I want (even on multiple
keys), and many other things:
$ dbiprof --number all --sort longest
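For instance, if I want to see the statements I ran most often rather than the ones that took the longest, I should be able to sort on the count field instead (check the dbiprof documentation for the exact field names it accepts):

$ dbiprof --number 10 --sort count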
I started with a pretty bad program that made many unnecessary
calls to the database and did quite a bit of work. I can make that
program more Perly, though, by using Perl’s list operators smartly.
Instead of all that index counting, I use push
to put things onto an array. The code is
much tighter and shorter, and it does the same thing. Instead of
inserting items as I go along, I move all the database stuff to the end
(I have secret plans for that later), but for now the program runs in
about the same time as the previous example:
#!/usr/bin/perl
# dbi-number-inserter-end.pl
use strict;

use DBI;

my @array = (
    qw( Zero ),
    qw(One Two Three Four Five Six Seven Eight Nine Ten),
    qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen)
    );

foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) ) {
    push @array, $name;
    push @array, map { "$name $array[$_]" } 1 .. 9
}

foreach my $digit ( 1 .. 9 ) {
    push @array, "$array[$digit] hundred";
    push @array, map { "$array[$digit] hundred $array[$_]" } 1 .. 99;
}

my $dbh = DBI->connect( "DBI:CSV:f_dir=." );

$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );

my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}
Instead of using a CSV file, though, I now want to use a more sophisticated database engine since I think I might be able to get better performance writing all of this stuff to disk. I have the tools to find out, so why not? I’ll use SQLite, another lightweight database that DBI can talk to. I don’t have to change too much in my program since DBI hides all of that for me. I only change the DBI connection string:
# dbi-number-inserter-sqlite.pl
my $dbh = DBI->connect( "DBI:SQLite:dbname=names.sqlite.db" );
When I run my program again, it’s abysmally slow. It takes a lot longer to insert all of these rows into an SQLite database store:
$ time perl dbi-profile-sqlite.pl
real    5m7.038s
user    0m0.572s
sys     0m8.220s
That’s awful! When I profile the program, I see that the INSERT
statements take
100 times longer than my previous programs. Outrageous!
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite.pl
DBI::Profile: 266.582027s 99.63% (1015 calls) dbi-profile-sqlite.pl @ 2006-03-22 17:19:51
'' =>
    0.004483s / 10 = 0.000448s avg (first 0.000007s, min 0.000003s, max 0.004329s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.413145s
'DROP TABLE names' =>
    0.294514s
'INSERT INTO names VALUES ( ?, ? )' =>
    265.869885s / 1003 = 0.265075s avg (first 0.000306s, min 0.000016s, max 0.771342s)
But this is a well-known issue with SQLite and some other databases: by default they commit each statement and wait for the data to make it to the physical disk before moving on. Instead of committing every row individually, I can do all of the inserts inside a single transaction. I don’t have to actually write to the database for every INSERT; I do it all at once when I COMMIT:
# dbi-profile-sqlite-transaction.pl
$dbh->do( "BEGIN TRANSACTION" );

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}

$dbh->do( "COMMIT" );
Now the profile looks much different. Looking at the results, I can see that I’ve improved the insertion time by orders of magnitude, and now it’s faster, by all measures, than any of the previous programs that did the same thing:
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite2.pl
DBI::Profile: 1.334367s 54.19% (1016 calls) dbi-profile-sqlite2.pl @ 2006-03-22 17:25:44
'' =>
    0.004087s / 9 = 0.000454s avg (first 0.000007s, min 0.000003s, max 0.003950s)
'BEGIN TRANSACTION' =>
    0.000257s
'COMMIT' =>
    0.255082s / 2 = 0.127541s avg (first 0.254737s, min 0.000345s, max 0.254737s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
    0.271928s
'DROP TABLE names' =>
    0.715443s
'INSERT INTO names VALUES ( ?, ? )' =>
    0.087570s / 1002 = 0.000087s avg (first 0.000317s, min 0.000004s, max 0.003396s)
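Instead of sending BEGIN TRANSACTION and COMMIT as literal SQL, I could also let DBI manage the transaction with its own methods, which do the same thing through the driver. Here’s a sketch of the same loop, assuming the $dbh and $insert handles from before:

$dbh->begin_work;   # turn off AutoCommit until the next commit or rollback

foreach my $index ( 0 .. $#array ) {
    $insert->execute( $index, $array[$index] );
}

$dbh->commit;

Either way, the point is the same: one commit for the whole batch instead of one for every row.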
I started this chapter with Devel::SmallProf
only
because I get to the results faster. Devel::DProf
does
much of the same thing, but stores the results in its own format so it can
do several things with them later, such as make pretty code graphs. I call
it in the same way by using the -d
switch.
I have a program that reads the Use.Perl[25] journal entries through its SOAP interface. I run it with the
-d
switch and tell it to use
Devel::DProf
as the debugging module:
% perl -d:DProf journals
Once I’ve run the program, I have a new file named tmon.out, although I can change that with the PERL_DPROF_OUT_FILE_NAME environment variable. This file isn’t human-readable, so I need to use dprofpp to turn it into something that I can use. The wallclock time is 53 seconds, although I spent less than a second in the CPU. I could improve parts of the program, but the network latency and download times will still dominate the time:
$ dprofpp
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
...snipped several more lines like these...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
  User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 8.37   0.079  0.000     84   0.0009 0.0000  utf8::SWASHNEW
 6.25   0.059  0.146      5   0.0118 0.0292  main::BEGIN
 5.83   0.055  0.073     24   0.0023 0.0030  Text::Reform::form
 5.09   0.048  0.067      2   0.0242 0.0334  HTTP::Cookies::Netscape::load
 4.24   0.040  0.040     10   0.0040 0.0040  LWP::UserAgent::BEGIN
 4.24   0.040  0.049      9   0.0044 0.0054  Text::Autoformat::BEGIN
 3.71   0.035  0.035    697   0.0001 0.0001  HTTP::Headers::_header
 3.18   0.030  0.030      8   0.0037 0.0037  DynaLoader::dl_load_file
 3.18   0.030  0.079      2   0.0149 0.0393  Text::Template::GEN0::BEGIN
 3.18   0.030  0.068     17   0.0017 0.0040  LWP::Protocol::implementor
 2.65   0.025  0.045    221   0.0001 0.0002  SOAP::SOM::_traverse_tree
 2.54   0.024  0.024    892   0.0000 0.0000  HTTP::Cookies::set_cookie
 2.44   0.023  0.023   1060   0.0000 0.0000  Text::Reform::_debug
 2.44   0.023  0.061    293   0.0001 0.0002  SOAP::SOM::_traverse
 2.12   0.020  0.030      5   0.0040 0.0059  HTTP::Cookies::BEGIN
If I don’t want to sort on execution time, dprofpp offers other options. With -l (lowercase L), I get the output in order of the most frequently called subroutines:
$ dprofpp -l
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
... snip ...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
  User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 1.06   0.010  0.010   1525   0.0000 0.0000  UNIVERSAL::isa
 0.21   0.002  0.002   1184   0.0000 0.0000  LWP::Debug::debug
    -       - -0.009   1156        -      -  SOAP::Data::new
 2.44   0.023  0.023   1060   0.0000 0.0000  Text::Reform::_debug
 2.54   0.024  0.024    892   0.0000 0.0000  HTTP::Cookies::set_cookie
    -       - -0.005    753        -      -  SOAP::Serializer::new
    -       - -0.014    734        -      -  SOAP::Serializer::__ANON__
 3.71   0.035  0.035    697   0.0001 0.0001  HTTP::Headers::_header
    -       -  0.000    527        - 0.0000  HTTP::Message::__ANON__
 0.74   0.007  0.007    439   0.0000 0.0000  UNIVERSAL::can
 0.64   0.006  0.004    425   0.0000 0.0000  HTTP::Headers::__ANON__
    -       - -0.002    382        -      -  SOAP::Utils::o_lattr
    -       - -0.002    369        -      -  SOAP::Trace::__ANON__
    -       - -0.002    323        -      -  HTTP::Message::_elem
 0.64   0.006  0.024    323   0.0000 0.0001  HTTP::Headers::push_header
With dprofpp’s -T switch, I can see the chain of subroutine calls. My journals program made about 25,000 subroutine calls in all (it didn’t really seem that complicated when I wrote it), and here’s a short section of the output where I see the SOAP module doing its work:
$ dprofpp -T
... snip ...
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
UNIVERSAL::isa
UNIVERSAL::isa
SOAP::Serializer::encode_scalar
SOAP::XMLSchema::Serializer::xmlschemaclass
SOAP::Serializer::maptypetouri
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
... snip ...
I should also note, however, that Devel::DProf sometimes runs into problems and produces segmentation faults. In that case, I can use Devel::Profiler, which is a pure Perl replacement for Devel::DProf, although it’s slower.
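If I remember its interface correctly, I load Devel::Profiler with -M instead of -d, since it isn’t built on the debugger hook, and it leaves behind a tmon.out that dprofpp can read just as if Devel::DProf had produced it:

% perl -MDevel::Profiler journals
% dprofpp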
The Devel::SmallProf
module from the first examples isn’t really all that complicated.
When I look under the hood, I don’t see that much code. It’s very easy, in
fact, to write my own profiler.[26]
I’m going to create a profiler to simply count the number of times
Perl sees a certain line of code during the runtime. The
Devel::SmallProf
module already does that for me, but
once I know how to program the basics myself, I can adapt it to just
about anything that I want to do.
When I run my program with the -d
switch, for each statement Perl calls the special subroutine
&DB::DB
(the default Perl
debugger is just a program and works in much the same way). That’s the
subroutine named DB
in the package
DB
. I can do whatever I like in that
subroutine, so I’m going to accumulate a count of the number of times I’ve seen each line of code:
package Devel::LineCounter;

package DB;

use strict;
use warnings;

my @counter = ();

sub DB {
    my( $file, $line ) = ( caller )[1,2];

    return unless $file eq $0;   # only count lines in the profiled program itself

    $counter[$line]++;
}
To get profiling output without changing my original program, I
add an END
block to my LineCounter
module. That will execute
somewhere near the very end of the program, although it might execute
before other END
blocks:
END {
    print "\nLine summary for $0\n\n";

    open FILE, $0 or die "Could not open $0\n$!";

    my $count = 0;   # current line number in the source file
    while( <FILE> ) {
        printf "%6d %s", $counter[++$count] || 0, $_;
    }
}
I store my new module in the right place (i.e., Devel/LineCounter.pm somewhere in Perl’s module search path), and then invoke it with the -d switch:
% perl -d:LineCounter factorial.pl
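Once the counting version works, adapting it is mostly a matter of changing what I do inside &DB::DB. Here’s a sketch of a timing variant; the Devel::LineTimer name and the details are my own, and it only roughly approximates what Devel::SmallProf does. It uses Time::HiRes and charges the time between statements to the previously executed line:

package Devel::LineTimer;

package DB;

use strict;
use warnings;
use Time::HiRes qw(time);

my @times = ();
my( $last_line, $last_time ) = ( 0, time() );

sub DB {
    my( $file, $line ) = ( caller )[1,2];

    return unless $file eq $0;

    # charge the elapsed time to the line that just finished running
    my $now = time();
    $times[$last_line] = ( $times[$last_line] || 0 ) + ( $now - $last_time );
    ( $last_line, $last_time ) = ( $line, $now );
}

END {
    printf "%10.6f  line %d\n", $times[$_] || 0, $_ for 1 .. $#times;
}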
The Devel::Cover
module profiles my test suites
to tell me how much of the code base they actually test. It counts the
number of times the test suite runs a line of code, as well as keeping
track of which code branches I follow. Ideally, my test suite should touch every line of code and exercise every possible set of conditions.
Devel::Cover
comes with the cover
command, which reports the coverage statistics for the code. To use it, I first clear anything it might have done before (although I don’t have to: I might want to add the current run’s data to what I’ve already collected, or to other coverage databases for other parts of the project):
$ cover -delete
Once I’ve cleared the coverage database, I run my program after
loading Devel::Cover
. The conventional make test
invocation uses
Test::Harness
to run each test program, so I tell
Test::Harness
to load Devel::Cover
by setting HARNESS_PERL_SWITCHES, which adds its value to the perl command line for each test:
$ HARNESS_PERL_SWITCHES=-MDevel::Cover make test
If I’m using Module::Build
instead of ExtUtils::MakeMaker
, I don’t
have to do so much work:
$ ./Build testcover
Just as with the other Devel::
modules,
Devel::Cover
watches as the code runs and uses a lot
of black magic to decide what’s going on. It stores all of this
information in a directory named cover_db
.
Finally, the cover command turns all of that data into something that I can understand, writing a summary to STDOUT and creating a detailed report in coverage.html. The HTML file links to several other HTML files, allowing me to drill down into the program. Here’s a run that analyzes my HTTP::Size module:
$ cover
Reading database from /Users/brian/Dev/HTTP/Size/cover_db
---------------------------- ------ ------ ------ ------ ------ ------ ------
File                           stmt branch   cond    sub    pod   time  total
---------------------------- ------ ------ ------ ------ ------ ------ ------
blib/lib/HTTP/Size.pm          95.5   70.0   69.2  100.0  100.0  100.0   88.4
Total                          95.5   70.0   69.2  100.0  100.0  100.0   88.4
---------------------------- ------ ------ ------ ------ ------ ------ ------
The summary shows me that in my test suite I’ve executed 95.5 percent of the statements in the module. Along with that, I’ve only tested 70 percent of the possible execution paths and 69.2 percent of the possible combinations of conditions. That’s just the summary, though. The HTML report in coverage.html (Figure 5-1) tells me much more.
The HTML report has columns for each of the types of coverage that it measures, and it colors the table cells green to tell me that I have 100 percent coverage for that measurement on that line of code, and red to tell me that I have more testing work to do (Figure 5-2).
Before I decide how to improve my Perl program, I need to profile it to determine which sections need the most work. Perl profilers are just specialized debuggers, and if I don’t like what’s already out there, I can make my own profiler.
The perldebguts documentation describes creating a custom debugger. I write more about those in my articles for The Perl Journal, “Creating a Perl Debugger” (http://www.ddj.com/184404522) and “Profiling in Perl” (http://www.ddj.com/184404580).
“The Perl Profiler” is Chapter 20 of Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant. Anyone on the road to Perl mastery should already have this book.
Perl.com has two interesting articles on profiling: “Profiling Perl” by Simon Cozens (http://www.perl.com/lpt/a/850) and “Debugging and Profiling mod_perl Applications” by Frank Wiles (http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html).
Randal L. Schwartz writes about profiling in “Speeding up Your Perl Programs” for Unix Review (http://www.stonehenge.com/merlyn/UnixReview/col49.html) and “Profiling in Template Toolkit via Overriding” for Linux Magazine (http://www.stonehenge.com/merlyn/LinuxMag/col75.html).
[23] Alternately, I can set $dbh->{Profile}
from within my
program.
[24] Sam also wrote DBI::ProfileDumper::Apache
for use under mod_perl.
[25] Use.Perl is run by Chris Nandor (http://use.perl.org).
[26] Some of this same discussion appeared in my online Dr. Dobb’s article, “Creating a Perl Debugger” (http://www.ddj.com/documents/s=1498/ddj0103bpl/).