Perl lets me hook into its variables through a mechanism it calls tying. I can change how things happen when I access and store values, or just about anything else I do with a variable.
Tied variables go back to the basics. I can decide what Perl will do when I store or fetch values from a variable. Behind the scenes, I have to implement the logic for all of the variable’s behavior. Since I can do that, I can make what look like normal variables do anything that I can program (and that’s quite a bit). Although I might use a lot of magic on the inside, at the user level, tied variables look like the familiar variables. Not only that, tied variables work throughout the Perl API. Even Perl’s internal workings with the variable use the tied behavior.
You probably already have seen tied variables in action, even without using tie
. The dbmopen
command ties a
hash to a database file:
dbmopen %DBHASH, "some_file", 0644;
That’s old school Perl, though. Since then, the numbers and types of
these on-disk hashes proliferated and improved. Each implementation solves
some problem in another one. If I want to use one of those instead of the
implementation Perl wants to use with dbmopen
, I use tie
to associate my hash with the right
module:
tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;
There’s some hidden magic here. The programmer sees the %DBHASH
variable, which
acts just like a normal hash. To make it work out, though, Perl maintains
a “secret object” that it associates with the variable (%DBHASH
). I can actually get this object as the
return value of tie
:
my $secret_obj = tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;
If I forgot to get the secret object when I called tie
, I can get it later using tied
. Either way,
I end up with the normal-looking variable and the object, and I can use
either one:
my $secret_obj = tied( %DBHASH );
Any time I do something with %DBHASH
, Perl will translate that action into a
method call to $secret_obj
. Each
variable type (scalar, arrays, and so on) has different behaviors, so they
have different methods, and that’s what I have to implement.
You might already use tied variables without knowing it. In Chapter 5 of Intermediate Perl, we talked about opening a filehandle to a scalar reference:
open my($fh), ">", \$print_to_this_string or die "Could not open string: $!";
Once I have the filehandle, I do filehandle sorts of things with it.
How does it perform the magic? The IO::Scalar
module
implements a tied filehandle and takes responsibility for the filehandle
behavior. It can do whatever it likes, in this case printing to a scalar
instead of a file.
Back in the day when I did a lot of HTML coding, I liked to alternate the color of table rows. This isn’t a difficult thing to do, but it is annoying. Somewhere I have to store a list of colors to use, then I have to select the next color in the list each time I create a row:
@colors = qw( AAAAAA CCCCCC EEEEEE ); my $row = 0; foreach my $item ( @items ) { my $color = $colors[ $row++ % ($#colors + 1) ]; print qq|<tr><td bgcolor="$color">$item</td></tr>|; }
That extra couple of lines is really annoying. I especially don’t
like declaring the $row
variable
outside of the foreach
loop. It’s not
really a problem, but aesthetically, I don’t think it looks nice. Why
should I have to deal with the mechanics of selecting a color when my loop
is simply about creating a table row?
I created the Tie::Cycle
module to fix this. Instead of using an array, I create special
behavior for a scalar: every time I access the special scalar, I get back
the next color in the list. The tie magic handles all of the other stuff
for me. As a side benefit, I don’t have to debug those off-by-one errors I
tend to get when I try to recode this operation every time I need
it:
use Tie::Cycle; tie my $color, 'Tie::Cycle', [ qw( AAAAAA CCCCCC EEEEEE ) ]; foreach my $item ( @items ) { print qq|<tr><td bgcolor="$color">$item</td></tr>|; }
I can even reuse my tied $color
variable. No matter where I stop in the cycle, I can reset it to the
beginning if I’d like to start every group of rows with the same color. I
get the secret object with tied
and
then call the reset
method I provided
when I created the module:
tied( $color )->reset; foreach my $item ( @other_items ) { print qq|<tr><td bgcolor="$color">$item</td></tr>|; }
With Tie::Cycle
, I give an array a scalar
interface, but I don’t have to do something that tricky. I use the usual
interface and simply place restrictions on the storage or access of the
data type. I’ll show that in a moment.
Behind the scenes Perl uses an object for the tied variable. Although the user doesn’t treat the tied variable like an object, Perl figures out which methods to call and does the right thing.
At the programmer level, once I take responsibility for the
variable’s behavior, I have to tell it how to do everything. The tie
mechanism uses special method names, which
it expects me to implement. Since each variable type acts a bit
differently (I can unshift
onto an
array but not a scalar, and I can get the keys of a hash but not an
array), each type has its additional special tie
methods that apply only to it.
Perl 5.8 comes with base classes to help me get started. I
can use Tie::Scalar
,
Tie::Array
, Tie::Hash
, or
Tie::Handle
as a starting point for my own Tie::*
modules. I usually find that once I
decide to do something really special, I don’t get much use out of
those.
Each variable type will have a constructor, named by prefixing
TIE
to its type name (TIESCALAR
, and so on), and optional UNTIE
and DESTROY
methods. After that, each variable type
has methods specific to its behavior.
Perl calls the constructor when I use tie
. Here’s my earlier example again:
tie my $color, 'Tie::Cycle', [ qw( AAAAAA CCCCCC EEEEEE ) ];
Perl takes the class name, Tie::Cycle
, and calls
the class method, TIESCALAR
, giving
it the rest of the arguments to tie
:
my $secret_object = Tie::Cycle->TIESCALAR( [ qw( AAAAAA CCCCCC EEEEEE ) ] );
After it gets the secret object, it associates it with the variable
$color
.
When $color
goes out of scope,
Perl translates that into another method call on the secret object,
calling its DESTROY
method:
$secret_object->DESTROY;
Or I can decide that I don’t want my variable to be tied anymore. By
calling untie
, I break the association
between the secret object and the variable. Now $color
is just a normal scalar:
untie $color;
Perl translates that into the call to UNTIE
, which breaks the association between the
secret object and the variable:
$secret_object->UNTIE;
Tied scalars are the easiest to implement since scalars don’t do too
much. I can either store or access scalar data. For my special scalar
behavior, I have to create two methods: STORE
, which Perl calls when I assign a value,
and FETCH
, which Perl calls when I
access the value. Along with those, I provide TIESCALAR
, which Perl calls when I use tie
, and possibly the DESTROY
or UNTIE
methods.
The TIESCALAR
method works like
any other constructor. It gets the class name as its first argument, then
a list of the remaining arguments. Those come directly from tie
.
In my Tie::Cycle
example, everything after the variable name that I’m tying
(that is, the class name and the remaining arguments) ends up as the
arguments to TIESCALAR
. Other than
the method name, this looks like a normal constructor. Perl handles all
the tying for me, so I don’t have to do that myself:
tie $colors, 'Tie::Cycle', [ qw( AAAAAA CCCCCC EEEEEE ) ];
That’s almost the same as calling TIESCALAR
myself:
my $object = Tie::Cycle->TIESCALAR( [ qw( AAAAAA CCCCCC EEEEEE ) ] );
However, since I didn’t use tie, all I get is the object, and Perl doesn’t know anything about the special interface. It’s just a normal object.
In Tie::Cycle
(available on CPAN), the start of
the module is quite simple. I have to declare the package name, set up
the usual module bits, and define my TIESCALAR
. I chose to set up the interface to
take two arguments: the class name and an anonymous array. There isn’t
anything special in that choice. TIESCALAR
is going to get all of the arguments
from tie
and it’s up to me to figure
out how to deal with them, including how to enforce the
interface.
In this example, I’m simple-minded: I ensure that I have an array reference and that it has more than one argument. Like any other constructor, I return a blessed reference. Even though I’m tying a scalar, I use an anonymous array as my object. Perl doesn’t care what I do as long as I’m consistent. On the outside all of this still looks like a scalar:
package Tie::Cycle; use strict; use vars qw( $VERSION ); $VERSION = sprintf "%d.%02d", q$Revision: 1.9 $ =~ m/ (\d+) \. (\d+) /xg; sub TIESCALAR { my $class = shift; my $list_ref = shift; my @shallow_copy = map { $_ } @$list_ref; return unless ref $list_ref eq ref []; my $self = [ 0, scalar @shallow_copy, \@shallow_copy ]; bless $self, $class; }
Once I have my tied variable, I use it just like I would any other variable of that type. I use my tied scalar just like any other scalar. I already stored an anonymous array in the object, but if I wanted to change that, I simply assign to the scalar. In this case, I have to assign an anonymous array:
$colors = [ qw(FF0000 00FF00 0000FF) ];
Behind the curtain, Perl calls my STORE
method.
Again, I don’t get to choose this method name, and I have to handle
everything myself. I go through the same sequence I did for TIESCALAR
. There’s probably an opportunity for
refactoring here, but the cure might be worse than the disease for such
a small module):
sub STORE { my $self = shift; my $list_ref = shift; return unless ref $list_ref eq ref []; $self = [ 0, scalar @$list_ref, $list_ref ]; }
Every time I try to get the value of the scalar, Perl calls
FETCH
. As before, I have to do all of
the work to figure out how to return a value. I can do anything that I
like as long as I return a value. In Tie::Cycle
, I
have to figure out which index I need to access, then return that value.
I increment the index, figure out the index modulo the number of
elements in the array, and then return the right value:
sub FETCH { my $self = shift; my $index = $self->[0]++; $self->[0] %= $self->[1]; return $self->[2]->[ $index ]; }
That’s all I have to do. I could create an
UNTIE
(or DESTROY
) method, but I
didn’t create any messes I have to clean up so I don’t do that for
Tie::Cycle
. There isn’t any additional magic for
those. Everything that you already know about DESTROY
works the same here.
If you look in the actual Tie::Cycle
source,
you’ll find additional methods. I can’t get to these through the
tie
interface, but with an the object
form I can. They aren’t part of the tie
magic, but since it’s really just an
object I can do object-oriented sorts of things, including adding
methods. For example, the previous
method gets me the previous value from the list without affecting the
current index. I can peek without changing anything:
my $previous = tied( $colors )->previous;
The tied
gets me the secret
object and I immediately call a method on it instead of storing it in a
variable. I can do the same thing, using next
to peek at the next element:
my $next = tied( $colors )->next;
And, as I showed earlier, I can reset the cycle:
tied( $colors )->reset;
I’ll create a tied scalar that sets an upper bound on the
magnitude of the integer, meaning that there is
some range around zero that I can store in the variable. To create the
class to implement the tie
, I do the
same thing I had to do for Tie::Cycle
: create
TIESCALAR
, STORE
, and FETCH
routines:
package Tie::BoundedInteger; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub TIESCALAR { my $class = shift; my $value = shift; my $max = shift; my $self = bless [ 0, $max ], $class; $self->STORE( $value ); return $self; } sub FETCH { $_[0]->[0] } sub STORE { my $self = shift; my $value = shift; my $magnitude = abs $value; croak( "The [$value] exceeds the allowed limit [$self->[1]]" ) if( int($value) != $value || $magnitude > $self->[1] ); $self->[0] = $value; $value; } 1;
At the user level, I do the same thing I did before. I call
tie
with the variable name, the class
that implements the behavior, and finally the arguments. In this
program, I want to start off with the value 1
, and set the magnitude limit to 3
. Once I do that, I’ll try to assign $number
each of the integer values between
-5
and 5
and then print what happened:
#!/usr/bin/perl use Tie::BoundedInteger; tie my $number, 'Tie::BoundedInteger', 1, 3; foreach my $try ( -5 .. 5 ) { my $value = eval { $number = $try }; print "Tried to assign [$try], "; print "but it didn't work, " unless $number == $try; print "value is now [$number]\n"; }
From the output I can see that I start off with the value 1
in $number
, but when I try to assign 7
(a value with a magnitude greater than
5
), it doesn’t work and the value is
still 1
. Normally my program would
croak
right there, but I used an
eval
to catch that error. The same
thing happens for 6
. When I try
5
, it works:
Tried to assign [-5], but it didn't work, value is now [1] Tried to assign [-4], but it didn't work, value is now [1] Tried to assign [-3], value is now [-3] Tried to assign [-2], value is now [-2] Tried to assign [-1], value is now [-1] Tried to assign [0], value is now [0] Tried to assign [1], value is now [1] Tried to assign [2], value is now [2] Tried to assign [3], value is now [3] Tried to assign [4], but it didn't work, value is now [3] Tried to assign [5], but it didn't work, value is now [3]
My Tie::BoundedInteger
example changed how I could store values by limiting their
values. I can also change how I fetch the values. In this example, I’ll
create Tie::Timely
, which sets a lifetime on the
value. After that lifetime expires, I’ll get undef
when I access the value.
The STORE
method is easy. I
just store whatever value I get. I don’t care if it’s a simple scalar, a
reference, an object, or anything else. Every time I store a value,
though, I’ll record the current time too. That way every time I change
the value I reset the countdown.
In the FETCH
routine, I have
two things I can return. If I’m within the lifetime of the value, I
return the value. If I’m not, I return nothing at all:
package Tie::Timely; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub TIESCALAR { my $class = shift; my $value = shift; my $lifetime = shift; my $self = bless [ undef, $lifetime, time ], $class; $self->STORE( $value ); return $self; } sub FETCH { time - $_[0]->[2] > $_[0]->[1] ? () : $_[0]->[0] } sub STORE { @{ $_[0] }[0,2] = ( $_[1], time ) } 1;
I set up tied arrays just like I do tied scalars, but I have extra methods to create since I can do more with arrays. My implementation has to handle the array operators (shift, unshift, push, pop, splice) as well as the other array operations we often take for granted:
Getting or setting the last array index
Extending the array
Checking that an index exists
Deleting a element
Clearing all the values
Once I decide that I want to implement my own array behavior, I own
all of those things. I don’t really have to define methods for each of
those operations, but some things won’t work unless I do. The
Tie::Array
module exists as a bare-bones base class
that implements most of these things, although only to croak
if a program tries to use something I
haven’t implemented. Table 17-1 shows how some array operations translate to tie methods (and
perltie has the rest). Most of the methods have the
same name as the Perl operator, although in all caps.
Table 17-1. The mapping of selected array actions to tie methods
Action | Array operation | Tie method |
---|---|---|
Set value |
|
|
Get value |
|
|
Array length |
|
|
Pre-extend |
|
|
Add to end |
|
|
Remove from end |
|
|
When I talked about tying scalars, I showed my
Tie::Cycle
module, which treated an array like a
scalar. To be fair, I should go the other way by treating a scalar as an
array. Instead of storing several array elements, each of which incurs
all of the overhead of a scalar variable, I’ll create one scalar and
chop it up as necessary to get the array values. Essentially, my example
trades memory space for speed. I’ll reuse my bounded integer example
since I can make a number less than 256 fit into a single character.
That’s convenient, isn’t it?
package Tie::StringArray; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub _null { "\x00" } sub _last () { $_[0]->FETCHSIZE - 1 } sub _normalize_index { $_[1] == abs $_[1] ? $_[1] : $_[0]->_last + 1 - abs $_[1] } sub _store { chr $_[1] } sub _show { ord $_[1] } sub _string { ${ $_[0] } } sub TIEARRAY { my( $class, @values ) = @_; my $string = ''; my $self = bless \$string, $class; my $index = 0; $self->STORE( $index++, $_ ) foreach ( @values ); $self; } sub FETCH { my $index = $_[0]->_normalize_index( $_[1] ); $index > $_[0]->_last ? () : $_[0]->_show( substr( $_[0]->_string, $index, 1 ) ); } sub FETCHSIZE { length $_[0]->_string } sub STORESIZE { my $self = shift; my $new_size = shift; my $size = $self->FETCHSIZE; if( $size > $new_size ) # truncate { $$self = substr( $$self, 0, $size ); } elsif( $size < $new_size ) # extend { $$self .= join '', ($self->_null) x ( $new_size - $size ); } } sub STORE { my $self = shift; my $index = shift; my $value = shift; croak( "The magnitude of [$value] exceeds the allowed limit [255]" ) if( int($value) != $value || $value > 255 ); $self->_extend( $index ) if $index >= $self->_last; substr( $$self, $index, 1, chr $value ); $value; } sub _extend { my $self = shift; my $index = shift; $self->STORE( 0, 1 + $self->_last ) while( $self->_last >= $index ); } sub EXISTS { $_[0]->_last >= $_[1] ? 1 : 0 } sub CLEAR { ${ $_[0] } = '' } sub SHIFT { $_[0]->_show( substr ${ $_[0] }, 0, 1, '' ) } sub POP { $_[0]->_show( chop ${ $_[0] } ) } sub UNSHIFT { my $self = shift; foreach ( reverse @_ ) { substr ${ $self }, 0, 0, $self->_store( $_ ) } } sub PUSH { my $self = shift; $self->STORE( 1 + $self->_last, $_ ) foreach ( @_ ) } sub SPLICE { my $self = shift; my $arg_count = @_; my( $offset, $length, @list ) = @_; if( 0 == $arg_count ) { ( 0, $self->_last ) } elsif( 1 == $arg_count ) { ( $self->_normalize_index( $offset ), $self->_last ) } elsif( 2 <= $arg_count ) # offset and length only { ( $self->_normalize_index( $offset ), do { if( $length < 0 ) { $self->_last - $length } else { $start + $length - 1 } } ) } #@removed = map { $self->POP } $start .. $end; if( wantarray ) { @removed; } else { defined $removed[-1] ? $removed[-1] : undef; } } 1;
To make this work, I’ll treat each position in my string as an
array element. To store a value, in STORE
the arguments are the index for the
value and the value itself. I need to convert the value to a character
and put that character in the right position in the string. If I try to
store something other than a whole number between 1 and 255, I get an
error.
To fetch a value I need to extract the character from the correct
position in the string and convert it to a number. The argument to
FETCH
is the index of the element so
I need to convert that to something I can use with substr
.
Now, for the more complex array operations, I have to do a bit
more work. To retrieve a splice, I have to grab several values, but
splice
is also an value so I have to
be ready to assign those positions more values. Not only that, a user
might assign fewer or more values than the splice
extracts, so I have to be ready to
shrink or expand the string. That’s not scary, though, since I can
already do all of that with a string by using substr
.
Deleting an element is a bit trickier. In a normal array I can
have an undefined element. How am I going to handle that in the middle
of a string? Amazingly, my example left me a way to handle this: I can
store a undef
as a null byte. If I had to store
numbers between 0 and 255, I would have been in trouble. Curious how
that works out.
Perl also lets me extend a tied array. In a normal array, I can extend an array to let Perl know I want it to do the work to make a certain number of elements available (thus explicitly circumventing Perl’s built-in logic to make its best guess about the proper array length). In this example, I just need to extend the string.
I contrived that last example so I could show the whole process
without doing anything too tricky. I might want to store an array of
characters, and that example would work quite well for that. Now I want
to adapt it to store a DNA sequence. My domain changes from 256 things
to something much smaller, the set { T C G A }
, which
represents thymine, cytosine, guanine, and adenine. If I add in the
possibility of a NULL
(maybe my gene sequencer can’t
tell what should be in a particular position), I have six possibilities.
I don’t need an entire character for that. I can actually get by with
three bits and have a little to spare.
Before I get too deeply into this, let me make a guess about how much memory this can save. A typical DNA sequence has several thousand base pairs. If I used an array for that, I’d have the scalar overhead for each one of those. I’ll say that’s 10 bytes, just to be kind. For 10,000 base pairs, which is just a small sequence, that’s 100,000 bytes. That scalar overhead really starts to add up! Now, instead of that, I’ll store everything in a single scalar. I’ll incur the scalar overhead once. For 10,000 base pairs at three bits a pair, that’s 30,000 bits, or 3,750 bytes. I round that off to 4,000 bytes. That’s a factor of 50! Remember, this memory parsimony comes at the expense of speed. I’ll have to do a little bit more computational work.
With six bits I have eight distinct patterns. I need to assign some of those patterns meanings. Fortunately for me, Perl makes this really easy since I can type out binary strings directly as long as I’m using Perl 5.6 or later (see Chapter 16 for more on bit operations):
use constant N => 0b000; use constant T => 0b001; use constant C => 0b100; use constant G => 0b110; use constant A => 0b011; use constant RESERVED1 => 0b111; use constant RESERVED2 => 0b101;
Also, since I’m not using characters anymore, I can’t use substr
. For vec
, I’d have to partition the bits by powers
of two, but I’d have to waste another bit for that (and I’m already
wasting two).[60]If I do that, I end up with 10 unused patterns. That might
be nice if we eventually meet aliens with more complex hereditary
encodings, but for now I’ll just stick with what we have.
Before you get scared off by this code, remember what I’m doing. It’s exactly the same problem as the last example where I stored digits as characters in a long string. This time I’m doing it at the bit level with a bit more math. My specific example doesn’t matter as much as the concept that I can make anything, and I mean anything, look like an array if I’m willing to do all the work:
package Tie::Array::DNA; use strict; use base qw(Tie::Array); use Carp qw(croak carp); use vars qw( $VERSION ); $VERSION = 1.0; use constant BITS_PER_ELEMENT => 3; use constant BIT_PERIOD => 24; # 24 bits use constant BYTE_LENGTH => 8; use constant BYTE_PERIOD => 3; # 24 bits my %Patterns = ( T => 0b001, A => 0b011, C => 0b100, G => 0b110, N => 0b000, ); my @Values = (); foreach my $key ( keys %Patterns ) { $Values[ $Patterns{$key} ] = $key } sub _normalize { uc $_[1] } sub _allowed { length $_[1] eq 1 and $_[1] =~ tr/TCGAN// } my %Last; sub TIEARRAY { my( $class, @values ) = @_; my $string = \''; my $self = bless $string, $class; $$self = "\x00" x 10_000; $Last{ "foo" } = -1; my $index = 0; $self->STORE( $index++, $_ ) foreach ( @values ); $self; } sub _get_start_and_length { my( $self, $index ) = @_; my $bytes_to_start = int( $index * BITS_PER_ELEMENT / BYTE_LENGTH ); my $byte_group = int( $bytes_to_start / BYTE_PERIOD ); my $start = $byte_group * BYTE_PERIOD; ( $start, BYTE_PERIOD ) } sub _get_bytes { my( $self, $index ) = @_; my( $start, $length ) = $self->_get_start_and_length( $index ); my @chars = split //, substr( $$self, $start, $length ); (ord( $chars[0] ) << 16) + (ord( $chars[1] ) << 8) + ord( $chars[2] ); } sub _save_bytes { my( $self, $index, $bytes ) = @_; my( $start, $length ) = $self->_get_start_and_length( $index ); my $new_string = join '', map { chr( ( $bytes & ( 0xFF << $_ ) ) >> $_ ) } qw( 16 8 0 ); substr( $$self, $start, $length, $new_string ); } sub _get_shift { BIT_PERIOD - BITS_PER_ELEMENT - ($_[1] * BITS_PER_ELEMENT % BIT_PERIOD); } sub _get_clearing_mask { ~ ( 0b111 << $_[0]->_get_shift( $_[1] ) ) } sub _get_setting_mask { $_[0]->_get_pattern_by_value( $_[2] ) << $_[0]->_get_shift( $_[1] ) } sub _get_selecting_mask { 0b111 << $_[0]->_get_shift( $_[1] ) } sub _get_pattern_by_value { $Patterns{ $_[1] } } sub _get_null_pattern { $Patterns{ 'N' } } sub _get_value_by_pattern { $Values [ $_[1] ] } sub _string { $_[0] } sub _length { length ${$_[0]} } sub _add_to_string { ${$_[0]} .= $_[1] } sub STORE { my( $self, $index, $value ) = @_; $value = $self->_normalize( $value ); carp( qq|Cannot store unallowed element "$value"| ) unless $self->_allowed( $value ); $self->_extend( $index ) if $index > $self->_last; # get the mask my $clear_mask = $self->_get_clearing_mask( $index ); my $set_mask = $self->_get_setting_mask( $index, $value ); # clear the area my $result = ( $self->_get_bytes( $index ) & $clear_mask ) | $set_mask; # save the string my( $start, $length ) = $self->_get_start_and_length( $index ); my $new_string = join '', map { chr( ( $result & ( 0xFF << $_ ) ) >> $_ ) } qw( 16 8 0 ); substr( $$self, $start, $length, $new_string ); $self->_set_last( $index ) if $index > $self->_last; $value } sub FETCH { my( $self, $index ) = @_; # get the right substr my $bytes = $self->_get_bytes( $index ); # get the mask my $select_mask = $self->_get_selecting_mask( $index ); my $shift = $self->_get_shift( $index ); # clear the area my $pattern = 0 + ( ( $bytes & $select_mask ) >> $shift ); $self->_get_value_by_pattern( $pattern ); } sub FETCHSIZE { $_[0]->_last + 1 } sub STORESIZE { $_[0]->_set_last( $_[1] ) } sub EXTEND { } sub CLEAR { ${ $_[0] } = '' } sub EXISTS { $_[1] < $Last{ "foo" } } sub DESTROY { } __PACKAGE__;
This code gets a bit complicated because I have to implement my own array. Since I’m storing everything in a single string and using the string as a long string of bits instead of characters, I have to come up with a way to get the information that I need.
I’m using three bits per element and characters come with eight
bits. To make everything simpler, I decide to deal with everything in
3-byte (24-bit) chunks because that’s the lowest common denominator
between 3-bit and 8-bit chunks of data. I do that in _get_bytes
and _save_bytes
, which figure out which three
characters they need to grab. The _get_bytes
method turns the three characters
into a single number so I can later use bit operations on it, and the
_save_bytes
method goes the other
way.
Once I have the number, I need to know how to pull out the three
bits. There are eight elements in each group, so _get_selecting_mask
figures out which of those
elements I want and returns the right bit mask to select it. That bit
mask is just 0b111
shifted up the
right number of places. The _get_shift
method handles that in general by
using the constants BIT_PERIOD
and
BITS_PER_ELEMENT
.
Once I got all of that in place, my FETCH
method can use it to return an element.
It gets the bit pattern then looks up that pattern with _get_value_by_pattern
to turn the bits into
the symbolic version (i.e., T, A, C, G ).
The STORE
method does all that
but the other way around. It turns the symbols into the bit pattern,
shifts that up the right amount, and does the right bit operations to
set the value. I ensure that I clear the target bits first using the
mask, I get back from _get_clearing_mask
. Once I clear the target
bits, I can use the bit mask from _get_setting_mask
to finally store the
element.
Whew! Did you make it this far? I haven’t even implemented all of
the array features. How am I going to implement SHIFT
, UNSHIFT
, or SPLICE
? Here’s a hint: remember that Perl has
to do this for real arrays and strings. Instead of moving things over
every time I affect the front of the data, it keeps track of where it
should start, which might not be the beginning of the data. If I wanted
to shift off a single element, I just have to add that offset of three
bits to all of my computations. The first element would be at bits 3 to
5 instead of 0 to 2. I’ll leave that up to you, though.
Tied hashes are only a bit more complicated than tied arrays, but like all tied variables, I set them up in the same way. I need to implement methods for all of the actions I want my tied hash to handle. Table 17-2 shows some of the hash operations and their corresponding tied methods.
Table 17-2. The mapping of selected hash actions to tie methods
Action | Hash operation | Tie method |
---|---|---|
Set value |
|
|
Get value |
|
|
Delete a key |
|
|
Check for a key |
|
|
Next key |
|
|
Clear the hash |
|
|
One common task, at least for me, is to accumulate a count of something in a hash. One of my favorite examples to show in Perl courses is a word frequency counter. By the time students get to the third day of the Learning Perl course, they know enough to write a simple word counter:
my %hash = (); while( <> ) { chomp; my @words = split; foreach my $word ( @words ) { $hash{$word}++ } } foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) { printf "%4d %-20s\n", $hash{$word}, $word; }
When students actually start to use this, they discover that it’s
really not as simple as all that. Words come in different capitalizations,
with different punctuation attached to them, and possibly even misspelled.
I could add a lot of code to that example to take care of all of those
edge cases, but I can also fix that up in the hash assignment itself. I
replace my hash declaration with a call to tie
and leave the rest of the program
alone:
# my %hash = (); # old way tie my( %hash ), 'Tie::Hash::WordCounter'; while( <> ) { chomp; my @words = split; foreach my $word ( @words ) { $hash{$word}++ } } foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) { printf "%4d %-20s\n", $hash{$word}, $word; }
I can make a tied hash do anything that I like, so I can make it handle those edge cases by normalizing the words I give it when I do the hash assignment. My tiny word counter program doesn’t have to change that much and I can hide all the work behind the tie interface.
I’ll handle most of the complexity in the STORE
method. Everything else will act just like
a normal hash, and I’m going to use a hash behind the scenes. I should
also be able to access a key by ignoring the case and punctuation issues
so my FETCH
method normalizes its
argument in the same way:
package Tie::Hash::WordCounter; use strict; use Tie::Hash; use base qw(Tie::StdHash); use vars qw( $VERSION ); $VERSION = 1.0; sub TIEHASH { bless {}, $_[0] } sub _normalize { my( $self, $key ) = @_; $key =~ s/^\s+//; $key =~ s/\s+$//; $key = lc( $key ); $key =~ s/[\W_]//g; return $key } sub STORE { my( $self, $key, $value ) = @_; $key = $self->_normalize( $key ); $self->{ $key } = $value; } sub FETCH { my( $self, $key ) = @_; $key = $self->_normalize( $key ); $self->{ $key }; } __PACKAGE__;
By now you know what I’m going to say: tied filehandles are like all the other tied variables. Table 17-3 shows selected file operations and their corresponding tied methods. I simply need to provide the methods for the special behavior I want.
Table 17-3. The mapping of selected filehandle actions to tie methods
Action | File operation | Tie method |
---|---|---|
Print to a filehandle |
|
|
Read from a filehandle |
|
|
Close a filehandle |
|
|
For a small example, I create
Tie::File::Timestamp
, which appends a timestamp to each
line of output. Suppose I start with a program that already has several
print statements. I didn’t write this program, but my task is to add a
timestamp to each line:
# old program open LOG, ">>", "log.txt" or die "Could not open output.txt! $!"; print LOG "This is a line of output\n"; print LOG "This is some other line\n";
I could do a lot of searching and a lot of typing, or I could even
get my text editor to do most of the work for me. I’ll probably miss
something, and I’m always nervous about big changes. I can make a little
change by replacing the filehandle. Instead of open
, I’ll use tie
, leaving the rest of the program as it
is:
# new program #open LOG, ">>", "log.txt" or die "Could not open output.txt! $!"; tie *LOG, "Tie::File::Timestamp", "log.txt" or die "Could not open output.txt! $!"; print LOG "This is a line of output\n"; print LOG "This is some other line\n";
Now I have to make the magic work. It’s fairly simple since I only
have to deal with four methods. In TIEHANDLE
, I open the file. If I can’t do that,
I simply return, triggering the die
in
the program since tie
doesn’t return a
true value. Otherwise, I return the filehandle reference, which I’ve
blessed into my tied class. That’s the object I’ll get as the first
argument in the rest of the methods.
My output methods are simple. They’re simple wrappers around the
built-in print
and printf
. I use the tie object as the filehandle
reference (wrapping it in braces as Perl Best
Practices recommends to signal to other people that’s what I
mean to do). In PRINT
, I simply add a
couple of arguments to the rest of the stuff I pass to print
. The first additional argument is the
timestamp, and the second is a space character to make it all look nice. I
do a similar thing in PRINTF
, although
I add the extra text to the $format
argument:
package Tie::File::Timestamp; use strict; use vars qw($VERSION); use Carp qw(croak); $VERSION = 0.01; sub _timestamp { "[" . localtime() . "]" } sub TIEHANDLE { my $class = shift; my $file = shift; open my( $fh ), ">> $file" or return; bless $fh, $class; } sub PRINT { my( $self, @args ) = @_; print { $self } $self->_timestamp, " ", @args; } sub PRINTF { my( $self, $format, @args ) = @_; $format = $self->_timestamp . " " . $format; printf { $self } $format, @args; } sub CLOSE { close $_[0] } __PACKAGE__;
Tied filehandles have a glaring drawback, though: I can only do this with filehandles. Since Learning Perl, I’ve been telling you that bareword filehandles are the old way of doing things and that storing a filehandle reference in a scalar is the new and better way.
If I try to use a scalar variable, tie
looks for TIESCALAR
method, along with the other tied
scalar methods. It doesn’t look for PRINT
, PRINTF
, and all of the other input/output
methods I need. I can get around that with a little black magic that I
don’t recommend. I start with a glob reference, *FH
, which creates an entry in the symbol table.
I wrap a do
block around it to form a
scope and to get the return value (the last evaluated expression). Since I
only use the *FH
once, unless I turn
off warnings in that area, Perl will tell me that I’ve only used *FH
once. In the tie
, I have to dereference $fh
as a glob reference so tie
looks for TIEHANDLE
instead of TIESCALAR
. Look scary? Good. Don’t do
this!
my $fh = \do{ no warnings; local *FH }; my $object = tie *{$fh}, $class, $output_file;
I’ve showed you a lot of tricky code to reimplement Perl data types in Perl. The tie interface lets me do just about anything that I want, but I also then have to do all of the work to make the variables act like people expect them to act. With this power comes great responsibility and a lot of work.
For more examples, inspect the Tie
modules on CPAN. You can peek at the source
code to see what they do and steal ideas for your own.
Teodor Zlatanov writes about “Tied Variables” for IBM developerWorks, January 2003: http://www-128.ibm.com/developerworks/linux/library/l-cptied.html.
Phil Crow uses tied filehandles to implement some design patterns in Perl in “Perl Design Patterns” for Perl.com: http://www.perl.com/lpt/a/2003/06/13/design1.html.
Dave Cross writes about tied hashes in “Changing Hash Behaviour with tie” for Perl.com: http://www.perl.com/lpt/a/2001/09/04/tiedhash.html.
Abhijit Menon-Sen uses tied hashes to make fancy dictionaries in “How Hashes Really Work” for Perl.com: http://www.perl.com/lpt/a/2002/10/01/hashes.html.
Randal Schwartz discusses tie
in
“Fit to be tied (Parts 1 & 2)” for Linux
Magazine, March and April 2005: http://www.stonehenge.com/merlyn/LinuxMag/col68.html and
http://www.stonehenge.com/merlyn/LinuxMag/col69.html.
There are several Tie
modules on
CPAN, and you can peek at the source code to see what they do and steal
ideas for your own.
[60] If I only cared about DNA and I knew that I could represent
every position accurately, I’d only need two bits, and then I could
use vec
. If I wanted to add other symbols, such
as “B” meaning “something not A,” I could add more bits to each
position.
Get Mastering Perl now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.