Chapter 17. The Magic of Tied Variables

Perl lets me hook into its variables through a mechanism it calls tying. I can change how things happen when I access and store values, or just about anything else I do with a variable.

Tied variables go back to the basics. I can decide what Perl will do when I store or fetch values from a variable. Behind the scenes, I have to implement the logic for all of the variable’s behavior. Since I can do that, I can make what look like normal variables do anything that I can program (and that’s quite a bit). Although I might use a lot of magic on the inside, at the user level, tied variables look like the familiar variables. Not only that, tied variables work throughout the Perl API. Even Perl’s internal workings with the variable use the tied behavior.

They Look Like Normal Variables

You probably already have seen tied variables in action, even without using tie. The dbmopen command ties a hash to a database file:

dbmopen %DBHASH, "some_file", 0644;

That’s old school Perl, though. Since then, the numbers and types of these on-disk hashes proliferated and improved. Each implementation solves some problem in another one. If I want to use one of those instead of the implementation Perl wants to use with dbmopen, I use tie to associate my hash with the right module:

tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;

There’s some hidden magic here. The programmer sees the %DBHASH variable, which acts just like a normal hash. To make it work out, though, Perl maintains a “secret object” that it associates with the variable (%DBHASH). I can actually get this object as the return value of tie:

my $secret_obj = tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;

If I forgot to get the secret object when I called tie, I can get it later using tied. Either way, I end up with the normal-looking variable and the object, and I can use either one:

my $secret_obj = tied( %DBHASH );

Any time I do something with %DBHASH, Perl will translate that action into a method call to $secret_obj. Each variable type (scalar, arrays, and so on) has different behaviors, so they have different methods, and that’s what I have to implement.

You might already use tied variables without knowing it. In Chapter 5 of Intermediate Perl, we talked about opening a filehandle to a scalar reference:

open my($fh), ">", \$print_to_this_string
        or die "Could not open string: $!";

Once I have the filehandle, I do filehandle sorts of things with it. How does it perform the magic? The IO::Scalar module implements a tied filehandle and takes responsibility for the filehandle behavior. It can do whatever it likes, in this case printing to a scalar instead of a file.

At the User Level

Back in the day when I did a lot of HTML coding, I liked to alternate the color of table rows. This isn’t a difficult thing to do, but it is annoying. Somewhere I have to store a list of colors to use, then I have to select the next color in the list each time I create a row:

@colors = qw( AAAAAA CCCCCC EEEEEE );

my $row = 0;
foreach my $item ( @items )
        {
        my $color = $colors[ $row++ % ($#colors + 1) ];

        print qq|<tr><td bgcolor="$color">$item</td></tr>|;
        }

That extra couple of lines is really annoying. I especially don’t like declaring the $row variable outside of the foreach loop. It’s not really a problem, but aesthetically, I don’t think it looks nice. Why should I have to deal with the mechanics of selecting a color when my loop is simply about creating a table row?

I created the Tie::Cycle module to fix this. Instead of using an array, I create special behavior for a scalar: every time I access the special scalar, I get back the next color in the list. The tie magic handles all of the other stuff for me. As a side benefit, I don’t have to debug those off-by-one errors I tend to get when I try to recode this operation every time I need it:

use Tie::Cycle;
tie my $color, 'Tie::Cycle', [ qw( AAAAAA CCCCCC EEEEEE ) ];

foreach my $item ( @items )
        {
        print qq|<tr><td bgcolor="$color">$item</td></tr>|;
        }

I can even reuse my tied $color variable. No matter where I stop in the cycle, I can reset it to the beginning if I’d like to start every group of rows with the same color. I get the secret object with tied and then call the reset method I provided when I created the module:

tied( $color )->reset;

foreach my $item ( @other_items )
        {
        print qq|<tr><td bgcolor="$color">$item</td></tr>|;
        }

With Tie::Cycle, I give an array a scalar interface, but I don’t have to do something that tricky. I use the usual interface and simply place restrictions on the storage or access of the data type. I’ll show that in a moment.

Behind the Curtain

Behind the scenes Perl uses an object for the tied variable. Although the user doesn’t treat the tied variable like an object, Perl figures out which methods to call and does the right thing.

At the programmer level, once I take responsibility for the variable’s behavior, I have to tell it how to do everything. The tie mechanism uses special method names, which it expects me to implement. Since each variable type acts a bit differently (I can unshift onto an array but not a scalar, and I can get the keys of a hash but not an array), each type has its additional special tie methods that apply only to it.

Perl 5.8 comes with base classes to help me get started. I can use Tie::Scalar, Tie::Array, Tie::Hash, or Tie::Handle as a starting point for my own Tie::* modules. I usually find that once I decide to do something really special, I don’t get much use out of those.

Each variable type will have a constructor, named by prefixing TIE to its type name (TIESCALAR, and so on), and optional UNTIE and DESTROY methods. After that, each variable type has methods specific to its behavior.

Perl calls the constructor when I use tie. Here’s my earlier example again:

tie my $color, 'Tie::Cycle',
        [ qw( AAAAAA CCCCCC EEEEEE ) ];

Perl takes the class name, Tie::Cycle, and calls the class method, TIESCALAR, giving it the rest of the arguments to tie:

my $secret_object = Tie::Cycle->TIESCALAR(
        [ qw( AAAAAA CCCCCC EEEEEE ) ] );

After it gets the secret object, it associates it with the variable $color.

When $color goes out of scope, Perl translates that into another method call on the secret object, calling its DESTROY method:

$secret_object->DESTROY;

Or I can decide that I don’t want my variable to be tied anymore. By calling untie, I break the association between the secret object and the variable. Now $color is just a normal scalar:

untie $color;

Perl translates that into the call to UNTIE, which breaks the association between the secret object and the variable:

$secret_object->UNTIE;

Scalars

Tied scalars are the easiest to implement since scalars don’t do too much. I can either store or access scalar data. For my special scalar behavior, I have to create two methods: STORE, which Perl calls when I assign a value, and FETCH, which Perl calls when I access the value. Along with those, I provide TIESCALAR, which Perl calls when I use tie, and possibly the DESTROY or UNTIE methods.

The TIESCALAR method works like any other constructor. It gets the class name as its first argument, then a list of the remaining arguments. Those come directly from tie.

Tie::Cycle

In my Tie::Cycle example, everything after the variable name that I’m tying (that is, the class name and the remaining arguments) ends up as the arguments to TIESCALAR. Other than the method name, this looks like a normal constructor. Perl handles all the tying for me, so I don’t have to do that myself:

tie $colors, 'Tie::Cycle', [ qw( AAAAAA CCCCCC EEEEEE ) ];

That’s almost the same as calling TIESCALAR myself:

my $object = Tie::Cycle->TIESCALAR( [ qw( AAAAAA CCCCCC EEEEEE ) ] );

However, since I didn’t use tie, all I get is the object, and Perl doesn’t know anything about the special interface. It’s just a normal object.

In Tie::Cycle (available on CPAN), the start of the module is quite simple. I have to declare the package name, set up the usual module bits, and define my TIESCALAR. I chose to set up the interface to take two arguments: the class name and an anonymous array. There isn’t anything special in that choice. TIESCALAR is going to get all of the arguments from tie and it’s up to me to figure out how to deal with them, including how to enforce the interface.

In this example, I’m simple-minded: I ensure that I have an array reference and that it has more than one argument. Like any other constructor, I return a blessed reference. Even though I’m tying a scalar, I use an anonymous array as my object. Perl doesn’t care what I do as long as I’m consistent. On the outside all of this still looks like a scalar:

package Tie::Cycle;
use strict;

use vars qw( $VERSION );

$VERSION = sprintf "%d.%02d", q$Revision: 1.9 $ =~ m/ (\d+) \. (\d+) /xg;

sub TIESCALAR
        {
        my $class    = shift;
        my $list_ref = shift;

        my @shallow_copy = map { $_ } @$list_ref;

        return unless ref $list_ref eq ref [];

        my $self = [ 0, scalar @shallow_copy, \@shallow_copy ];

        bless $self, $class;
        }

Once I have my tied variable, I use it just like I would any other variable of that type. I use my tied scalar just like any other scalar. I already stored an anonymous array in the object, but if I wanted to change that, I simply assign to the scalar. In this case, I have to assign an anonymous array:

$colors = [ qw(FF0000 00FF00 0000FF) ];

Behind the curtain, Perl calls my STORE method. Again, I don’t get to choose this method name, and I have to handle everything myself. I go through the same sequence I did for TIESCALAR. There’s probably an opportunity for refactoring here, but the cure might be worse than the disease for such a small module):

sub STORE
        {
        my $self     = shift;
        my $list_ref = shift;

        return unless ref $list_ref eq ref [];

        $self = [ 0, scalar @$list_ref, $list_ref ];
        }

Every time I try to get the value of the scalar, Perl calls FETCH. As before, I have to do all of the work to figure out how to return a value. I can do anything that I like as long as I return a value. In Tie::Cycle, I have to figure out which index I need to access, then return that value. I increment the index, figure out the index modulo the number of elements in the array, and then return the right value:

sub FETCH
        {
        my $self   = shift;

        my $index  = $self->[0]++;
        $self->[0] %= $self->[1];

        return $self->[2]->[ $index ];
        }

That’s all I have to do. I could create an UNTIE (or DESTROY) method, but I didn’t create any messes I have to clean up so I don’t do that for Tie::Cycle. There isn’t any additional magic for those. Everything that you already know about DESTROY works the same here.

If you look in the actual Tie::Cycle source, you’ll find additional methods. I can’t get to these through the tie interface, but with an the object form I can. They aren’t part of the tie magic, but since it’s really just an object I can do object-oriented sorts of things, including adding methods. For example, the previous method gets me the previous value from the list without affecting the current index. I can peek without changing anything:

my $previous = tied( $colors )->previous;

The tied gets me the secret object and I immediately call a method on it instead of storing it in a variable. I can do the same thing, using next to peek at the next element:

my $next     = tied( $colors )->next;

And, as I showed earlier, I can reset the cycle:

tied( $colors )->reset;

Bounded Integers

I’ll create a tied scalar that sets an upper bound on the magnitude of the integer, meaning that there is some range around zero that I can store in the variable. To create the class to implement the tie, I do the same thing I had to do for Tie::Cycle: create TIESCALAR, STORE, and FETCH routines:

package Tie::BoundedInteger;
use strict;

use Carp qw(croak);

use vars qw( $VERSION );

$VERSION = 1.0;

sub TIESCALAR
        {
        my $class = shift;
        my $value = shift;
        my $max   = shift;

        my $self = bless [ 0, $max ], $class;

        $self->STORE( $value );

        return $self;
        }

sub FETCH { $_[0]->[0] }

sub STORE
        {
        my $self  = shift;
        my $value = shift;

        my $magnitude = abs $value;

        croak( "The [$value] exceeds the allowed limit [$self->[1]]" )
                if( int($value) != $value || $magnitude > $self->[1] );

        $self->[0] = $value;

        $value;
        }

1;

At the user level, I do the same thing I did before. I call tie with the variable name, the class that implements the behavior, and finally the arguments. In this program, I want to start off with the value 1, and set the magnitude limit to 3. Once I do that, I’ll try to assign $number each of the integer values between -5 and 5 and then print what happened:

#!/usr/bin/perl

use Tie::BoundedInteger;

tie my $number, 'Tie::BoundedInteger', 1, 3;

foreach my $try ( -5 .. 5 )
        {
        my $value =  eval { $number = $try };

        print "Tried to assign [$try], ";
        print "but it didn't work, " unless $number == $try;
        print "value is now [$number]\n";
        }

From the output I can see that I start off with the value 1 in $number, but when I try to assign 7 (a value with a magnitude greater than 5), it doesn’t work and the value is still 1. Normally my program would croak right there, but I used an eval to catch that error. The same thing happens for 6. When I try 5, it works:

Tried to assign [-5], but it didn't work, value is now [1]
Tried to assign [-4], but it didn't work, value is now [1]
Tried to assign [-3], value is now [-3]
Tried to assign [-2], value is now [-2]
Tried to assign [-1], value is now [-1]
Tried to assign [0], value is now [0]
Tried to assign [1], value is now [1]
Tried to assign [2], value is now [2]
Tried to assign [3], value is now [3]
Tried to assign [4], but it didn't work, value is now [3]
Tried to assign [5], but it didn't work, value is now [3]

Self-Destructing Values

My Tie::BoundedInteger example changed how I could store values by limiting their values. I can also change how I fetch the values. In this example, I’ll create Tie::Timely, which sets a lifetime on the value. After that lifetime expires, I’ll get undef when I access the value.

The STORE method is easy. I just store whatever value I get. I don’t care if it’s a simple scalar, a reference, an object, or anything else. Every time I store a value, though, I’ll record the current time too. That way every time I change the value I reset the countdown.

In the FETCH routine, I have two things I can return. If I’m within the lifetime of the value, I return the value. If I’m not, I return nothing at all:

package Tie::Timely;
use strict;

use Carp qw(croak);

use vars qw( $VERSION );

$VERSION = 1.0;

sub TIESCALAR
        {
        my $class      = shift;
        my $value      = shift;
        my $lifetime   = shift;

        my $self = bless [ undef, $lifetime, time ], $class;

        $self->STORE( $value );

        return $self;
        }

sub FETCH { time - $_[0]->[2] > $_[0]->[1] ? () : $_[0]->[0] }

sub STORE { @{ $_[0] }[0,2] = ( $_[1], time ) }

1;

Arrays

I set up tied arrays just like I do tied scalars, but I have extra methods to create since I can do more with arrays. My implementation has to handle the array operators (shift, unshift, push, pop, splice) as well as the other array operations we often take for granted:

  • Getting or setting the last array index

  • Extending the array

  • Checking that an index exists

  • Deleting a element

  • Clearing all the values

Once I decide that I want to implement my own array behavior, I own all of those things. I don’t really have to define methods for each of those operations, but some things won’t work unless I do. The Tie::Array module exists as a bare-bones base class that implements most of these things, although only to croak if a program tries to use something I haven’t implemented. Table 17-1 shows how some array operations translate to tie methods (and perltie has the rest). Most of the methods have the same name as the Perl operator, although in all caps.

Table 17-1. The mapping of selected array actions to tie methods

Action

Array operation

Tie method

Set value

$a[$i] = $n

STORE( $i, $n )

Get value

$n = $a[$i];

FETCH( $i )

Array length

$l = $#a;

FETCHSIZE()

Pre-extend

$#a = $n;

STORESIZE( $n )

Add to end

push @a, @n

PUSH( @n );

Remove from end

pop @a;

POP()

Reinventing Arrays

When I talked about tying scalars, I showed my Tie::Cycle module, which treated an array like a scalar. To be fair, I should go the other way by treating a scalar as an array. Instead of storing several array elements, each of which incurs all of the overhead of a scalar variable, I’ll create one scalar and chop it up as necessary to get the array values. Essentially, my example trades memory space for speed. I’ll reuse my bounded integer example since I can make a number less than 256 fit into a single character. That’s convenient, isn’t it?

package Tie::StringArray;
use strict;

use Carp qw(croak);

use vars qw( $VERSION );

$VERSION = 1.0;

sub _null { "\x00" }
sub _last () { $_[0]->FETCHSIZE - 1 }

sub _normalize_index { $_[1] == abs $_[1] ? $_[1] : $_[0]->_last + 1 - abs $_[1] }

sub _store  { chr $_[1] }
sub _show   { ord $_[1] }
sub _string { ${ $_[0] } }

sub TIEARRAY
        {
        my( $class, @values ) = @_;

        my $string = '';
        my $self = bless \$string, $class;

        my $index = 0;

        $self->STORE( $index++, $_ ) foreach ( @values );

        $self;
        }

sub FETCH
        {
        my $index = $_[0]->_normalize_index( $_[1] );

        $index > $_[0]->_last ? () : $_[0]->_show(
                substr( $_[0]->_string, $index, 1 )
                );
        }

sub FETCHSIZE { length $_[0]->_string }

sub STORESIZE
        {
        my $self     = shift;
        my $new_size = shift;

        my $size = $self->FETCHSIZE;

        if( $size > $new_size ) # truncate
                {
                $$self = substr( $$self, 0, $size );
                }
        elsif( $size < $new_size ) # extend
                {
                $$self .= join '', ($self->_null) x ( $new_size - $size );
                }
        }

sub STORE
        {
        my $self  = shift;
        my $index = shift;
        my $value = shift;

        croak( "The magnitude of [$value] exceeds the allowed limit [255]" )
                if( int($value) != $value || $value > 255 );

        $self->_extend( $index ) if $index >= $self->_last;

        substr( $$self, $index, 1, chr $value );

        $value;
        }

sub _extend
        {
        my $self  = shift;
        my $index = shift;

        $self->STORE( 0, 1 + $self->_last )
                while( $self->_last >= $index );
        }

sub EXISTS  { $_[0]->_last >= $_[1] ? 1 : 0 }
sub CLEAR   { ${ $_[0] } = '' }

sub SHIFT   { $_[0]->_show( substr ${ $_[0] }, 0, 1, '' ) }
sub POP     { $_[0]->_show( chop   ${ $_[0] }           ) }

sub UNSHIFT
        {
        my $self = shift;

        foreach ( reverse @_ )
                {
                substr ${ $self }, 0, 0, $self->_store( $_ )
                }
        }

sub PUSH
        {
        my $self = shift;

        $self->STORE( 1 + $self->_last, $_ ) foreach ( @_ )
        }

sub SPLICE
        {
        my $self      = shift;

        my $arg_count = @_;
        my( $offset, $length, @list ) = @_;

        if(    0 == $arg_count )
                {
                ( 0, $self->_last )
                }
        elsif( 1 == $arg_count )
                {
                ( $self->_normalize_index( $offset ), $self->_last )
                }
        elsif( 2 <= $arg_count ) # offset and length only
                {
                ( $self->_normalize_index( $offset ), do {
                        if( $length < 0 ) { $self->_last - $length }
                        else              { $start + $length - 1   }
                        }
                )
                }

        #@removed = map { $self->POP } $start .. $end;


        if( wantarray )
                {
                @removed;
                }
        else
                {
                defined $removed[-1] ? $removed[-1] : undef;
                }

        }
                
1;

To make this work, I’ll treat each position in my string as an array element. To store a value, in STORE the arguments are the index for the value and the value itself. I need to convert the value to a character and put that character in the right position in the string. If I try to store something other than a whole number between 1 and 255, I get an error.

To fetch a value I need to extract the character from the correct position in the string and convert it to a number. The argument to FETCH is the index of the element so I need to convert that to something I can use with substr.

Now, for the more complex array operations, I have to do a bit more work. To retrieve a splice, I have to grab several values, but splice is also an value so I have to be ready to assign those positions more values. Not only that, a user might assign fewer or more values than the splice extracts, so I have to be ready to shrink or expand the string. That’s not scary, though, since I can already do all of that with a string by using substr.

Deleting an element is a bit trickier. In a normal array I can have an undefined element. How am I going to handle that in the middle of a string? Amazingly, my example left me a way to handle this: I can store a undef as a null byte. If I had to store numbers between 0 and 255, I would have been in trouble. Curious how that works out.

Perl also lets me extend a tied array. In a normal array, I can extend an array to let Perl know I want it to do the work to make a certain number of elements available (thus explicitly circumventing Perl’s built-in logic to make its best guess about the proper array length). In this example, I just need to extend the string.

Something a Bit More Realistic

I contrived that last example so I could show the whole process without doing anything too tricky. I might want to store an array of characters, and that example would work quite well for that. Now I want to adapt it to store a DNA sequence. My domain changes from 256 things to something much smaller, the set { T C G A }, which represents thymine, cytosine, guanine, and adenine. If I add in the possibility of a NULL (maybe my gene sequencer can’t tell what should be in a particular position), I have six possibilities. I don’t need an entire character for that. I can actually get by with three bits and have a little to spare.

Before I get too deeply into this, let me make a guess about how much memory this can save. A typical DNA sequence has several thousand base pairs. If I used an array for that, I’d have the scalar overhead for each one of those. I’ll say that’s 10 bytes, just to be kind. For 10,000 base pairs, which is just a small sequence, that’s 100,000 bytes. That scalar overhead really starts to add up! Now, instead of that, I’ll store everything in a single scalar. I’ll incur the scalar overhead once. For 10,000 base pairs at three bits a pair, that’s 30,000 bits, or 3,750 bytes. I round that off to 4,000 bytes. That’s a factor of 50! Remember, this memory parsimony comes at the expense of speed. I’ll have to do a little bit more computational work.

With six bits I have eight distinct patterns. I need to assign some of those patterns meanings. Fortunately for me, Perl makes this really easy since I can type out binary strings directly as long as I’m using Perl 5.6 or later (see Chapter 16 for more on bit operations):

use constant N => 0b000;
use constant T => 0b001;
use constant C => 0b100;
use constant G => 0b110;
use constant A => 0b011;

use constant RESERVED1 => 0b111;
use constant RESERVED2 => 0b101;

Also, since I’m not using characters anymore, I can’t use substr. For vec, I’d have to partition the bits by powers of two, but I’d have to waste another bit for that (and I’m already wasting two).[60]If I do that, I end up with 10 unused patterns. That might be nice if we eventually meet aliens with more complex hereditary encodings, but for now I’ll just stick with what we have.

Before you get scared off by this code, remember what I’m doing. It’s exactly the same problem as the last example where I stored digits as characters in a long string. This time I’m doing it at the bit level with a bit more math. My specific example doesn’t matter as much as the concept that I can make anything, and I mean anything, look like an array if I’m willing to do all the work:

package Tie::Array::DNA;
use strict;
use base qw(Tie::Array);

use Carp qw(croak carp);

use vars qw( $VERSION );
$VERSION = 1.0;

use constant BITS_PER_ELEMENT =>  3;
use constant BIT_PERIOD       => 24; # 24 bits
use constant BYTE_LENGTH      =>  8;
use constant BYTE_PERIOD      =>  3; # 24 bits

my %Patterns = (
        T => 0b001,
        A => 0b011,
        C => 0b100,
        G => 0b110,
        N => 0b000,
        );
my @Values   = ();
foreach my $key ( keys %Patterns )
        {
        $Values[ $Patterns{$key} ] = $key
        }

sub _normalize { uc $_[1] }
sub _allowed   { length $_[1] eq 1 and $_[1] =~ tr/TCGAN// }

my %Last;

sub TIEARRAY
        {
        my( $class, @values ) = @_;

        my $string = \'';
        my $self = bless $string, $class;

        $$self = "\x00" x 10_000;
        $Last{ "foo" } = -1;

        my $index = 0;

        $self->STORE( $index++, $_ ) foreach ( @values );

        $self;
        }

sub _get_start_and_length
        {
        my( $self, $index ) = @_;

        my $bytes_to_start = int( $index * BITS_PER_ELEMENT / BYTE_LENGTH );

        my $byte_group = int( $bytes_to_start / BYTE_PERIOD );

        my $start  = $byte_group * BYTE_PERIOD;

        ( $start, BYTE_PERIOD )
        }

sub _get_bytes
        {
        my( $self, $index ) = @_;

        my( $start, $length ) = $self->_get_start_and_length( $index );

        my @chars = split //, substr( $$self, $start, $length );

        (ord( $chars[0] ) << 16) +
                (ord( $chars[1] ) << 8) +
                        ord( $chars[2] );
        }

sub _save_bytes
        {
        my( $self, $index, $bytes ) = @_;

        my( $start, $length ) = $self->_get_start_and_length( $index );

        my $new_string = join '', map {
                chr(
                        ( $bytes & ( 0xFF << $_ ) )
                        >>
                        $_
                        )
                } qw( 16 8 0 );

        substr( $$self, $start, $length, $new_string );
        }

sub _get_shift
        {
        BIT_PERIOD - BITS_PER_ELEMENT - ($_[1] * BITS_PER_ELEMENT % BIT_PERIOD);
        }

sub _get_clearing_mask
        { ~ ( 0b111 << $_[0]->_get_shift( $_[1] ) ) }

sub _get_setting_mask
        { $_[0]->_get_pattern_by_value( $_[2] ) << $_[0]->_get_shift( $_[1] ) }

sub _get_selecting_mask
        { 0b111 << $_[0]->_get_shift( $_[1] ) }

sub _get_pattern_by_value {   $Patterns{ $_[1] }   }
sub _get_null_pattern     {   $Patterns{ 'N'   }   }

sub _get_value_by_pattern {   $Values  [ $_[1] ]   }

sub _string    { $_[0] }

sub _length { length ${$_[0]} }

sub _add_to_string { ${$_[0]} .= $_[1] }

sub STORE
        {
        my( $self, $index, $value )  = @_;

        $value = $self->_normalize( $value );

        carp( qq|Cannot store unallowed element "$value"| )
                unless $self->_allowed( $value );

        $self->_extend( $index ) if $index > $self->_last;

        # get the mask
        my $clear_mask  = $self->_get_clearing_mask( $index );
        my $set_mask    = $self->_get_setting_mask( $index, $value );

        # clear the area
        my $result = ( $self->_get_bytes( $index ) & $clear_mask ) | $set_mask;

        # save the string
        my( $start, $length ) = $self->_get_start_and_length( $index );

        my $new_string = join '', map {
                chr(
                        ( $result & ( 0xFF << $_ ) )
                        >>
                        $_
                        )
                } qw( 16 8 0 );

        substr( $$self, $start, $length, $new_string );

        $self->_set_last( $index ) if $index > $self->_last;

        $value
        }

sub FETCH
        {
        my( $self, $index ) = @_;

        # get the right substr
        my $bytes = $self->_get_bytes( $index );

        # get the mask
        my $select_mask  = $self->_get_selecting_mask( $index );
        my $shift        = $self->_get_shift( $index );

        # clear the area
        my $pattern = 0 + ( ( $bytes & $select_mask ) >> $shift );

        $self->_get_value_by_pattern( $pattern );
        }

sub FETCHSIZE { $_[0]->_last + 1 }
sub STORESIZE { $_[0]->_set_last( $_[1] ) }

sub EXTEND  { }
sub CLEAR   { ${ $_[0] } = '' }
sub EXISTS  { $_[1] < $Last{ "foo" }  }

sub DESTROY { }

__PACKAGE__;

This code gets a bit complicated because I have to implement my own array. Since I’m storing everything in a single string and using the string as a long string of bits instead of characters, I have to come up with a way to get the information that I need.

I’m using three bits per element and characters come with eight bits. To make everything simpler, I decide to deal with everything in 3-byte (24-bit) chunks because that’s the lowest common denominator between 3-bit and 8-bit chunks of data. I do that in _get_bytes and _save_bytes, which figure out which three characters they need to grab. The _get_bytes method turns the three characters into a single number so I can later use bit operations on it, and the _save_bytes method goes the other way.

Once I have the number, I need to know how to pull out the three bits. There are eight elements in each group, so _get_selecting_mask figures out which of those elements I want and returns the right bit mask to select it. That bit mask is just 0b111 shifted up the right number of places. The _get_shift method handles that in general by using the constants BIT_PERIOD and BITS_PER_ELEMENT.

Once I got all of that in place, my FETCH method can use it to return an element. It gets the bit pattern then looks up that pattern with _get_value_by_pattern to turn the bits into the symbolic version (i.e., T, A, C, G ).

The STORE method does all that but the other way around. It turns the symbols into the bit pattern, shifts that up the right amount, and does the right bit operations to set the value. I ensure that I clear the target bits first using the mask, I get back from _get_clearing_mask. Once I clear the target bits, I can use the bit mask from _get_setting_mask to finally store the element.

Whew! Did you make it this far? I haven’t even implemented all of the array features. How am I going to implement SHIFT, UNSHIFT, or SPLICE? Here’s a hint: remember that Perl has to do this for real arrays and strings. Instead of moving things over every time I affect the front of the data, it keeps track of where it should start, which might not be the beginning of the data. If I wanted to shift off a single element, I just have to add that offset of three bits to all of my computations. The first element would be at bits 3 to 5 instead of 0 to 2. I’ll leave that up to you, though.

Hashes

Tied hashes are only a bit more complicated than tied arrays, but like all tied variables, I set them up in the same way. I need to implement methods for all of the actions I want my tied hash to handle. Table 17-2 shows some of the hash operations and their corresponding tied methods.

Table 17-2. The mapping of selected hash actions to tie methods

Action

Hash operation

Tie method

Set value

$h{$str} = $val;

STORE( $str, $val )

Get value

$val = $h{$str};

FETCH( $str )

Delete a key

delete $h{$str};

DELETE( $str )

Check for a key

exists $h{$str};

EXISTS( $str )

Next key

each %h;

NEXTKEY( $str )

Clear the hash

%h = ();

CLEAR( $str )

One common task, at least for me, is to accumulate a count of something in a hash. One of my favorite examples to show in Perl courses is a word frequency counter. By the time students get to the third day of the Learning Perl course, they know enough to write a simple word counter:

my %hash = ();

while( <> )
        {
        chomp;
        my @words = split;
        foreach my $word ( @words ) { $hash{$word}++ }
        }

foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash )
        {
        printf "%4d  %-20s\n", $hash{$word}, $word;
        }

When students actually start to use this, they discover that it’s really not as simple as all that. Words come in different capitalizations, with different punctuation attached to them, and possibly even misspelled. I could add a lot of code to that example to take care of all of those edge cases, but I can also fix that up in the hash assignment itself. I replace my hash declaration with a call to tie and leave the rest of the program alone:

# my %hash = (); # old way
tie my( %hash ), 'Tie::Hash::WordCounter';

while( <> )
        {
        chomp;
        my @words = split;
        foreach my $word ( @words ) { $hash{$word}++ }
        }

foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash )
        {
        printf "%4d  %-20s\n", $hash{$word}, $word;
        }

I can make a tied hash do anything that I like, so I can make it handle those edge cases by normalizing the words I give it when I do the hash assignment. My tiny word counter program doesn’t have to change that much and I can hide all the work behind the tie interface.

I’ll handle most of the complexity in the STORE method. Everything else will act just like a normal hash, and I’m going to use a hash behind the scenes. I should also be able to access a key by ignoring the case and punctuation issues so my FETCH method normalizes its argument in the same way:

package Tie::Hash::WordCounter;
use strict;
use Tie::Hash;

use base qw(Tie::StdHash);

use vars qw( $VERSION );

$VERSION = 1.0;

sub TIEHASH  { bless {}, $_[0] }

sub _normalize
        {
        my( $self, $key ) = @_;

        $key =~ s/^\s+//;
        $key =~ s/\s+$//;

        $key = lc( $key );

        $key =~ s/[\W_]//g;

        return $key
        }

sub STORE
        {
        my( $self, $key, $value ) = @_;

        $key = $self->_normalize( $key );

        $self->{ $key } = $value;
        }

sub FETCH
        {
        my( $self, $key ) = @_;

        $key = $self->_normalize( $key );

        $self->{ $key };
        }

__PACKAGE__;

Filehandles

By now you know what I’m going to say: tied filehandles are like all the other tied variables. Table 17-3 shows selected file operations and their corresponding tied methods. I simply need to provide the methods for the special behavior I want.

Table 17-3. The mapping of selected filehandle actions to tie methods

Action

File operation

Tie method

Print to a filehandle

print FH "...";

PRINT( @a )

Read from a filehandle

$line = <FH>;

READLINE()

Close a filehandle

close FH;

CLOSE()

For a small example, I create Tie::File::Timestamp, which appends a timestamp to each line of output. Suppose I start with a program that already has several print statements. I didn’t write this program, but my task is to add a timestamp to each line:

# old program
open LOG, ">>", "log.txt" or die "Could not open output.txt! $!";

print LOG "This is a line of output\n";
print LOG "This is some other line\n";

I could do a lot of searching and a lot of typing, or I could even get my text editor to do most of the work for me. I’ll probably miss something, and I’m always nervous about big changes. I can make a little change by replacing the filehandle. Instead of open, I’ll use tie, leaving the rest of the program as it is:

# new program
#open LOG, ">>", "log.txt" or die "Could not open output.txt! $!";
tie *LOG, "Tie::File::Timestamp", "log.txt"
        or die "Could not open output.txt! $!";

print LOG "This is a line of output\n";
print LOG "This is some other line\n";

Now I have to make the magic work. It’s fairly simple since I only have to deal with four methods. In TIEHANDLE, I open the file. If I can’t do that, I simply return, triggering the die in the program since tie doesn’t return a true value. Otherwise, I return the filehandle reference, which I’ve blessed into my tied class. That’s the object I’ll get as the first argument in the rest of the methods.

My output methods are simple. They’re simple wrappers around the built-in print and printf. I use the tie object as the filehandle reference (wrapping it in braces as Perl Best Practices recommends to signal to other people that’s what I mean to do). In PRINT, I simply add a couple of arguments to the rest of the stuff I pass to print. The first additional argument is the timestamp, and the second is a space character to make it all look nice. I do a similar thing in PRINTF, although I add the extra text to the $format argument:

package Tie::File::Timestamp;
use strict;
use vars qw($VERSION);

use Carp qw(croak);

$VERSION = 0.01;

sub _timestamp { "[" . localtime() . "]" }

sub TIEHANDLE
        {
        my $class = shift;
        my $file  = shift;

        open my( $fh ), ">> $file" or return;

        bless $fh, $class;
        }

sub PRINT
        {
        my( $self, @args ) = @_;

        print { $self } $self->_timestamp, " ", @args;
        }

sub PRINTF
        {
        my( $self, $format, @args ) = @_;

        $format = $self->_timestamp . " " . $format;

        printf { $self } $format, @args;
        }

sub CLOSE { close $_[0] }

__PACKAGE__;

Tied filehandles have a glaring drawback, though: I can only do this with filehandles. Since Learning Perl, I’ve been telling you that bareword filehandles are the old way of doing things and that storing a filehandle reference in a scalar is the new and better way.

If I try to use a scalar variable, tie looks for TIESCALAR method, along with the other tied scalar methods. It doesn’t look for PRINT, PRINTF, and all of the other input/output methods I need. I can get around that with a little black magic that I don’t recommend. I start with a glob reference, *FH, which creates an entry in the symbol table. I wrap a do block around it to form a scope and to get the return value (the last evaluated expression). Since I only use the *FH once, unless I turn off warnings in that area, Perl will tell me that I’ve only used *FH once. In the tie, I have to dereference $fh as a glob reference so tie looks for TIEHANDLE instead of TIESCALAR. Look scary? Good. Don’t do this!

my $fh = \do{ no warnings; local *FH };
my $object = tie *{$fh}, $class, $output_file;

Summary

I’ve showed you a lot of tricky code to reimplement Perl data types in Perl. The tie interface lets me do just about anything that I want, but I also then have to do all of the work to make the variables act like people expect them to act. With this power comes great responsibility and a lot of work.

For more examples, inspect the Tie modules on CPAN. You can peek at the source code to see what they do and steal ideas for your own.

Further Reading

Teodor Zlatanov writes about “Tied Variables” for IBM developerWorks, January 2003: http://www-128.ibm.com/developerworks/linux/library/l-cptied.html.

Phil Crow uses tied filehandles to implement some design patterns in Perl in “Perl Design Patterns” for Perl.com: http://www.perl.com/lpt/a/2003/06/13/design1.html.

Dave Cross writes about tied hashes in “Changing Hash Behaviour with tie” for Perl.com: http://www.perl.com/lpt/a/2001/09/04/tiedhash.html.

Abhijit Menon-Sen uses tied hashes to make fancy dictionaries in “How Hashes Really Work” for Perl.com: http://www.perl.com/lpt/a/2002/10/01/hashes.html.

Randal Schwartz discusses tie in “Fit to be tied (Parts 1 & 2)” for Linux Magazine, March and April 2005: http://www.stonehenge.com/merlyn/LinuxMag/col68.html and http://www.stonehenge.com/merlyn/LinuxMag/col69.html.

There are several Tie modules on CPAN, and you can peek at the source code to see what they do and steal ideas for your own.



[60] If I only cared about DNA and I knew that I could represent every position accurately, I’d only need two bits, and then I could use vec. If I wanted to add other symbols, such as “B” meaning “something not A,” I could add more bits to each position.

Get Mastering Perl now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.