Many moons ago (about five years), I used machines that had no tools
for bundling files into a single package for easy transport. The
situation is this: you have a large set of text files lying around
that you need to transfer to another computer. These days, tools like
tar
are widely available for packaging many files
into a single file that can be copied, uploaded, mailed, or otherwise
transferred in a single step. Even Python itself has grown to support
zip archives in the 2.0 standard library (see module
zipfile
).
Before I managed to install such tools on my PC, though, portable Python scripts served just as well. Example 4-6 copies all the files listed on the command line to the standard output stream, separated by marker lines.
Example 4-6. PP2E\System\App\Clients\textpack.py
#!/usr/local/bin/python import sys # load the system module marker = ':'*10 + 'textpak=>' # hopefully unique separator def pack( ): for name in sys.argv[1:]: # for all command-line arguments input = open(name, 'r') # open the next input file print marker + name # write a separator line print input.read( ), # and write the file's contents if __name__ == '__main__': pack( ) # pack files listed on cmdline
The first line in this file is a Python comment
(#...
), but it also gives the path to the Python
interpreter using the Unix executable-script trick discussed in Chapter 2. If we give textpack.py
executable permission with a Unix chmod command,
we can pack files by running this program file directly from a Unix
shell, and redirect its standard output stream to the file we want
the packed archive to show up in. It works the same on Windows, but
we just type the interpreter name “python” instead:
C:\...\PP2E\System\App\Clients\test>type spam.txt
SPAM spam C:\......\test>python ..\textpack.py spam.txt eggs.txt ham.txt > packed.all
C:\......\test>type packed.all
::::::::::textpak=>spam.txt SPAM spam ::::::::::textpak=>eggs.txt EGGS ::::::::::textpak=>ham.txt ham
Running the program this way creates a single output file called
packed.all
, which contains all three input
files, with a header line giving the original file’s name
before each file’s contents. Combining many files into one like
this makes it easy to transfer in a single step -- only one file
need be copied to floppy, emailed, and so on. If you have hundreds of
files to move, this can be a big win.
After such a file is transferred, though, it must somehow be unpacked on the receiving end, to recreate the original files. To do so, we need to scan the combined file line by line, watching for header lines left by the packer to know when a new file’s contents begins. Another simple Python script, shown in Example 4-7, does the trick.
Example 4-7. PP2E\System\App\Clients\textunpack.py
#!/usr/local/bin/python import sys from textpack import marker # use common seperator key mlen = len(marker) # file names after markers for line in sys.stdin.readlines( ): # for all input lines if line[:mlen] != marker: print line, # write real lines else: sys.stdout = open(line[mlen:-1], 'w') # or make new output file
We could code this in a function like we did in
textpack
, but there is little point here -- as
written, the script relies on standard streams, not function
parameters. Run this in the directory where you want unpacked files
to appear, with the packed archive file piped in on the command line
as the script’s standard input stream:
C:\......\test\unpack>python ..\..\textunpack.py < ..\packed.all
C:\......\test\unpack>ls
eggs.txt ham.txt spam.txt C:\......\test\unpack>type spam.txt
SPAM Spam
So far so good; the textpack
and
textunpack
scripts made it easy to move lots of
files around, without lots of manual intervention. But after playing
with these and similar scripts for a while, I began to see
commonalities that almost cried out for reuse.
For instance, almost every shell tool I wrote had to scan
command-line arguments, redirect streams to a variety of sources, and
so on. Further, almost every command-line utility wound up with a
different command-line option pattern, because each was written from
scratch.
The following few classes are one solution to such problems. They
define a class hierarchy that is designed for
reuse of common shell tool code. Moreover, because of the reuse going
on, every program that ties into its hierarchy sports a common
look-and-feel in terms of command-line options, environment variable
use, and more. As usual with object-oriented systems, once you learn
which methods to overload, such a class framework provides a lot of
work and consistency for free. The module in Example 4-8 adapts the textpack
script’s logic for integration into this hierarchy.
Example 4-8. PP2E\System\App\Clients\packapp.py
#!/usr/local/bin/python ###################################################### # pack text files into one, separated by marker line; # % packapp.py -v -o target src src... # % packapp.py *.txt -o packed1 # >>> apptools.appRun('packapp.py', args...) # >>> apptools.appCall(PackApp, args...) ###################################################### from textpack import marker from PP2E.System.App.Kinds.redirect import StreamApp class PackApp(StreamApp): def start(self): StreamApp.start(self) if not self.args: self.exit('packapp.py [-o target]? src src...') def run(self): for name in self.restargs( ): try: self.message('packing: ' + name) self.pack_file(name) except: self.exit('error processing: ' + name) def pack_file(self, name): self.setInput(name) self.write(marker + name + '\n') while 1: line = self.readline( ) if not line: break self.write(line) if __name__ == '__main__': PackApp( ).main( )
Here, PackApp
inherits members and methods that
handle:
Operating system services
Command-line processing
Input/output stream redirection
from the StreamApp
class, imported from another
Python module file (listed in Example 4-10).
StreamApp
provides a “read/write”
interface to redirected streams, and provides a standard
“start/run/stop” script execution protocol.
PackApp
simply redefines the
start
and run
methods for its
own purposes, and reads and writes itself to
access its standard streams. Most low-level system interfaces are
hidden by the StreamApp
class; in OOP terms, we
say they are
encapsulated.
This module can both be run as a program, and imported by a client
(remember, Python sets a module’s name to __main_ _
when it’s run directly, so it can tell the
difference). When run as a program, the last line creates an instance
of the PackApp
class, and starts it by calling its
main
method -- a method call exported by
StreamApp
to kick off a program run:
C:\......\test>python ..\packapp.py -v -o packedapp.all spam.txt eggs.txt ham.txt
PackApp start. packing: spam.txt packing: eggs.txt packing: ham.txt PackApp done. C:\......\test>type packedapp.all
::::::::::textpak=>spam.txt SPAM spam ::::::::::textpak=>eggs.txt EGGS ::::::::::textpak=>ham.txt ham
This has the same effect as the textpack.py
script, but command-line options (-v
for verbose
mode, -o
to name an output file) are inherited
from the StreamApp
superclass. The unpacker in
Example 4-9 looks similar when migrated to the OO
framework, because the very notion of running a program has been
given a standard structure.
Example 4-9. PP2E\System\App\Clients\unpackapp.py
#!/usr/bin/python ########################################### # unpack a packapp.py output file; # % unpackapp.py -i packed1 -v # apptools.appRun('unpackapp.py', args...) # apptools.appCall(UnpackApp, args...) ########################################### import string from textpack import marker from PP2E.System.App.Kinds.redirect import StreamApp class UnpackApp(StreamApp): def start(self): StreamApp.start(self) self.endargs( ) # ignore more -o's, etc. def run(self): mlen = len(marker) while 1: line = self.readline( ) if not line: break elif line[:mlen] != marker: self.write(line) else: name = string.strip(line[mlen:]) self.message('creating: ' + name) self.setOutput(name) if __name__ == '__main__': UnpackApp( ).main( )
This subclass redefines start
and
run
methods to do the right thing for this
script -- prepare for and execute a file unpacking operation. All
the details of parsing command-line arguments and redirecting
standard streams are handled in superclasses:
C:\......\test\unpackapp>python ..\..\unpackapp.py -v -i ..\packedapp.all
UnpackApp start. creating: spam.txt creating: eggs.txt creating: ham.txt UnpackApp done. C:\......\test\unpackapp>ls
eggs.txt ham.txt spam.txt C:\......\test\unpackapp>type spam.txt
SPAM spam
Running this script does the same job as the original
textunpack.py
, but we get command-line flags for
free (-i
specifies the input files). In fact,
there are more ways to launch classes in this hierarchy than I have
space to show here. A command line pair, -i
-
, for instance, makes the script read its input
from stdin
, as though it were simply piped or
redirected in the shell:
C:\......\test\unpackapp>type ..\packedapp.all | python ..\..\unpackapp.py -i -
creating: spam.txt
creating: eggs.txt
creating: ham.txt
This section lists the source code of StreamApp
and App
-- the classes that do all this extra
work on behalf of PackApp
and
UnpackApp
. We don’t have space to go through
all this code in detail, so be sure to study these listings on your
own for more information. It’s all straight Python code.
I should also point out that the classes listed in this section are
just the ones used by the object-oriented mutations of the
textpack
and textunpack
scripts. They represent just one branch of an overall application
framework class tree, that you can study on this book’s CD
(see http://examples.oreilly.com/python2 and browse directory PP2E\System\App
). Other
classes in the tree provide command menus, internal string-based file
streams, and so on. You’ll also find additional clients of the
hierarchy that do things like launch other shell tools, and scan
Unix-style email mailbox files.
StreamApp
adds a few command-line arguments
(-i
, -o
) and input/output
stream redirection to the more general App
root
class listed later; App
in turn defines the most
general kinds of program behavior, to be inherited in Examples Example 4-8, Example 4-9, and Example 4-10, i.e., in all classes derived from
App
.
Example 4-10. PP2E\System\App\Kinds\redirect.py
################################################################################ # App subclasses for redirecting standard streams to files ################################################################################ import sys from PP2E.System.App.Bases.app import App ################################################################################ # an app with input/output stream redirection ################################################################################ class StreamApp(App): def __init__(self, ifile='-', ofile='-'): App.__init__(self) # call superclass init self.setInput( ifile or self.name + '.in') # default i/o file names self.setOutput(ofile or self.name + '.out') # unless '-i', '-o' args def closeApp(self): # not __del__ try: if self.input != sys.stdin: # may be redirected self.input.close( ) # if still open except: pass try: if self.output != sys.stdout: # don't close stdout! self.output.close( ) # input/output exist? except: pass def help(self): App.help(self) print '-i <input-file |"-"> (default: stdin or per app)' print '-o <output-file|"-"> (default: stdout or per app)' def setInput(self, default=None): file = self.getarg('-i') or default or '-' if file == '-': self.input = sys.stdin self.input_name = '<stdin>' else: self.input = open(file, 'r') # cmdarg | funcarg | stdin self.input_name = file # cmdarg '-i -' works too def setOutput(self, default=None): file = self.getarg('-o') or default or '-' if file == '-': self.output = sys.stdout self.output_name = '<stdout>' else: self.output = open(file, 'w') # error caught in main( ) self.output_name = file # make backups too? class RedirectApp(StreamApp): def __init__(self, ifile=None, ofile=None): StreamApp.__init__(self, ifile, ofile) self.streams = sys.stdin, sys.stdout sys.stdin = self.input # for raw_input, stdin sys.stdout = self.output # for print, stdout def closeApp(self): # not __del__ StreamApp.closeApp(self) # close files? sys.stdin, sys.stdout = self.streams # reset sys files ############################################################ # to add as a mix-in (or use multiple-inheritance...) ############################################################ class RedirectAnyApp: def __init__(self, superclass, *args): apply(superclass.__init__, (self,) + args) self.super = superclass self.streams = sys.stdin, sys.stdout sys.stdin = self.input # for raw_input, stdin sys.stdout = self.output # for print, stdout def closeApp(self): self.super.closeApp(self) # do the right thing sys.stdin, sys.stdout = self.streams # reset sys files
The top of the hierarchy knows what it means to be a shell
application, but not how to accomplish a particular utility task
(those parts are filled in by subclasses). App
,
listed in Example 4-11, exports commonly used tools
in a standard and simplified interface, and a customizable
start
/run
/stop
method protocol that abstracts script execution. It also turns
application objects into file-like objects: when an application reads
itself, for instance, it really reads whatever source its standard
input stream has been assigned to by other superclasses in the tree
(like StreamApp
).
Example 4-11. PP2E\System\App\Bases\app.py
################################################################################ # an application class hierarchy, for handling top-level components; # App is the root class of the App hierarchy, extended in other files; ################################################################################ import sys, os, traceback AppError = 'App class error' # errors raised here class App: # the root class def __init__(self, name=None): self.name = name or self.__class__.__name__ # the lowest class self.args = sys.argv[1:] self.env = os.environ self.verbose = self.getopt('-v') or self.getenv('VERBOSE') self.input = sys.stdin self.output = sys.stdout self.error = sys.stderr # stdout may be piped def closeApp(self): # not __del__: ref's? pass # nothing at this level def help(self): print self.name, 'command-line arguments:' # extend in subclass print '-v (verbose)' ############################## # script environment services ############################## def getopt(self, tag): try: # test "-x" command arg self.args.remove(tag) # not real argv: > 1 App? return 1 except: return 0 def getarg(self, tag, default=None): try: # get "-x val" command arg pos = self.args.index(tag) val = self.args[pos+1] self.args[pos:pos+2] = [] return val except: return default # None: missing, no default def getenv(self, name, default=''): try: # get "$x" environment var return self.env[name] except KeyError: return default def endargs(self): if self.args: self.message('extra arguments ignored: ' + `self.args`) self.args = [] def restargs(self): res, self.args = self.args, [] # no more args/options return res def message(self, text): self.error.write(text + '\n') # stdout may be redirected def exception(self): return (sys.exc_type, sys.exc_value) # the last exception def exit(self, message='', status=1): if message: self.message(message) sys.exit(status) def shell(self, command, fork=0, inp=''): if self.verbose: self.message(command) # how about ipc? if not fork: os.system(command) # run a shell cmd elif fork == 1: return os.popen(command, 'r').read( ) # get its output else: # readlines too? pipe = os.popen(command, 'w') pipe.write(inp) # send it input pipe.close( ) ################################################# # input/output-stream methods for the app itself; # redefine in subclasses if not using files, or # set self.input/output to file-like objects; ################################################# def read(self, *size): return apply(self.input.read, size) def readline(self): return self.input.readline( ) def readlines(self): return self.input.readlines( ) def write(self, text): self.output.write(text) def writelines(self, text): self.output.writelines(text) ################################################### # to run the app # main( ) is the start/run/stop execution protocol; ################################################### def main(self): res = None try: self.start( ) self.run( ) res = self.stop( ) # optional return val except SystemExit: # ignore if from exit( ) pass except: self.message('uncaught: ' + `self.exception( )`) traceback.print_exc( ) self.closeApp( ) return res def start(self): if self.verbose: self.message(self.name + ' start.') def stop(self): if self.verbose: self.message(self.name + ' done.') def run(self): raise AppError, 'run must be redefined!'
Now that I’ve listed all this code, some readers might
naturally want to ask, “So why go to all this trouble?”
Given the amount of extra code in the OO version
of these scripts, it’s a perfectly valid question. Most of the
code listed in Example 4-11 is general-purpose logic,
designed to be used by many applications. Still, that doesn’t
explain why the packapp
and
unpackapp
OO scripts are larger than the original
equivalent textpack
and
textunpack
non-OO scripts.
The answers will become more apparent after the first few times you don’t have to write code to achieve a goal, but there are some concrete benefits worth summarizing here:
- Encapsulation
StreamApp
clients need not remember all the system interfaces in Python, becauseStreamApp
exports its own unified view. For instance, arguments, streams, and shell variables are split across Python modules (e.g.,sys.argv
,sys.stdout
,os.environ
); in these classes, they are all collected in the same single place.- Standardization
From the shell user’s perspective,
StreamApp
clients all have a common look-and-feel, because they inherit the same interfaces to the outside world from their superclasses (e.g.,-i
and-v
flags).- Maintenance
All the common code in the
App
andStreamApp
superclasses must be debugged only once. Moreover, localizing code in superclasses makes it easier to understand and change in the future.- Reuse
Such a framework can provide an extra precoded utility we would otherwise have to recode in every script we write (command-line argument extraction, for instance). That holds true both now and in the future -- services added to the
App
root class become immediately usable and customizable among all applications derived from this hierarchy.- Utility
Because file access isn’t hardcoded in
PackApp
andUnpackApp
, they can easily take on new behavior, just by changing the class they inherit from. Given the right superclass,PackApp
andUnpackApp
could just as easily read and write to strings or sockets, as to text files and standard streams.
Although it’s not obvious until you start writing larger class-based systems, code reuse is perhaps the biggest win for class-based programs. For instance, in Chapter 9, we will reuse the OO-based packer and unpacker scripts by invoking them from a menu GUI like this:
from PP2E.System.App.Clients.packapp import PackApp ...get dialog inputs, glob filename patterns app = PackApp(ofile=output) # run with redirected output app.args = filenames # reset cmdline args list app.main( ) from PP2E.System.App.Clients.unpackapp import UnpackApp ...get dialog input app = UnpackApp(ifile=input) # run with input from file app.main( ) # execute app class
Because these classes encapsulate the notion of streams, they can be
imported and called, not just run as top-level scripts. Further,
their code is reusable two ways: not only do they export common
system interfaces for reuse in subclasses, but they can also be used
as software components as in the previous code
listing. See the PP2E\Gui\Shellgui
directory for
the full source code of these clients.
Python doesn’t impose OO programming, of course, and you can get a lot of work done with simpler functions and scripts. But once you learn how to structure class trees for reuse, going the extra OO mile usually pays off in the long run.
Get Programming Python, Second Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.