Credit: Matthew Dixon Cowles
The walk method of message objects generated by the email module (new as of Python 2.2) makes this task really easy:
    import email.Parser
    import os, sys

    def main():
        if len(sys.argv) == 1:
            print "Usage: %s filename" % os.path.basename(sys.argv[0])
            sys.exit(1)

        mailFile = open(sys.argv[1], "rb")
        p = email.Parser.Parser()
        msg = p.parse(mailFile)
        mailFile.close()

        partCounter = 1
        for part in msg.walk():
            if part.get_main_type() == "multipart":
                continue
            name = part.get_param("name")
            if name is None:
                name = "part-%i" % partCounter
                partCounter += 1
            # In real life, make sure that name is a reasonable filename
            # for your OS; otherwise, mangle it until it is!
            f = open(name, "wb")
            f.write(part.get_payload(decode=1))
            f.close()
            print name

    if __name__ == "__main__":
        main()
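The comment in the recipe's code asks that the name be made safe for the local filesystem but leaves the details open. One possible approach is sketched below in Python 3 syntax. The helper safe_filename and its default argument are hypothetical names introduced for illustration, not part of the email module, and the whitelist of allowed characters is an assumption that may be too strict or too loose for a given OS.

```python
import os
import re

def safe_filename(name, default="part"):
    """Hypothetical helper: reduce an attachment name to a conservative filename."""
    # Drop any directory components, whichever separator the sender used.
    base = os.path.basename(name.replace("\\", "/"))
    # Replace anything outside a conservative whitelist with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Fall back to a default for empty names and names made only of dots.
    if not cleaned.strip("."):
        return default
    return cleaned

print(safe_filename("../../etc/passwd"))
print(safe_filename("my report (final).pdf"))
print(safe_filename(".."))
```

A sketch like this does not guard against overwriting an existing file or against names reserved by a particular OS; those checks would still be needed in real use.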
The email module, new in Python 2.2, makes parsing MIME messages reasonably easy. (See the Library Reference for detailed documentation about the email module.) This recipe shows how to recursively unbundle a MIME message with the email module in the easiest way, using the walk method of message objects.
You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object’s contents with calls to its add_payload method. In this recipe, I need to read and analyze an existing message, so I worked the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes as its only argument a file-like object (in the recipe, I pass it a real file object that I just opened for binary reading with the built-in open function) and returns a message object, on which you can call message object methods.
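The two routes described above can be sketched as follows. This sketch assumes the Python 3 spelling of the email package, which differs from the Python 2.2 names used in the recipe: the parser class is email.parser.BytesParser, and the body is set with set_payload rather than add_payload. The sample message text is made up for illustration, and io.BytesIO stands in for a real file opened for binary reading.

```python
import io
from email.message import Message
from email.parser import BytesParser

# Route 1: build a message object from scratch.
built = Message()
built["Subject"] = "hello"
built.set_payload("just a body\n")

# Route 2: parse an existing message from a binary file-like object.
raw = b"Subject: hello\r\n\r\njust a body\r\n"
parsed = BytesParser().parse(io.BytesIO(raw))

# Either way, the result is a message object with the same methods.
print(built["Subject"], parsed["Subject"])
print(parsed.is_multipart())
```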
The walk method is a generator, i.e., it returns an iterator object on which you can loop with a for statement. Usually, you will use this method exactly as I use it in this recipe:

    for part in msg.walk():
The iterator sequentially returns (depth-first, in case of nesting) the parts that comprise the message. If the message is not a container of parts (has no attachments or alternates, i.e., message.is_multipart() is false), no problem: the walk method will return an iterator with a single element: the message itself. In any case, each element of the iterator is also a message object (an instance of email.Message.Message), so you can call on it any of the methods a message object supplies.
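To make the depth-first order concrete, here is a small sketch that builds a nested message and lists the content type of every part that walk yields. It assumes Python 3 and the email.mime helper classes, which the recipe itself does not use; the message contents and the name data.bin are invented for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# An outer container holding a nested container and one attachment.
outer = MIMEMultipart("mixed")
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("plain body", "plain"))
inner.attach(MIMEText("<p>html body</p>", "html"))
outer.attach(inner)
outer.attach(MIMEApplication(b"\x00\x01\x02", name="data.bin"))

# walk yields each container first, then descends into its children.
walked_types = [part.get_content_type() for part in outer.walk()]
print(walked_types)

# A message that is not a container yields only itself.
single = MIMEText("no attachments here", "plain")
single_parts = list(single.walk())
print(len(single_parts), single_parts[0] is single)
```

Note that the containers themselves appear in the sequence, which is why the recipe has to skip them explicitly.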
In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly since they’re just glue holding the true parts together. I use the get_main_type method to obtain the main type and check it for equality with 'multipart'; if equality holds, I skip this part and move to the next one with a continue statement. When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open a file with that name for binary writing, and write the part’s contents (also known as its payload) into the file. I get the payload by calling the get_payload method. I pass the decode=1 argument to ensure that the payload is decoded back to its original binary content (e.g., an image, a sound file, or a movie) if needed, rather than remaining in its encoded text form. If the payload is not encoded, decode=1 is innocuous, so I don’t have to check before I pass it.
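The loop described above can be sketched in Python 3 as follows. Two things here are assumptions rather than the recipe's own code: get_content_maintype is the Python 3 replacement for the get_main_type method named in the text, and the decoded payloads are collected in a dictionary instead of being written to files so that the example has no side effects. The sample message and the name picture.png are invented.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A sample message: one unnamed text part, one named base64-encoded part.
msg = MIMEMultipart()
msg.attach(MIMEText("see attachment", "plain"))
msg.attach(MIMEApplication(b"\x89PNG fake bytes", name="picture.png"))

extracted = {}
counter = 1
for part in msg.walk():
    # Containers are only glue; skip them.
    if part.get_content_maintype() == "multipart":
        continue
    # Use the part's name if it has one, else synthesize one.
    name = part.get_param("name")
    if name is None:
        name = "part-%i" % counter
        counter += 1
    # decode=True undoes base64 or quoted-printable and returns bytes;
    # for a part that is not encoded it simply returns the content.
    extracted[name] = part.get_payload(decode=True)

for name, data in extracted.items():
    print(name, data)
```

Here the text part comes back unchanged as bytes while the attachment is decoded from base64, which illustrates why the decode argument can be passed unconditionally.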
See also Recipe 10.11 and the documentation for the standard library modules email, smtplib, mimetypes, base64, quopri, and cStringIO in the Library Reference.