Unpacking a Multipart MIME Message

Credit: Matthew Dixon Cowles

Problem

You have a multipart MIME message and want to unpack it.

Solution

The walk method of message objects generated by the email module (new as of Python 2.2) makes this task really easy:

import email.Parser
import os, sys

def main():
    if len(sys.argv) == 1:
        print "Usage: %s filename" % os.path.basename(sys.argv[0])
        sys.exit(1)

    # Parse the whole message from a file opened in binary mode
    mailFile = open(sys.argv[1], "rb")
    p = email.Parser.Parser()
    msg = p.parse(mailFile)
    mailFile.close()

    partCounter = 1
    for part in msg.walk():
        # Multipart containers only hold other parts; skip them
        if part.get_main_type() == "multipart":
            continue
        name = part.get_param("name")
        if name is None:
            name = "part-%i" % partCounter
        partCounter += 1
        # In real life, make sure that name is a reasonable filename
        # for your OS; otherwise, mangle it until it is!
        f = open(name, "wb")
        f.write(part.get_payload(decode=1))
        f.close()
        print name

if __name__ == "__main__":
    main()
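
The comment about mangling filenames deserves a concrete illustration. What follows is only a sketch, and it assumes a current Python 3 interpreter rather than Python 2.2: there, the main type is read with get_content_maintype and a message can be parsed straight from bytes. The helper names safe_name and unpack are hypothetical, and the whitelist of allowed characters is just one conservative choice among many.

```python
import email
import email.utils
import os
import re


def safe_name(name, fallback):
    """Return a conservative filename (hypothetical helper, not part of email)."""
    if not name:
        return fallback
    if isinstance(name, tuple):
        # RFC 2231 encoded parameters come back as a 3-tuple; collapse to str.
        name = email.utils.collapse_rfc2231_value(name)
    # Drop directory components, whichever separator the sender used.
    base = name.replace("\\", "/").split("/")[-1]
    # Replace anything outside a small whitelist, and refuse leading dots.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return base or fallback


def unpack(raw_bytes, out_dir):
    """Write every non-multipart part of the message into out_dir."""
    msg = email.message_from_bytes(raw_bytes)
    written = []
    counter = 1
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue
        name = safe_name(part.get_param("name"), "part-%i" % counter)
        counter += 1
        payload = part.get_payload(decode=True)
        if payload is None:
            # Container-like parts (e.g., message/rfc822) have no payload
            # of their own; walk() visits their contents separately.
            continue
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(payload)
        written.append(name)
    return written
```

Writing into an explicit output directory, rather than the current one, keeps a hostile name such as "../something" from escaping once its directory components have been stripped.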

Discussion

The email module, new in Python 2.2, makes parsing MIME messages reasonably easy. (See the Library Reference for detailed documentation about the email module.) This recipe shows how to recursively unbundle a MIME message with the email module in the easiest way, using the walk method of message objects.

You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object’s contents with calls to its add_payload method. In this recipe, I need to read and analyze an existing message, so I worked the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes as its only argument a file-like object (in the recipe, I pass it a real file object that I just opened for binary reading with the built-in open function) and returns a message object, on which you can call message object methods.
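
Both directions can be seen side by side in a short sketch. It assumes Python 3, where the modules are spelled in lowercase (email.message, email.parser), where BytesParser is the parser to use for a file opened in binary mode, and where set_payload and attach take the place of add_payload. An in-memory io.BytesIO object stands in for the real file.

```python
import io
from email.message import Message
from email.parser import BytesParser

# Direction 1: build a small message by hand.
outgoing = Message()
outgoing["Subject"] = "demo"
outgoing["Content-Type"] = "text/plain"
outgoing.set_payload("just a body")

# Direction 2: parse a message from any file-like object opened for
# binary reading (here, an in-memory buffer instead of a real file).
file_like = io.BytesIO(outgoing.as_bytes())
parsed = BytesParser().parse(file_like)

print(parsed["Subject"], parsed.get_content_type(), parsed.is_multipart())
```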

The walk method is a generator, i.e., it returns an iterator object on which you can loop with a for statement. Usually, you will use this method exactly as I use it in this recipe:

for part in msg.walk():

The iterator sequentially returns (depth-first, in case of nesting) the parts that make up the message, starting with the message itself. If the message is not a container of parts (has no attachments or alternates, i.e., message.is_multipart() is false), no problem: the walk method will return an iterator with a single element, the message itself. In any case, each element of the iterator is also a message object (an instance of email.Message.Message), so you can call on it any of the methods a message object supplies.
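
The traversal order is easy to check on a small nested message. This sketch assumes Python 3 and builds its test messages with the email.mime helper classes; the particular nesting (an alternative pair inside a mixed container) is only an example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A nested message: multipart/mixed holding a multipart/alternative
# (plain and HTML bodies) plus one extra text part.
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("plain body", "plain"))
inner.attach(MIMEText("<p>html body</p>", "html"))

outer = MIMEMultipart("mixed")
outer.attach(inner)
outer.attach(MIMEText("a,b\n1,2\n", "csv"))

# walk() yields the container first, then descends depth-first.
nested_order = [part.get_content_type() for part in outer.walk()]

# A message that is not a container yields only itself.
single = MIMEText("no attachments here")
single_parts = list(single.walk())

print(nested_order)
print(len(single_parts), single_parts[0] is single)
```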

In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly since they're just glue holding the true parts together. I use the get_main_type method to obtain the main type and check it for equality with 'multipart'; if equality holds, I skip this part and move to the next one with a continue statement. When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open that name as a file, and write the part's contents (also known as its payload), which I get by calling the get_payload method, into the file. I use the decode=1 argument to ensure that the payload is decoded back to its original binary content (e.g., an image, a sound file, a movie, etc.) if needed, rather than remaining in its encoded text form. If the payload is not encoded, decode=1 is innocuous, so I don't have to check before I pass it.
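
The effect of the decode argument can be seen by comparing the two forms of the same payload. This is a sketch assuming Python 3, where decode=True is the usual spelling of decode=1 and where MIMEApplication applies base64 encoding by default; the sample bytes are arbitrary.

```python
import base64
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText

data = b"\x89PNG\r\n\x1a\n"              # arbitrary binary sample
binary_part = MIMEApplication(data)      # base64-encoded by default
text_part = MIMEText("plain ASCII text") # 7bit, i.e., not encoded

# Without decode, the payload is the base64 text as it travels in the mail.
encoded = binary_part.get_payload()
# With decode, the original bytes come back.
decoded = binary_part.get_payload(decode=True)

# On a part that is not encoded, decode is harmless: the text is
# simply returned as bytes.
plain = text_part.get_payload(decode=True)

print(binary_part["Content-Transfer-Encoding"], decoded == data, plain)
```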

See Also

Recipe 10.11; documentation for the standard library modules email, smtplib, mimetypes, base64, quopri, and cStringIO in the Library Reference.
