Sorting a List of Objects by an Attribute of the Objects

Credit: Yakov Markovitch, Nick Perkins

Problem

You have a list of objects that you need to sort according to one attribute of each object, as rapidly and portably as possible.

Solution

In this case, the obvious approach is concise, but quite slow:

def sort_by_attr_slow(seq, attr):
    def cmp_by_attr(x, y, attr=attr):
        return cmp(getattr(x, attr), getattr(y, attr))
    seq.sort(cmp_by_attr)

There is a faster way, with DSU:

def sort_by_attr(seq, attr):
    import operator

    intermed = map(None, map(getattr, seq, (attr,)*len(seq)),
        xrange(len(seq)), seq)
    intermed.sort(  )
    return map(operator.getitem, intermed, (-1,)*len(intermed))

def sort_by_attr_inplace(lst, attr):
    lst[:] = sort_by_attr(lst, attr)

Discussion

Sorting a list of objects by an attribute of each object is best done using the DSU idiom. Since this recipe uses only built-ins and doesn’t use explicit looping, it is quite fast. Moreover, the recipe doesn’t use any Python 2.0-specific features (such as zip or list comprehensions), so it can be used for Python 1.5.2 as well.

List comprehensions are neater, but this recipe demonstrates that you can do without them, if and when you desperately need portability to stone-age Python installations. Of course, the correct use of map can be tricky. Here, for example, when we build the auxiliary list intermed, we need to call the built-in function getattr on each item of sequence seq using the same string, attr, as the second argument in each case. To do this, in the inner call to map, we need a tuple in which attr is repeated as many times as there are items in seq. We build that tuple by the not immediately obvious expression:

(attr,)*len(seq)

which is len(seq) repetitions of the one-item tuple (attr,), whose only item is exactly attr.

If you do want to use list comprehensions, that’s easy, too. Just substitute the sort_by_attr of the recipe with the following alternative version:

def sort_by_attr2(seq, attr):
    intermed = [(getattr(seq[i], attr), i, seq[i]) for i in xrange(len(seq))]
    intermed.sort(  )
    return [ tup[-1] for tup in intermed ]

However, if this piece of code is run in a speed-critical bottleneck of your program, you should carefully measure performance. map is often surprisingly fast when compared to list comprehensions, at least when no lambda is involved in the map. The difference in performance is not huge either way, so it’s worth exploring only if this code is run in a speed-critical bottleneck.

Whether you use map or list comprehensions, the point of the DSU idiom is to gain both speed and flexibility (for example, making the sort a stable one, as we did here) when compared to passing a comparison function to the sort method. For a simpler application and a somewhat wider discussion of DSU, see Recipe 2.4.

Note that in addition to making the sort stable, putting the index i as the second item of each tuple in the auxiliary list intermed (through the insertion of xrange in the call to map or, more simply, that of i in the list-comprehension version sort_by_attr2) also serves a potentially crucial role here. It ensures that two objects (two items of seq) will never be compared, even if their values are the same for the attribute named attr, because even in that case their indexes will surely differ, and thus the lexicographic comparison of the tuples will never get all the way to comparing the tuples’ last items (the objects). Avoiding object comparison may save us from extremely slow operations or even from attempting forbidden ones. For example, when we sort a list of complex numbers by their real attributes, in Python 2.0 or later, we will get an exception if we try to compare two complex numbers directly, as no ordering is defined on complex numbers.

Python Cookbook by

Sorting a List of Objects by an Attribute of the Objects

Problem

Solution

Discussion

See Also

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly