Credit: Yakov Markovitch, Nick Perkins
You have a list of objects that you need to sort according to one attribute of each object, as rapidly and portably as possible.
In this case, the obvious approach is concise, but quite slow:
def sort_by_attr_slow(seq, attr): def cmp_by_attr(x, y, attr=attr): return cmp(getattr(x, attr), getattr(y, attr)) seq.sort(cmp_by_attr)
There is a faster way, with DSU:
def sort_by_attr(seq, attr): import operator intermed = map(None, map(getattr, seq, (attr,)*len(seq)), xrange(len(seq)), seq) intermed.sort( ) return map(operator.getitem, intermed, (-1,)*len(intermed)) def sort_by_attr_inplace(lst, attr): lst[:] = sort_by_attr(lst, attr)
Sorting a list of objects by an attribute of each object is best done
using the DSU idiom. Since this recipe uses only built-ins and
doesn’t use explicit looping, it is quite fast.
Moreover, the recipe doesn’t use any Python
2.0-specific features (such as zip
or list
comprehensions), so it can be used for Python 1.5.2 as well.
List comprehensions are neater, but this recipe demonstrates that you
can do without them, if and when you desperately need portability to
stone-age Python installations. Of course, the correct use of
map
can be tricky. Here, for example, when we
build the auxiliary list intermed
, we need to call
the built-in function getattr
on each item of
sequence seq
using the same string,
attr
, as the second argument in each case. To do
this, in the inner call to
map
, we need a tuple in
which attr
is repeated as many times as there are
items in seq
. We build that tuple by the not
immediately obvious expression:
(attr,)*len(seq)
which is len(seq)
repetitions of the one-item
tuple (attr,)
, whose only item is exactly
attr
.
If you do want to use list comprehensions, that’s
easy, too. Just substitute the sort_by_attr
of the
recipe with the following alternative version:
def sort_by_attr2(seq, attr): intermed = [(getattr(seq[i], attr), i, seq[i]) for i in xrange(len(seq))] intermed.sort( ) return [ tup[-1] for tup in intermed ]
However, if this piece of code is run in a speed-critical bottleneck
of your program, you should carefully measure performance.
map
is often surprisingly fast when compared to
list comprehensions, at least when no lambda
is
involved in the map
. The difference in performance
is not huge either way, so it’s worth exploring only
if this code is run in a speed-critical bottleneck.
Whether you use map
or list comprehensions, the
point of the DSU idiom is to gain both speed and flexibility (for
example, making the sort a stable one, as we did here) when compared
to passing a comparison function to the sort
method. For a simpler application and a somewhat wider discussion of
DSU, see Recipe 2.4.
Note that in addition to making the sort stable, putting the index
i
as the second item of each tuple in the
auxiliary list intermed
(through the insertion of
xrange
in the call to map
or,
more simply, that of i
in the list-comprehension
version sort_by_attr2
) also serves a potentially
crucial role here. It ensures that two objects (two items of
seq
) will never be compared, even if their values
are the same for the attribute named attr
, because
even in that case their indexes will surely differ, and thus the
lexicographic comparison of the tuples will never get all the way to
comparing the tuples’ last items (the objects).
Avoiding object comparison may save us from extremely slow operations
or even from attempting forbidden ones. For example, when we sort a
list of complex numbers by their real
attributes,
in Python 2.0 or later, we will get an exception if we try to compare
two complex numbers directly, as no ordering is defined on complex
numbers.
Get Python Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.