Credit: Alex Martelli, David Goodger
You need to sort a Python list in a guaranteed-stable way (i.e., with no alteration in the relative ordering of items that compare equal).
In addition to speed, decorate-sort-undecorate
(DSU) offers flexibility that sort
with a
comparison function argument just can’t match. For
example, you can ensure the sort’s stability, as
follows:
def stable_sorted_copy(alist, _indices=xrange(sys.maxint)): # Decorate: prepare a suitable auxiliary list decorated = zip(alist, _indices) # Sort: do a plain built-in sort on the auxiliary list decorated.sort( ) # Undecorate: extract the list from the sorted auxiliary list return [ item for item, index in decorated ] def stable_sort_inplace(alist): # To sort in place: assign sorted result to all-list slice of original list alist[:] = stable_sorted_copy(alist)
The notion of a stable sort is typically not relevant if the sequences you are sorting contain objects that are uniform in type and are simple, such as a list of integers. In such cases, objects that compare equal are equal in all measurable ways. However, in cases where equality for the sake of ordering doesn’t correspond to “deep” equality—such as tuples containing floating-point and integer numbers, or, more commonly, rich objects with arbitrary internal structure—it may matter that the elements that start off earlier in the list than their “equal” counterparts remain earlier after the sort.
Python lists’ sort
method is not
guaranteed to be stable: items that compare equal may or may not be
in unchanged order (they often are, but you cannot be sure). Ensuring
stability is easy, as one of the many applications of the common DSU
idiom. For another specific example of DSU usage, see Recipe 2.7.
First, you build an auxiliary list (the decorate step), where each item is a tuple made up of all sort keys, in descending order of significance, of the corresponding item of the input sequence. You must include in each tuple of the auxiliary list all of the information in the corresponding item, and/or an index to it, so that you can fetch the item back, or reconstruct it, in the third step.
In the second step, you sort the
auxiliary list by its built-in sort
method,
without arguments (this is crucial for performance).
Finally, you reconstruct the desired
sorted list by undecorating the now-sorted auxiliary list. Steps 1
and 3 can be performed in several ways: with map
,
zip
, list comprehensions, or explicit loops. List
comprehensions and zip
are generally simpler,
although they require Python 2.0 or later. If you need to be
compatible with Python 1.5.2, you will have to use
map
, which unfortunately is often less clear and
readable.
This idiom is also known as the “Schwartzian
transform,” by analogy with a related Perl idiom.
However, that term implies using Perl’s
map
and grep
functions and
performing the whole idiom inside of a single statement.
DSU inherently supplies a sorted copy, but if the input sequence is a list, we can just assign to its include-everything slice to get an in-place effect instead.
This recipe specifically demonstrates using DSU to achieve a stable
sort (i.e., a sort where items that compare equal keep the same
relative order in the result list as they had in the input sequence).
For this specific task, passing an argument to the built-in
sort
method is of no use. More generally, the DSU
idiom can sometimes be replaced by passing a comparison function
argument to sort
, but DSU tends to be much faster,
and speed often matters when you are sorting sequences that
aren’t tiny.
The speed comes from maximally accelerating (by not using a
Python-coded function as an argument to sort
) the
O(N log
N) part, which dominates sorting time for
sequences of substantial length N. The
decoration and undecoration steps are both
O(N), and thus they
contribute negligible time if N is large enough
and reasonably little time even for many practical values of
N.
Note the named argument _indices
. This ensures
that a single copy of the necessary xrange
object
is generated at function-definition time. It can then be reused to
decorate any argument sequence with indexes, by exploiting
zip
’s truncation behavior on
unequal-length arguments (see Recipe 1.14).
Recipe 1.14 and Recipe 2.7.
Get Python Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.