Operating on Iterators

Credit: Sami Hangaslammi

Problem

You need to operate on iterators (including normal sequences) with the same semantics as normal sequence operations, except that lazy evaluation is a must, because some of the iterators involved could represent unbounded sequences.

Solution

Python 2.2 iterators are easy to handle via higher-order functions, and lazy evaluation (such as that performed by the xrange built-in function) can be generalized. Here are some elementary operations that include concatenating several iterators, terminating iteration when a function becomes false, terminating iteration after the first n values, and returning every nth result of an iterator:

from __future__ import generators

def itercat(*iterators):
    """ Concatenate several iterators into one. """
    for i in iterators:
        i = iter(i)
        for x in i:
            yield x

def iterwhile(func, iterator):
    """ Iterate for as long as func(value) returns true. """
    iterator = iter(iterator)
    while 1:
        next = iterator.next()
        if not func(next):
            raise StopIteration    # or: return
        yield next

def iterfirst(iterator, count=1):
    """ Iterate through 'count' first values. """
    iterator = iter(iterator)
    for i in xrange(count):
        yield iterator.next()

def iterstep(iterator, n):
    """ Iterate every nth value. """
    iterator = iter(iterator)
    while 1:
        yield iterator.next()
        # Skip n-1 values
        for dummy in range(n-1):
            iterator.next()
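
As an illustration, here is a minimal sketch of how these four helpers might be combined on an unbounded sequence. It assumes a modern Python 3 interpreter rather than Python 2.2, so the helpers are re-expressed with next(iterator) and a plain return to end a generator; naturals is a hypothetical unbounded source introduced only for this example.

```python
def naturals(start=0):
    """Hypothetical unbounded source: start, start+1, start+2, ..."""
    n = start
    while True:
        yield n
        n += 1

def itercat(*iterators):
    """Concatenate several iterables into one lazy stream."""
    for it in iterators:
        yield from it

def iterwhile(func, iterator):
    """Yield values until func(value) first returns false."""
    for value in iterator:
        if not func(value):
            return
        yield value

def iterfirst(iterator, count=1):
    """Yield at most the first 'count' values."""
    iterator = iter(iterator)
    for _ in range(count):
        try:
            value = next(iterator)
        except StopIteration:
            return
        yield value

def iterstep(iterator, n):
    """Yield every nth value, starting with the first."""
    iterator = iter(iterator)
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            return
        yield value
        # Skip n-1 values; the default keeps next() from raising at the end
        for _ in range(n - 1):
            next(iterator, None)

# Only a finite prefix of each unbounded stream is ever computed
evens_below_10 = list(iterwhile(lambda x: x < 10, iterstep(naturals(), 2)))
head = list(iterfirst(itercat("ab", naturals(10)), 4))
print(evens_below_10)  # [0, 2, 4, 6, 8]
print(head)            # ['a', 'b', 10, 11]
```

Because every helper is a generator, nothing is computed until list asks for values, and each pipeline stops as soon as its finite prefix is complete.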

A bit less elementary, but still generally useful, are functions that transform an iterator’s output, not just selecting which values to return and which to skip, but actually changing the structure. For example, here is a function that bunches up an iterator’s results into a sequence of tuples, each of length count:

from __future__ import generators

def itergroup(iterator, count, keep_partial=1):
    """ Iterate in groups of 'count' values. If there aren't enough values for
    the last group, it's padded with None's, or discarded if keep_partial is
    passed as false. """
    iterator = iter(iterator)
    while 1:
        result = [None]*count
        for x in range(count):
            try: result[x] = iterator.next()
            except StopIteration:
                if x and keep_partial: break
                else: raise
        yield tuple(result)
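
The grouping behavior can be seen in a short sketch. This version assumes Python 3 and uses itertools.islice to pull each group, which is a different technique from the index-by-index loop in the recipe; the padding and keep_partial semantics are intended to match the docstring above.

```python
from itertools import islice

def itergroup(iterable, count, keep_partial=True):
    """Yield tuples of 'count' values; pad or drop a short final group."""
    iterator = iter(iterable)
    while True:
        group = list(islice(iterator, count))
        if not group:
            return                      # input exhausted on a group boundary
        if len(group) < count:
            if not keep_partial:
                return                  # discard the incomplete last group
            group.extend([None] * (count - len(group)))  # pad with None
        yield tuple(group)

padded = list(itergroup(range(7), 3))
trimmed = list(itergroup(range(7), 3, keep_partial=False))
print(padded)   # [(0, 1, 2), (3, 4, 5), (6, None, None)]
print(trimmed)  # [(0, 1, 2), (3, 4, 5)]
```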

And here are lazy versions of the existing (non-lazy) built-in Python functions zip, map, filter, and reduce:

from __future__ import generators

def xzip(*iterators):
    """ Iterative (lazy) version of built-in 'zip' """
    iterators = map(iter, iterators)
    while 1:
        yield tuple([x.next() for x in iterators])

def xmap(func, *iterators):
    """ Iterative (lazy) version of built-in 'map'. """
    iterators = map(iter, iterators)
    count = len(iterators)
    def values():
        # map pads shorter sequences with None when they run out of values
        result = [None]*count
        some_ok = 0
        for i in range(count):
            if iterators[i] is not None:
                try: result[i] = iterators[i].next()
                except StopIteration: iterators[i] = None
                else: some_ok = 1
        if some_ok: return tuple(result)
        else: raise StopIteration
    while 1:
        args = values()
        if func is None: yield args
        else: yield func(*args)

def xfilter(func, iterator):
    """ Iterative version of built-in 'filter' """
    iterator = iter(iterator)
    while 1:
        next = iterator.next()
        if func(next):
            yield next

def xreduce(func, iterator, default=None):
    """ Iterative version of built-in 'reduce'; returns 'default' if the
    iterator yields no values. """
    iterator = iter(iterator)
    try: prev = iterator.next()
    except StopIteration: return default
    # A single value is returned as is, as the built-in reduce does when
    # called without an initial value
    for next in iterator:
        prev = func(prev, next)
    return prev
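
The None-padding rule in xmap is the least obvious part of these functions, so a small sketch may help. It is a Python 3 rendering written for illustration (an assumption, not the recipe's own code), using next(it) in place of it.next() and return in place of raise StopIteration.

```python
from itertools import count, islice

def xmap(func, *iterables):
    """Lazy map that pads exhausted inputs with None, like Python 2's map."""
    iterators = [iter(it) for it in iterables]
    while True:
        args = [None] * len(iterators)
        some_ok = False
        for i, it in enumerate(iterators):
            if it is not None:
                try:
                    args[i] = next(it)
                    some_ok = True
                except StopIteration:
                    iterators[i] = None   # remember that this input is done
        if not some_ok:
            return                        # every input is exhausted
        yield tuple(args) if func is None else func(*args)

padded = list(xmap(None, [1, 2, 3], "ab"))
squares = list(islice(xmap(lambda x: x * x, count()), 4))
print(padded)   # [(1, 'a'), (2, 'b'), (3, None)]
print(squares)  # [0, 1, 4, 9]
```

The second call maps over the unbounded itertools.count() and still terminates, because islice requests only four results.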

Discussion

This recipe is a collection of small utility functions for iterators (all functions can also be used with normal sequences). Among other things, the module presented in this recipe provides generator (lazy) versions of the built-in sequence-manipulation functions. The generators can be combined to produce a more specialized iterator. This recipe requires Python 2.2 or later, of course.

The built-in sequence-manipulation functions zip, map, and filter are specified to return sequences, and those specifications cannot be changed without breaking backward compatibility with versions of Python before 2.2, which lacked iterators; therefore, they cannot become lazy. However, it’s easy to write lazy iterator-based versions of these useful functions, as well as other iterator-manipulation functions, as exemplified in this recipe.

Of course, lazy evaluation is not terribly useful in certain cases. The semantics of reduce, for example, require that all of the sequence is evaluated anyway. While in some cases one could save some memory by looping through the sequence that the iterator yields, rather than expanding it, most often it will be more practical to use reduce(func, iterator) instead of the xreduce function presented in this recipe.
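
To make the point about reduce concrete, here is a small sketch that feeds a generator directly to reduce. It assumes Python 3, where reduce is imported from functools rather than being a built-in; squares_up_to is a hypothetical generator used only for this example.

```python
from functools import reduce
import operator

def squares_up_to(limit):
    """Hypothetical finite generator: 1, 4, 9, ..., limit*limit."""
    for n in range(1, limit + 1):
        yield n * n

# reduce pulls values one at a time; no intermediate list is built,
# but every value must still be consumed before a result exists
total = reduce(operator.add, squares_up_to(4))
single = reduce(operator.add, iter([5]))      # one item: returned unchanged
empty = reduce(operator.add, iter([]), 0)     # no items: the initial value
print(total, single, empty)  # 30 5 0
```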

Lazy evaluation is most useful when the resulting sequence is consumed in a context that needs only a reasonably short prefix of it, such as the zip function or the iterwhile and iterfirst functions in this recipe. In such cases, lazy evaluation allows free use of unbounded sequences, as well as sequences of potentially humongous length. Of course, the resulting program terminates only if each unbounded sequence is used solely in contexts that take a finite prefix of it.
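
The following sketch shows finite prefixes being taken from unbounded sequences. It assumes Python 3 and uses the standard itertools module, whose takewhile and islice play roles similar to this recipe's iterwhile and iterfirst; it is an illustration under those assumptions, not part of the recipe's module.

```python
from itertools import count, islice, takewhile

def powers_of_two():
    """Unbounded generator: 1, 2, 4, 8, ..."""
    for n in count():
        yield 2 ** n

# Each consumer takes only a finite prefix, so each call terminates
first_five = list(islice(powers_of_two(), 5))
below_100 = list(takewhile(lambda x: x < 100, powers_of_two()))
labelled = list(zip("abc", count(1)))   # zip stops at the shorter input

print(first_five)  # [1, 2, 4, 8, 16]
print(below_100)   # [1, 2, 4, 8, 16, 32, 64]
print(labelled)    # [('a', 1), ('b', 2), ('c', 3)]
```

Calling list(powers_of_two()) directly would never finish, which is exactly the situation the finite-prefix consumers avoid.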

See Also

Recipe 17.11 and Recipe 17.12 for other uses of iterators.
