Credit: Alex Martelli
You have a file that includes long logical lines split over two or more physical lines, with backslashes to indicate that a continuation line follows. You want to process a sequence of logical lines, rejoining those split lines.
As usual, a class is the right way to wrap this functionality in Python 2.1:
class LogicalLines:

    def __init__(self, fileobj):
        # Ensure that we get a line-reading sequence in the best way possible:
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)
        self.phys_num = 0  # current index into self.seq (physical line number)
        self.logi_num = 0  # current index into self (logical line number)

    def __getitem__(self, index):
        if index != self.logi_num:
            raise TypeError, "Only sequential access supported"
        self.logi_num += 1
        result = []
        while 1:
            # Intercept IndexError, since we may have a last line to return
            try:
                # Let's see if there's at least one more line in self.seq
                line = self.seq[self.phys_num]
            except IndexError:
                # self.seq is finished, so break the loop if we have any
                # more data to return; else, reraise the exception, because
                # if we have no further data to return, we're finished too
                if result: break
                else: raise
            self.phys_num += 1
            if line.endswith('\\\n'):
                result.append(line[:-2])
            else:
                result.append(line)
                break
        return ''.join(result)

# Here's an example function, showing off usage:
def show_logicals(fileob, numlines=5):
    ll = LogicalLines(fileob)
    for l in ll:
        print "Log#%d, phys# %d: %s" % (
            ll.logi_num, ll.phys_num, repr(l))
        if ll.logi_num>numlines: break

if __name__=='__main__':
    from cStringIO import StringIO
    ff = StringIO(
r"""prima \
seconda \
terza
quarta \
quinta
sesta
settima \
ottava
""")
    show_logicals( ff )
This is another sequence-bunching problem, like Recipe 4.9. In Python 2.1, a class wrapper is the most natural approach to getting reusable code for sequence-bunching tasks. We need to support the sequence protocol ourselves and handle the sequence protocol in the sequence we wrap. In Python 2.1 and earlier, the sequence protocol is as follows: a sequence must be indexable by successively larger integers (0, 1, 2, ...), and it must raise an IndexError as soon as an integer that is too large is used as its index. So, if we need to work with Python 2.1 and earlier, we must behave this way ourselves and be prepared for just such behavior from the sequence we are wrapping.
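The indexing protocol just described can be seen in a few lines. The sketch below is only an illustration, written for a current Python 3 interpreter, which still falls back to this protocol when an object defines __getitem__ but no __iter__. The Countdown class is a hypothetical name and is not part of the recipe.

```python
class Countdown:
    """Hypothetical sequence supporting only the old indexing protocol."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Valid indices are 0 .. start-1; a larger index ends the sequence.
        if index >= self.start:
            raise IndexError(index)
        return self.start - index


# A for loop (or comprehension) asks for items 0, 1, 2, ... in turn
# and stops quietly when __getitem__ raises IndexError.
items = [n for n in Countdown(3)]
print(items)
```

This is the same contract that LogicalLines implements in the solution, and the same contract it expects from the wrapped line sequence.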
In Python 2.2, thanks to iterators, the sequence protocol is much simpler. A call to the next method of an iterator yields its next item, and the iterator raises StopIteration when it's done. Combined with a simple generator function that returns an iterator, this makes sequence bunching and similar tasks far easier:
from __future__ import generators

def logical_lines(fileobj):
    logical_line = []
    for physical_line in fileobj:
        if physical_line.endswith('\\\n'):
            logical_line.append(physical_line[:-2])
        else:
            yield ''.join(logical_line)+physical_line
            logical_line = []
    if logical_line: yield ''.join(logical_line)
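To see the iterator protocol at work, here is a minimal sketch of the same generator driven by hand. It assumes a current Python 3 interpreter, where generators need no __future__ import and the next method is reached through the built-in next function. The sample text and the variable names are illustrative only.

```python
import io


def logical_lines(fileobj):
    """Yield logical lines, joining physical lines that end in a backslash."""
    pieces = []
    for physical_line in fileobj:
        if physical_line.endswith('\\\n'):
            # Continuation: drop the backslash and newline, keep accumulating.
            pieces.append(physical_line[:-2])
        else:
            yield ''.join(pieces) + physical_line
            pieces = []
    if pieces:
        # The input ended on a continuation line; emit what was collected.
        yield ''.join(pieces)


sample = io.StringIO("prima \\\nseconda \\\nterza\nquarta \\\nquinta\nsesta\n")
lines = logical_lines(sample)

first = next(lines)   # one explicit step of the iterator
rest = list(lines)    # consume the remaining logical lines

try:
    next(lines)
    exhausted = False
except StopIteration:
    # An exhausted iterator signals the end by raising StopIteration.
    exhausted = True

print(repr(first), rest, exhausted)
```

In ordinary use a plain for loop over logical_lines(fileobj) does the stepping and catches StopIteration for you.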
See also: Recipe 4.9; Perl Cookbook Recipe 8.1.