Credit: Luther Blissett
This is best handled by two nested loops, one on lines and one on the words in each line:
for line in open(thefilepath).xreadlines():
    for word in line.split():
        dosomethingwith(word)
This implicitly defines words as sequences of nonspaces separated by
sequences of spaces (just as the Unix program wc
does). For other definitions of words, you can use regular
expressions. For example:
import re
re_word = re.compile(r'[\w-]+')

for line in open(thefilepath).xreadlines():
    for word in re_word.findall(line):
        dosomethingwith(word)
In this case, a word is defined as a maximal sequence of alphanumerics and hyphens (strictly speaking, \w also matches the underscore).
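To make the difference between the two definitions of a word concrete, here is a small sketch that applies both to the same line. The sample text is made up for illustration and is not part of the original recipe:

```python
import re

re_word = re.compile(r'[\w-]+')

# A made-up sample line, used only for illustration.
line = "well-known  words, e.g. these!\n"

# Whitespace splitting keeps punctuation attached to each word.
split_words = line.split()

# The regular expression keeps only maximal runs of alphanumerics,
# underscores, and hyphens, so punctuation is dropped and "e.g."
# breaks into two one-letter words.
regex_words = re_word.findall(line)

print(split_words)   # ['well-known', 'words,', 'e.g.', 'these!']
print(regex_words)   # ['well-known', 'words', 'e', 'g', 'these']
```

Which result is preferable depends entirely on what your program considers a word.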
For other definitions of words you will obviously need different
regular expressions. The outer loop, on all lines in the file, can of
course be done in many ways. The xreadlines method is good, but you
can also use the list obtained by the readlines method, the standard
library module fileinput, or, in Python 2.2, even just:
for line in open(thefilepath):
which is simplest and fastest.
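For readers on a current Python 3, where xreadlines no longer exists, the following sketch shows two of the outer-loop alternatives mentioned above: direct iteration over the file object and the fileinput module. The function names and the temporary demonstration file are assumptions made for this example, not part of the original recipe:

```python
import fileinput
import os
import tempfile

def words_by_direct_iteration(path):
    """Collect words by looping over the file object itself."""
    words = []
    with open(path) as f:              # the with block closes the file
        for line in f:
            words.extend(line.split())
    return words

def words_by_fileinput(path):
    """Collect words by looping over lines supplied by fileinput."""
    words = []
    with fileinput.input(files=[path]) as lines:
        for line in lines:
            words.extend(line.split())
    return words

# Demonstration on a temporary file whose contents are made up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one two\nthree  four five\n")
demo_path = tmp.name

direct = words_by_direct_iteration(demo_path)
via_fileinput = words_by_fileinput(demo_path)
os.remove(demo_path)

print(direct)                   # ['one', 'two', 'three', 'four', 'five']
print(direct == via_fileinput)  # True
```

Both loops see the same lines, so the choice is mostly one of convenience: fileinput is handy when the lines may come from several files or from standard input.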
In Python 2.2, it’s often a good idea to wrap iterations as iterator objects, most commonly by simple generators:
from __future__ import generators

def words_of_file(thefilepath):
    for line in open(thefilepath):
        for word in line.split():
            yield word

for word in words_of_file(thefilepath):
    dosomethingwith(word)
This approach lets you separate, cleanly and effectively, two
different concerns: how to iterate over all items (in this case,
words in a file) and what to do with each item in the iteration. Once
you have cleanly encapsulated iteration concerns in an iterator
object (often, as here, a generator), most of your uses of iteration
become simple for statements. You can often reuse
the iterator in many spots in your program, and if maintenance is
ever needed, you can then perform it in just one place—the
definition of the iterator—rather than having to hunt for all
uses. The advantages are thus very similar to those you obtain, in
any programming language, by appropriately defining and using
functions rather than copying and pasting pieces of code all over the
place. With Python 2.2’s iterators, you can get
these advantages for looping control structures, too.
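As a rough illustration of that reuse, the sketch below drives three unrelated computations from one generator. It assumes a current Python 3, where generators need no __future__ import; the temporary file and its contents are made up for the demonstration:

```python
import os
import tempfile

def words_of_file(thefilepath):
    """Yield each whitespace-separated word of the file, one at a time."""
    with open(thefilepath) as f:
        for line in f:
            for word in line.split():
                yield word

# Demonstration on a temporary file whose contents are made up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over the lazy dog\n")
demo_path = tmp.name

# Three different uses, each a plain iteration over the same generator.
word_count = sum(1 for _ in words_of_file(demo_path))
longest = max(words_of_file(demo_path), key=len)
distinct = set(words_of_file(demo_path))
os.remove(demo_path)

print(word_count)     # 9
print(longest)        # quick
print(len(distinct))  # 8
```

If the definition of a word ever changes (say, to the regular expression shown earlier), only words_of_file needs editing; none of the three uses has to be touched.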
Documentation for the fileinput module in the Library Reference;
PEP 255 on simple generators
(http://www.python.org/peps/pep-0255.html);
Perl Cookbook Recipe 8.3.