This example reads through a file using a ReaderCharacterIterator, one of four CharacterIterator classes in the Jakarta RegExp package. Whenever a match is found, I extract it from the CharacterIterator and print it.
The other character iterators are StreamCharacterIterator (as we’ll see in Chapter 9, streams are 8-bit bytes, while readers handle conversion among various representations of Unicode characters), CharacterArrayCharacterIterator, and StringCharacterIterator. All of these character iterators are interchangeable; apart from the construction process, this program would work on any of them. Use a StringCharacterIterator, for example, to find all occurrences of a pattern in the (possibly long) string you get from a JTextArea’s getText( ) method, described in Chapter 13.
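The byte-versus-character distinction mentioned above can be seen with the standard java.io classes alone. The following is a minimal sketch, not code from this recipe: the class name BytesVersusChars and its two helper methods are hypothetical, and it assumes Java 7 or later and UTF-8 as the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class (not part of the book's code). */
class BytesVersusChars {

    /** Count the raw 8-bit bytes in the UTF-8 encoded form of the text. */
    static int countBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Count the characters a Reader delivers after decoding those same bytes. */
    static int countChars(String text) throws IOException {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            int n = 0;
            while (r.read() != -1) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // the last letter needs two bytes in UTF-8
        System.out.println("bytes=" + countBytes(text)
                + " chars=" + countChars(text)); // prints bytes=5 chars=4
    }
}
```

A stream-based iterator sees the five bytes, while a reader-based one sees the four decoded characters, which is why the choice between StreamCharacterIterator and ReaderCharacterIterator matters for non-ASCII input.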
This code takes the getParen( ) methods from Section 4.6, the substring( ) method from the CharacterIterator interface, and the match( ) method from the RE, and simply puts them all together. I coded it to extract all the “names” from a given file; in running the program through itself, it prints the words “import”, “org”, “apache”, “regexp”, and so on.
> jikes +E -d . ReaderIter.java
> java ReaderIter ReaderIter.java
import
org
apache
regexp
import
java
io
import
com
darwinsys
util
Debug
Demonstrate
the
Character
Iterator
interface
print
I interrupted it here to save paper. The source code for this program is fairly short:
import org.apache.regexp.*;
import java.io.*;
import com.darwinsys.util.Debug;

/** Demonstrate the CharacterIterator interface: print
 * all the strings that match a given pattern from a file.
 */
public class ReaderIter {
    public static void main(String[] args) throws Exception {
        // The RE pattern
        RE patt = new RE("[A-Za-z][a-z]+");
        // A FileReader (see the I/O chapter)
        Reader r = new FileReader(args[0]);
        // The RE package ReaderCharacterIterator, a "front end"
        // around the Reader object.
        CharacterIterator in = new ReaderCharacterIterator(r);
        int end = 0;

        // For each match in the input, extract and print it.
        while (patt.match(in, end)) {
            // Get the starting position of the text
            int start = patt.getParenStart(0);
            // Get ending position; also updates for NEXT match.
            end = patt.getParenEnd(0);
            // Print whatever matched.
            Debug.println("match", "start=" + start + "; end=" + end);
            // Use CharacterIterator.substring(offset, end);
            System.out.println(in.substring(start, end));
        }
    }
}
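For readers without the Jakarta RegExp jar on hand, the same match, get-start, get-end, substring loop can be sketched with the standard java.util.regex package. This is a different API from the one used in this recipe, swapped in only so the example is self-contained: Matcher.find(int) stands in for RE.match( ), start( ) and end( ) stand in for getParenStart(0) and getParenEnd(0), and the class name FindAllSketch and its findAll method are hypothetical. It also works on an in-memory CharSequence rather than on a Reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical analog of ReaderIter built on java.util.regex (an assumption,
 *  not the Jakarta RegExp API used in the recipe). */
class FindAllSketch {

    /** Return every substring of input that matches regex, in order. */
    static List<String> findAll(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        int end = 0;
        // For each match at or after 'end', extract and record it.
        while (m.find(end)) {
            int start = m.start();  // starting position of the matched text
            end = m.end();          // ending position; also where the NEXT search begins
            found.add(input.subSequence(start, end).toString());
            // Guard against patterns that can match the empty string,
            // which would otherwise never advance.
            if (end == start) {
                if (end >= input.length()) {
                    break;
                }
                end++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String name : findAll("[A-Za-z][a-z]+", "import org.apache.regexp.*;")) {
            System.out.println(name);
        }
    }
}
```

Run as is, the sketch prints import, org, apache, and regexp on separate lines, matching the first four lines of the output shown earlier.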