Chapter 3. Regular Expressions
A regular expression, or
regexp, is a way of describing a set of strings. Because regular
expressions are such a fundamental part of awk
programming, their format and use deserve a
separate chapter.
A regular expression enclosed in slashes (‘/’) is an awk
pattern that matches every input record whose text belongs to that
set. The simplest regular expression is a sequence of letters, numbers, or
both. Such a regexp matches any string that contains that sequence. Thus,
the regexp ‘foo’ matches any string containing ‘foo’, and the pattern
/foo/ matches any input record containing the three adjacent characters
‘foo’ anywhere in the record. Other kinds of regexps let you
specify more complicated classes of strings.
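As a small illustration of substring matching, the sketch below feeds four made-up lines to awk with the pattern /foo/ and no action, so awk prints each matching line. It assumes a POSIX awk on the PATH. The variable name ‘matches’ is hypothetical and is used only to hold the result.

```shell
#!/bin/sh
# A minimal sketch, assuming a POSIX awk is on the PATH.
# The four input lines are made up for illustration.
# /foo/ selects every line containing the three adjacent characters "foo"
# anywhere, even inside a longer word such as "buffoon".
matches=$(printf '%s\n' 'food' 'buffoon' 'fo o' 'bar' | awk '/foo/')

# Prints "food" and "buffoon". "fo o" fails because the characters
# are not adjacent, and "bar" does not contain "foo" at all.
printf '%s\n' "$matches"
```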
Initially, the examples in this chapter are simple. As we explain more about how regular expressions work, we present more complicated instances.
How to Use Regular Expressions
A regular expression can be used as a pattern by enclosing it in
slashes. The regular expression is then tested against the entire
text of each record. (Normally, it needs to match only some part of the
text in order to succeed.) For example, the following prints the second
field of each record where the string ‘li’ appears anywhere in the record:
$ awk '/li/ { print $2 }' mail-list
555-5553
555-0542
555-6699
555-3430
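Because the mail-list file is not reproduced here, the sketch below runs the same program on three hypothetical records in a ‘name phone email’ layout. The records are invented for illustration and are not the book's data. The point is that /li/ is tested against the whole record, so a line can match through any field, not only the name.

```shell
#!/bin/sh
# Hypothetical records in a "name phone email" layout; they are made up for
# illustration and are not the contents of the book's mail-list file.
records='Alice 555-0001 alice@example.com
Bob 555-0002 bob@oblivion.example
Carol 555-0003 carol@example.net'

# /li/ is tested against the whole record, so Bob's line matches because
# "li" occurs in the email field ("oblivion"), not in the name.
phones=$(printf '%s\n' "$records" | awk '/li/ { print $2 }')

printf '%s\n' "$phones"   # 555-0001 and 555-0002
```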
Regular expressions can also be used in matching expressions. These expressions allow you to specify the string to match against; it need not be the entire current ...
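The description of matching expressions is cut off here. As a hedged sketch of the idea, standard awk provides the ‘~’ operator (true when the left operand matches the regexp) and ‘!~’ (true when it does not). The records and variable names below are hypothetical. The contrast with a bare /J/ pattern shows that the string being tested can be a single field rather than the whole record.

```shell
#!/bin/sh
# Hypothetical "name phone city" records, made up for illustration.
data='Jan 555-1111 Boston
Bill 555-2222 Juneau
Julie 555-3333 Denver'

# $1 ~ /J/ is true when the first field contains "J"; the rest of the
# record is ignored, so Bill's line ("Juneau" is in $3) is not selected.
in_name=$(printf '%s\n' "$data" | awk '$1 ~ /J/ { print $2 }')

# !~ negates the test: select records whose first field lacks "J".
not_in_name=$(printf '%s\n' "$data" | awk '$1 !~ /J/ { print $2 }')

# For contrast, a bare /J/ pattern tests the whole record and matches all three.
anywhere=$(printf '%s\n' "$data" | awk '/J/ { print $2 }')

echo '$1 ~ /J/:';  printf '%s\n' "$in_name"
echo '$1 !~ /J/:'; printf '%s\n' "$not_in_name"
echo '/J/:';       printf '%s\n' "$anywhere"
```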