sed, awk and Regular Expressions Pocket Reference

By Arnold Robbins


Chapter 1: sed & awk Pocket Reference
Introduction
This pocket reference is a companion volume to O'Reilly's sed & awk, Second Edition, by Dale Dougherty and Arnold Robbins. It presents a concise summary of regular expressions and pattern matching, and summaries of sed and awk.
Conventions
The pocket reference follows certain typographic conventions, outlined here:
Constant Width
Is used for code examples, commands, directory names, filenames, and options.
Constant Width Italic
Is used in syntax and command summaries to show replaceable text; this text should be replaced with user-supplied values.
Constant Width Bold
Is used in code examples to show commands or other text that should be typed literally by the user.
Italic
Is used to show generic arguments and options; these should be replaced with user-supplied values. Italic is also used to highlight comments in examples.
$
Is used in some examples as the Bourne shell or Korn shell prompt.
[ ]
Surround optional elements in a description of syntax. (The brackets themselves should never be typed.)
Matching Text
A number of Unix text-processing utilities let you search for, and in some cases change, text patterns rather than fixed strings. These utilities include the editing programs ed, ex, vi, and sed, the awk programming language, and the commands grep and egrep. Text patterns (formally called regular expressions) contain normal characters mixed with special characters (called metacharacters).
This section presents the following topics:
  • Filenames versus patterns
  • List of metacharacters available to each program
  • Description of metacharacters
  • Examples
Metacharacters used in pattern matching are different from metacharacters used for filename expansion. When you issue a command on the command line, special characters are seen first by the shell, then by the program; therefore, unquoted metacharacters are interpreted by the shell for filename expansion. The command:
$ grep [A-Z]* chap[12]
            
could, for example, be transformed by the shell into:
$ grep Array.c Bug.c Comp.c chap1 chap2
            
and grep would then try to find the pattern Array.c in the files Bug.c, Comp.c, chap1, and chap2. To keep the shell from expanding the special characters, so that they reach grep unchanged, use quotes:
$ grep "[A-Z]*" chap[12]
            
Double quotes suffice in most cases, but single quotes are the safest bet, because the shell still interprets $, `, and \ inside double quotes.
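Note also that ? means different things in the two contexts. In pattern matching (in the programs that support it, such as egrep and awk), ? matches zero or one instance of the preceding regular expression; in filename expansion, ? matches any single character.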
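The difference between shell expansion and pattern matching can be seen with a small experiment. The following is a minimal sketch, not part of the original text: it assumes a POSIX-style shell with mktemp available, creates a scratch directory holding the file names used in the example above, and uses echo to display the argument list that grep would receive. Setting LC_ALL=C is an added assumption so that [A-Z] covers only ASCII uppercase letters.

```shell
#!/bin/sh
# Sketch: show what the shell does to a pattern before grep ever sees it.
LC_ALL=C; export LC_ALL            # assumption: make [A-Z] mean ASCII uppercase only
demo=$(mktemp -d)                  # scratch directory for the demonstration
cd "$demo" || exit 1
touch Array.c Bug.c Comp.c chap1 chap2

# echo prints the arguments exactly as grep would receive them.
unquoted=$(echo grep [A-Z]* chap[12])      # the shell expands both patterns
quoted=$(echo grep '[A-Z]*' chap[12])      # the quoted pattern survives intact

echo "$unquoted"    # grep Array.c Bug.c Comp.c chap1 chap2
echo "$quoted"      # grep [A-Z]* chap1 chap2

# In an extended regular expression, ? makes the preceding item optional,
# so both "chap" and "chap1" match.
regex_hits=$(printf 'chap\nchap1\n' | grep -E -c '^chap1?$')
echo "$regex_hits"  # 2

# In filename expansion, ? stands for exactly one character.
glob_hits=$(echo chap?)
echo "$glob_hits"   # chap1 chap2

cd / && rm -rf "$demo"             # clean up the scratch directory
```

Using echo in place of the real command is a convenient way to preview an expansion before running anything that reads or modifies files.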

Section 1.3.2.1: Search patterns

The characters in the following table have special meaning only in search patterns:
Character    Pattern
The sed Editor
This section presents the following topics:
  • Conceptual overview of sed
  • Command-line syntax
  • Syntax of sed commands
  • Group summary of sed commands
  • Alphabetical summary of sed commands
sed is a non-interactive, or stream-oriented, editor. It interprets a script and performs the actions in the script. sed is stream-oriented because, like many Unix programs, input flows through the program and is directed to standard output. For example, sort is stream-oriented; vi is not. sed's input typically comes from a file or pipe, but it can also be directed from the keyboard. Output goes to the screen by default but can be captured in a file or sent through a pipe instead.
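As a rough illustration of the stream-oriented model just described, the sketch below feeds sed once from a pipe and once from a file, and captures the output instead of letting it go to the screen. The sample text and the scratch file are hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Sketch: sed reads a stream, applies its script, and writes to standard output.
tmp=$(mktemp)                      # hypothetical scratch input file
printf 'hello world\n' > "$tmp"

# Input from a pipe; output captured by the shell.
from_pipe=$(printf 'hello world\n' | sed 's/world/sed/')

# Input from a file; output redirected to a second file.
sed 's/hello/goodbye/' "$tmp" > "$tmp.out"
from_file=$(cat "$tmp.out")

# The input file itself is untouched; sed only wrote to its output stream.
original=$(cat "$tmp")

echo "$from_pipe"    # hello sed
echo "$from_file"    # goodbye world
echo "$original"     # hello world

rm -f "$tmp" "$tmp.out"
```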
The Free Software Foundation has a version of sed, available from ftp://gnudist.gnu.org/gnu/sed/sed-3.02.tar.gz. The somewhat older version, 2.05, is also available.
Typical uses of sed include:
  • Editing one or more files automatically
  • Simplifying repetitive edits to multiple files
  • Writing conversion programs
sed operates as follows:
  • Each line of input is copied into a "pattern space," an internal buffer where editing operations are performed.
  • All editing commands in a sed script are applied, in order, to each line of input.
  • Editing commands are applied to all lines (globally) unless line addressing restricts the lines affected.
  • If a command changes the input, subsequent commands and address tests will be applied to the current line in the pattern space, not the original input line.
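The rules above can be checked with a few one-line scripts. This is a minimal sketch using made-up input; it relies only on the s and d commands, numeric line addresses, and the -e option.

```shell
#!/bin/sh
# Sketch: commands run in order, and each one sees the pattern space
# as the previous command left it.
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')
echo "$chained"      # bird: the second command matched the "dog" made by the first

# Address tests also look at the current pattern space, not the original line.
# After the substitution the line no longer contains "cat", so it is not deleted.
addressed=$(printf 'cat\n' | sed -e 's/cat/dog/' -e '/cat/d')
echo "$addressed"    # dog

# Without an address a command applies to every line; "2" restricts it to line 2.
all_lines=$(printf 'a\na\na\n' | sed 's/a/b/')
line_two=$(printf 'a\na\na\n' | sed '2s/a/b/')
echo "$all_lines"    # b, b, b on three lines
echo "$line_two"     # a, b, a on three lines
```

Because later commands see earlier changes, the order of commands in a script matters: swapping the two substitutions in the first example would leave "dog" rather than "bird".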
The awk Programming Language
This section presents the following topics:
  • Conceptual overview
  • Command-line syntax
  • Patterns and procedures
  • Built-in variables
  • Operators
  • Variables and array assignment
  • User-defined functions
  • Group listing of functions and commands
  • Implementation limits
  • Alphabetical summary of functions and commands
awk is a pattern-matching program for processing files, especially when they are databases. The new version of awk, called nawk, provides additional capabilities. (It really isn't so new. The additional features were added in 1984, and it was first shipped with System V Release 3.1 in 1987. Nevertheless, the name was never changed on most systems.) Every modern Unix system comes with a version of new awk, and its use is recommended over old awk.
Different systems vary in what the two versions are called. Some have oawk and awk, for the old and new versions, respectively. Others have awk and nawk. Still others only have awk, which is the new version. This example shows what happens if your awk is the old one:
$ awk 1 /dev/null
awk: syntax error near line 1
awk: bailing out near line 1
awk will exit silently if it is the new version.
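The same test can be wrapped in a small script to find out which of the command names mentioned in this section behave like new awk on a given system. This is only a sketch: the helper name is_new_awk is hypothetical, and a failure may mean either that the command is old awk or that it could not be run at all.

```shell
#!/bin/sh
# Sketch: classify awk-like commands using the "awk 1 /dev/null" test.

# Hypothetical helper: succeeds if the named command accepts the program "1".
is_new_awk() {
    "$1" 1 /dev/null >/dev/null 2>&1
}

for candidate in awk nawk oawk gawk mawk; do
    # Skip names that are not installed on this system.
    command -v "$candidate" >/dev/null 2>&1 || continue
    if is_new_awk "$candidate"; then
        echo "$candidate: behaves like new awk"
    else
        echo "$candidate: old awk, or failed to run"
    fi
done
```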
Source code for the latest version of awk, from Bell Labs, can be downloaded starting at Brian Kernighan's home page: http://cm.bell-labs.com/~bwk. Michael Brennan's mawk is available via anonymous FTP from ftp://ftp.whidbey.net/pub/brennan/mawk1.3.3.tar.gz. Finally, the Free Software Foundation has a version of awk called gawk, available from ftp://gnudist.gnu.org/gnu/gawk/gawk-3.0.4.tar.gz.
