2.12. Repeat Part of the Regex a Certain Number of Times

Problem

Create regular expressions that match the following kinds of numbers:

  • A googol (a decimal number with 100 digits).

  • A 32-bit hexadecimal number.

  • A 32-bit hexadecimal number with an optional h suffix.

  • A floating-point number with an optional integer part, a mandatory fractional part, and an optional exponent. Each part allows any number of digits.

Solution

Googol

\b\d{100}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number

\b[a-f0-9]{1,8}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number with optional suffix

\b[a-f0-9]{1,8}h?\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
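
The two hexadecimal patterns can be tried out with a short Python sketch. Python is just one of the listed flavors, and the helper names `find_hex` and `find_hex_suffix` and the sample string are made up for illustration. The `re.IGNORECASE` flag stands in for the "Case insensitive" regex option.

```python
import re

# Patterns copied from the recipe; re.IGNORECASE plays the role of the
# "Case insensitive" regex option.
HEX32 = re.compile(r"\b[a-f0-9]{1,8}\b", re.IGNORECASE)
HEX32_SUFFIX = re.compile(r"\b[a-f0-9]{1,8}h?\b", re.IGNORECASE)


def find_hex(text):
    """Return all 1-to-8-digit hexadecimal numbers found in text."""
    return HEX32.findall(text)


def find_hex_suffix(text):
    """Same as find_hex, but also accept an optional trailing h."""
    return HEX32_SUFFIX.findall(text)


# Illustrative sample: a short number, a full 8-digit number,
# a 9-digit number (too long for 32 bits), and a suffixed number.
sample = "ff DEADBEEF 123456789 1Ah"

print(find_hex(sample))         # ['ff', 'DEADBEEF']
print(find_hex_suffix(sample))  # ['ff', 'DEADBEEF', '1Ah']
```

The nine-digit number is rejected by both patterns because the word boundaries prevent a match on only part of a longer run of digits. Without the optional h, 1Ah is rejected too, since no word boundary falls between A and h.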

Floating-point number

\d*\.\d+(e\d+)?
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
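
As a rough check of the floating-point pattern, the following Python sketch tests a few candidate strings against it. The helper name `is_float` and the test strings are assumptions chosen for illustration. `fullmatch` is used so that the whole string has to match, not just part of it.

```python
import re

# Pattern copied from the recipe: optional integer part, mandatory
# fractional part, optional exponent.
FLOAT = re.compile(r"\d*\.\d+(e\d+)?", re.IGNORECASE)


def is_float(text):
    """Return True if the entire string matches the recipe's pattern."""
    return FLOAT.fullmatch(text) is not None


for candidate in ["3.14", ".5", "6.02e23", "1.5E10", "42", "7.", "1.0e-5"]:
    print(candidate, is_float(candidate))
```

The first four candidates match. 42 fails because the decimal point and fractional part are mandatory, 7. fails because at least one digit must follow the point, and 1.0e-5 fails because the pattern as written does not allow a sign in the exponent.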

Discussion

Fixed repetition

The quantifier {n}, where n is a nonnegative integer, repeats the preceding regex token exactly n times. The \d{100} in \b\d{100}\b matches a string of exactly 100 digits. You could achieve the same by typing \d 100 times.

{1} repeats the preceding token once, as it would without any quantifier. ab{1}c is the same regex as abc.

{0} repeats the preceding token zero times, essentially deleting it from the regular expression. ab{0}c is the same regex as ac.
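
The equivalences described above can be checked with a small Python sketch. The helper `agree` and the test strings are hypothetical, introduced only for this illustration. It compares two patterns by testing whether they accept and reject the same strings.

```python
import re


def agree(pattern_a, pattern_b, subjects):
    """Return True if both patterns accept or reject every subject alike."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in subjects
    )


# \d{100} versus \d typed out 100 times.
quantified = r"\b\d{100}\b"
spelled_out = r"\b" + r"\d" * 100 + r"\b"
digit_runs = ["7" * 99, "7" * 100, "7" * 101]

print(agree(quantified, spelled_out, digit_runs))
print(bool(re.fullmatch(quantified, "7" * 100)))

# {1} changes nothing; {0} effectively removes the token.
words = ["abc", "ac", "abbc", "ab", ""]
print(agree(r"ab{1}c", r"abc", words))
print(agree(r"ab{0}c", r"ac", words))
```

Only the run of exactly 100 digits matches \b\d{100}\b. The runs of 99 and 101 digits are rejected by both the quantified and the spelled-out pattern.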

Variable repetition

For variable repetition, we use the quantifier {n,m}, where ...
