4.6. Validate Traditional Time Formats

Problem

You want to validate times in various traditional time formats, such as hh:mm and hh:mm:ss in both 12-hour and 24-hour formats.

Solution

Hours and minutes, 12-hour clock:

^(1[0-2]|0?[1-9]):([0-5]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours and minutes, 24-hour clock:

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours, minutes and seconds, 12-hour clock:

^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours, minutes and seconds, 24-hour clock:

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The question marks in all of the preceding regular expressions make leading zeros optional. Remove the question marks to make leading zeros mandatory.
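As a minimal sketch of how these four regexes might be applied, the following Python snippet wraps them in a small helper. The dictionary keys and the function name `is_valid_time` are hypothetical names chosen for this illustration. It uses `re.fullmatch` because in Python ‹$› also matches just before a trailing line break, which is probably not wanted when validating input.

```python
import re

# The four regexes from the Solution. The keys are illustrative labels,
# not names used anywhere in the recipe.
TIME_PATTERNS = {
    "hm12": r"^(1[0-2]|0?[1-9]):([0-5]?[0-9])$",
    "hm24": r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$",
    "hms12": r"^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])$",
    "hms24": r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$",
}


def is_valid_time(text, kind):
    """Return True if text as a whole matches the chosen time format."""
    # fullmatch requires the match to cover the entire string, so a
    # trailing "\n" is rejected even though "$" alone would tolerate it.
    return re.fullmatch(TIME_PATTERNS[kind], text) is not None


print(is_valid_time("9:05", "hm12"))       # 12-hour time with a single-digit hour
print(is_valid_time("13:05", "hm12"))      # hour out of range for a 12-hour clock
print(is_valid_time("23:59:59", "hms24"))  # last second of the day
```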

Discussion

Validating times is considerably easier than validating dates. Every hour has 60 minutes, and every minute has 60 seconds. This means we don’t need any complicated alternations in the regex. For the minutes and seconds, we don’t use alternation at all. ‹[0-5]?[0-9]› matches a digit between 0 and 5, followed by a digit between 0 and 9. This correctly matches any number between 0 and 59. The question mark after the first character class makes it optional. This way, a single digit between 0 and 9 is also accepted as a valid minute or second. Remove the question mark if the first 10 minutes and seconds should be written as 00 to 09. See Recipe 2.3 and Recipe 2.12 for details on character classes and quantifiers such as the question mark.

For the hours, we do use alternation (see Recipe 2.8). The second digit allows different ranges, depending on the first digit. On a 12-hour clock, if the first digit is 0, the second digit can be anything from 1 to 9, but if the first digit is 1, the second digit must be 0, 1, or 2. In a regular expression, we write this as ‹1[0-2]|0?[1-9]›. On a 24-hour clock, if the first digit is 0 or 1, the second digit allows all 10 digits, but if the first digit is 2, the second digit must be between 0 and 3. In regex syntax, this can be expressed as ‹2[0-3]|[01]?[0-9]›. Again, the question mark allows hours below 10 to be written with a single digit. Remove the question mark to require two digits.
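One way to check the hour alternations in isolation is to run them against a range of numbers and see which ones match in full. The sketch below does this in Python. The variable names are made up for the example, and the two regexes are the hour parts taken out of the full time regexes.

```python
import re

# Hour parts of the 12-hour and 24-hour regexes, tested on their own.
hour12 = re.compile(r"1[0-2]|0?[1-9]")
hour24 = re.compile(r"2[0-3]|[01]?[0-9]")

# Collect every number from 0 to 29 that each regex accepts as a whole.
ok12 = [h for h in range(30) if hour12.fullmatch(str(h))]
ok24 = [h for h in range(30) if hour24.fullmatch(str(h))]

print(ok12)  # hours accepted on the 12-hour clock
print(ok24)  # hours accepted on the 24-hour clock

# Leading zeros: "09" is fine on both clocks, "00" only on the 24-hour clock.
print(bool(hour12.fullmatch("09")), bool(hour12.fullmatch("00")))
print(bool(hour24.fullmatch("09")), bool(hour24.fullmatch("00")))
```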

We put parentheses around the parts of the regex that match the hours, minutes, and seconds. That makes it easy to retrieve the digits for the hours, minutes, and seconds without the colons. Recipe 2.9 explains how parentheses create capturing groups. Recipe 3.9 explains how you can retrieve the text matched by those capturing groups in procedural code.
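To illustrate what the capturing groups give you, here is a small Python sketch that converts a 24-hour time into three integers. The function name `parse_time` is an assumption made for this example. The anchors are left out of the pattern because `re.fullmatch` already requires the whole string to match.

```python
import re

# 24-hour clock with seconds; groups 1, 2, and 3 hold hours, minutes, seconds.
TIME_24H = re.compile(r"(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])")


def parse_time(text):
    """Return (hours, minutes, seconds) as integers, or None if invalid."""
    match = TIME_24H.fullmatch(text)
    if match is None:
        return None
    # groups() returns the text of each capturing group, without the colons.
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours, minutes, seconds


print(parse_time("7:05:42"))   # a valid time
print(parse_time("25:00:00"))  # hour out of range
```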

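The parentheses around the hour part keep the two alternatives for the hour together. If you remove those parentheses, the alternation operator splits the entire regex in two: ‹^1[0-2]› on one side and everything after the vertical bar on the other, so the regex accepts input such as 12abc. Removing the parentheses around the minutes and seconds has no effect, other than making it impossible to retrieve their digits separately.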
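The effect of dropping the parentheses around the hour can be seen with a quick Python experiment. The names `broken` and `fixed` are illustrative only. `re.search` is used so that the anchors in the regexes do the work, as they would in most other flavors.

```python
import re

# Without the group, "|" splits the regex into "^1[0-2]" and
# "0?[1-9]:([0-5]?[0-9])$", each anchored at one end only.
broken = re.compile(r"^1[0-2]|0?[1-9]:([0-5]?[0-9])$")
fixed = re.compile(r"^(1[0-2]|0?[1-9]):([0-5]?[0-9])$")

for subject in ["11:45", "12 monkeys", "abc9:30"]:
    print(subject, bool(broken.search(subject)), bool(fixed.search(subject)))

# Even for a valid time, the broken regex matches only part of the input.
print(broken.search("11:45").group())
```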

Variations

If you want to search for times in larger bodies of text instead of checking whether the input as a whole is a time, you cannot use the anchors ‹^› and ‹$›. Merely removing the anchors from the regular expression is not the right solution. That would allow the hour and minute regexes to match 12:12 within 9912:1299, for instance. Instead of anchoring the regex match to the start and end of the subject, you have to specify that the time cannot be part of longer sequences of digits.

This is easily done with a pair of word boundaries. In regular expressions, digits are treated as characters that can be part of words. Replace both ‹^› and ‹$› with ‹\b›. As an example:

\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Word boundaries don’t disallow everything; they only disallow letters, digits and underscores. The regex just shown, which matches hours and minutes on a 24-hour clock, matches 16:08 within the subject text The time is 16:08:42 sharp. The space is not a word character, whereas the 1 is, so the word boundary matches between them. The 8 is a word character, whereas the colon isn’t, so ‹\b› also matches between those two.
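The behavior described above can be reproduced with a short Python sketch. The helper name `find_times` and the sample strings other than the one from the text are assumptions made for this illustration.

```python
import re

# Same regex with and without word boundaries, for comparison.
unanchored = re.compile(r"(2[0-3]|[01]?[0-9]):([0-5]?[0-9])")
bounded = re.compile(r"\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b")


def find_times(regex, text):
    """Return the full text of every match of regex in text."""
    # finditer is used instead of findall, because findall would return
    # the capturing groups rather than the overall match.
    return [m.group() for m in regex.finditer(text)]


print(find_times(unanchored, "9912:1299"))                 # finds 12:12 inside the digits
print(find_times(bounded, "9912:1299"))                    # finds nothing
print(find_times(bounded, "The time is 16:08:42 sharp."))  # still finds 16:08
```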

If you want to disallow colons as well as word characters, you need to use lookaround (see Recipe 2.16). The following regex will not match any part of The time is 16:08:42 sharp. It only works with flavors that support lookbehind:

(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])

Regex options: None

Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
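As a rough check of the lookaround version, the Python sketch below runs it against the sentence from the text and against a second, made-up sentence that contains two times in hh:mm format. The variable name `strict` is chosen for this example only.

```python
import re

# Lookbehind and lookahead reject a colon or word character on either side.
strict = re.compile(r"(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])")

with_seconds = "The time is 16:08:42 sharp."
without_seconds = "Meet at 16:08, or at 9:30."

# No part of 16:08:42 is matched, because each candidate has a colon
# or a digit directly before or after it.
print([m.group() for m in strict.finditer(with_seconds)])

# Commas, periods, and spaces are allowed next to the time.
print([m.group() for m in strict.finditer(without_seconds)])
```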

Regular Expressions Cookbook by
