You want to validate dates in the traditional formats mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy. You want to weed out invalid dates, such as February 31st.
Month before day:
DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}
Day before month:
DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}
Month before day:
@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) {
    $month = $1;
    $day = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0)) {
        $validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}
Day before month:
@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) {
    $day = $1;
    $month = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0)) {
        $validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}
Month before day:
^(?:
  # February (29 days every year)
  (?<month>0?2)/(?<day>[12][0-9]|0?[1-9])
|
  # 30-day months
  (?<month>0?[469]|11)/(?<day>30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (?<month>0?[13578]|1[02])/(?<day>3[01]|[12][0-9]|0?[1-9])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET
^(?:
  # February (29 days every year)
  (0?2)/([12][0-9]|0?[1-9])
|
  # 30-day months
  (0?[469]|11)/(30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Day before month:
^(?:
  # February (29 days every year)
  (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
|
  # 30-day months
  (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[469]|11)
|
  # 31-day months
  (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[13578]|1[02])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET
^(?:
  # February (29 days every year)
  ([12][0-9]|0?[1-9])/(0?2)
|
  # 30-day months
  (30|[12][0-9]|0?[1-9])/(0?[469]|11)
|
  # 31-day months
  (3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:([12][0-9]|0?[1-9])/(0?2)|(30|[12][0-9]|0?[1-9])/(0?[469]|11)|(3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02]))/((?:[0-9]{2})?[0-9]{2})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
There are essentially two ways to accurately validate dates with a regular expression. One method is to use a simple regex that merely captures groups of numbers that look like a month/day/year combination, and then use procedural code to check whether the date is correct. I used the first regex from the previous recipe that allows any number between 0 and 39 for the day and month. That makes it easy to change the format from mm/dd/yy to dd/mm/yy by changing which capturing group is treated as the month.
The main benefit of this method is that you can easily add additional restrictions, such as limiting dates to certain periods. Many programming languages provide specific support for dealing with dates. The C# solution uses .NET’s DateTime structure to check whether the date is valid and return the date in a useful format, all in one step.
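The same two-step approach carries over to other languages that have a date type. Here is a minimal Python sketch of it, assuming the month-before-day format; the names `DATE_RE` and `parse_mdy` are ours, not part of the recipe, and the two-digit-year pivot simply mirrors the C# and Perl code above.

```python
import re
from datetime import date

# Loose pattern from the solution: month/day/year with a 2- or 4-digit year.
DATE_RE = re.compile(r"^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$")


def parse_mdy(text):
    """Return a datetime.date for a valid mm/dd/yy(yy) string, else None."""
    match = DATE_RE.match(text)
    if match is None:
        return None
    month, day, year = (int(group) for group in match.groups())
    # Same pivot as the C# and Perl code: 00-49 -> 20xx, 50-99 -> 19xx.
    if year < 50:
        year += 2000
    elif year < 100:
        year += 1900
    try:
        # date() raises ValueError for impossible dates such as February 31st.
        return date(year, month, day)
    except ValueError:
        return None
```

As with DateTime in C#, the date constructor does the calendar arithmetic, including leap years, so the regex only has to recognize the overall shape of the input.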
The other method is to do everything with a regular expression. The solution is manageable if we take the liberty of treating every year as a leap year. We can use the same technique of spelling out the alternatives that we used for the stricter solutions in the preceding recipe.
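To see what the leap-year liberty means in practice, the following Python sketch runs a few sample strings through the compact month-before-day regex from the solution. The names `PURE_MDY` and `looks_valid` are hypothetical helpers chosen for this illustration.

```python
import re

# Compact month-before-day regex from the solution, split over two
# string literals only for readability.
PURE_MDY = re.compile(
    r"^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|"
    r"(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$"
)


def looks_valid(text):
    """True if the pure regex accepts the text as a date."""
    return PURE_MDY.match(text) is not None


# February 30th and April 31st are rejected...
assert not looks_valid("2/30/2008")
assert not looks_valid("4/31/2008")
# ...but February 29th is accepted in every year, leap year or not.
assert looks_valid("2/29/2008")
assert looks_valid("2/29/2007")
```

If February 29th in a non-leap year must be rejected, that check still has to happen in code after the regex has matched.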
The problem with using a single regular expression is that it no longer neatly captures the day and month in a single capturing group. We now have three capturing groups for the month, and three for the day. When the regex matches a date, only three of the seven groups in the regex will actually capture something. If the month is February, groups 1 and 2 capture the month and day. If the month has 30 days, groups 3 and 4 return the month and day. If the month has 31 days, groups 5 and 6 take action. Group 7 always captures the year.
Only the .NET regex flavor helps us in this situation. .NET allows multiple named capturing groups (see Recipe 2.11) to have the same name, and uses the same storage space for groups with the same name. If you use the .NET-only solution with named capture, you can simply retrieve the text matched by the groups “month” and “day” without worrying about how many days the month has. All the other flavors discussed in this book either don’t support named capture, don’t allow two groups to have the same name, or return only the text captured by the last group with any given name. For those flavors, numbered capture is the only way to go.
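With numbered capture, the calling code has to work out which pair of groups took part in the match. A minimal Python sketch, assuming the compact month-before-day regex from the solution (`PURE_MDY` and `split_mdy` are names invented for this example):

```python
import re

PURE_MDY = re.compile(
    r"^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|"
    r"(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$"
)


def split_mdy(text):
    """Return (month, day, year) as matched strings, or None if no match."""
    match = PURE_MDY.match(text)
    if match is None:
        return None
    # Groups 1/2 hold February, 3/4 the 30-day months, 5/6 the 31-day
    # months. Python's re module returns None for groups that did not
    # participate, so "or" picks the one pair that actually captured.
    month = match.group(1) or match.group(3) or match.group(5)
    day = match.group(2) or match.group(4) or match.group(6)
    return month, day, match.group(7)
```

For example, `split_mdy("4/30/08")` returns `("4", "30", "08")`, taken from groups 3, 4, and 7. The `or` chain is safe here only because none of the groups can match an empty string.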
The pure regex solution is interesting only in situations where one regex is all you can use, such as when you’re using an application that offers one box to type in a regex. When programming, make things easier with a bit of extra code. This will be particularly helpful if you want to add extra checks on the date later. Here’s a pure regex solution that matches any date between 2 May 2007 and 29 August 2008 in d/m/yyyy or dd/mm/yyyy format:
# 2 May 2007 till 29 August 2008
^(?:
  # 2 May 2007 till 31 December 2007
  (?:
    # 2 May till 31 May
    (?<day>3[01]|[12][0-9]|0?[2-9])/(?<month>0?5)/(?<year>2007)
  |
    # 1 June till 31 December
    (?:
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[69]|11)
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[78]|1[02])
    )
    /(?<year>2007)
  )
|
  # 1 January 2008 till 29 August 2008
  (?:
    # 1 August till 29 August
    (?<day>[12][0-9]|0?[1-9])/(?<month>0?8)/(?<year>2008)
  |
    # 1 January till 31 July
    (?:
      # February
      (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
    |
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[46])
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[1357])
    )
    /(?<year>2008)
  )
)$
Regex options: Free-spacing
Regex flavors: .NET
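For comparison, here is roughly what the same restriction looks like when the range check is moved into code. This is a Python sketch under our own assumptions: the names `DMY_RE`, `FIRST`, `LAST`, and `in_range` are hypothetical, and the loose regex is adapted from the solution to require a four-digit year.

```python
import re
from datetime import date

# Loose day/month/year pattern; the year must have four digits here.
DMY_RE = re.compile(r"^([0-3]?[0-9])/([0-3]?[0-9])/([0-9]{4})$")

FIRST = date(2007, 5, 2)   # 2 May 2007
LAST = date(2008, 8, 29)   # 29 August 2008


def in_range(text):
    """True if text is a real d/m/yyyy date between FIRST and LAST."""
    match = DMY_RE.match(text)
    if match is None:
        return False
    day, month, year = (int(group) for group in match.groups())
    try:
        candidate = date(year, month, day)
    except ValueError:
        # Impossible date, such as 31/6/2007 or a month of 0.
        return False
    return FIRST <= candidate <= LAST
```

Changing the permitted period now means editing two constants rather than restructuring the regex.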