Combined Log Format
Problem
You need a regular expression that matches each line in the log files produced by a web server that uses the Combined Log Format.[14] For example:
127.0.0.1 - jg [27/Apr/2012:11:27:36 +0700] "GET /regexcookbook.html HTTP/1.1" 200 2326 "http://www.regexcookbook.com/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
Solution
^(?<client>\S+)●\S+●(?<userid>\S+)●\[(?<datetime>[^\]]+)\]↵ ●"(?<method>[A-Z]+)●(?<request>[^●"]+)?●HTTP/[0-9.]+"↵ ●(?<status>[0-9]{3})●(?<size>[0-9]+|-)●"(?<referrer>[^"]*)"↵ ●"(?<useragent>[^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9
^(?P<client>\S+)●\S+●(?P<userid>\S+)●\[(?P<datetime>[^\]]+)\]↵ ●"(?P<method>[A-Z]+)●(?P<request>[^●"]+)?●HTTP/[0-9.]+"↵ ●(?P<status>[0-9]{3})●(?P<size>[0-9]+|-)●"(?P<referrer>[^"]*)"↵ ●"(?P<useragent>[^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: PCRE 4, Perl 5.10, Python
^(\S+)●\S+●(\S+)●\[([^\]]+)\]●"([A-Z]+)●([^●"]+)?●HTTP/[0-9.]+"↵ ●([0-9]{3})●([0-9]+|-)●"([^"]*)"●"([^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Discussion
The Combined Log Format is the same as the Common Log Format, but
with two extra fields added at the end of each entry. The first
extra field is the referring URL. The second extra field is the user
agent. Both appear as double-quoted strings. We can easily match those
strings with ‹"[^"]*"›.
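As a minimal sketch of how the Python-flavor regex from the Solution might be applied, the following code parses the sample entry from the Problem and reads back the two extra fields. The names COMBINED_LOG, SAMPLE, and parse_entry are illustrative and not part of the recipe. The "●" markers are written as literal spaces, and the "↵" continuations are joined into one pattern.

```python
import re

# Python-flavor pattern from the Solution, with "●" typed as a space.
# re.MULTILINE corresponds to "^ and $ match at line breaks".
COMBINED_LOG = re.compile(
    r'^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<datetime>[^\]]+)\]'
    r' "(?P<method>[A-Z]+) (?P<request>[^ "]+)? HTTP/[0-9.]+"'
    r' (?P<status>[0-9]{3}) (?P<size>[0-9]+|-) "(?P<referrer>[^"]*)"'
    r' "(?P<useragent>[^"]*)"',
    re.MULTILINE,
)

# The sample entry from the Problem, as one physical line.
SAMPLE = (
    '127.0.0.1 - jg [27/Apr/2012:11:27:36 +0700] '
    '"GET /regexcookbook.html HTTP/1.1" 200 2326 '
    '"http://www.regexcookbook.com/" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"'
)


def parse_entry(line):
    """Return the named fields of one log entry, or None if it does not match."""
    match = COMBINED_LOG.search(line)
    return match.groupdict() if match else None


fields = parse_entry(SAMPLE)
print(fields["referrer"])   # the first extra field
print(fields["useragent"])  # the second extra field
```

An entry in the plain Common Log Format has no quoted referrer and user agent after the size, so parse_entry returns None for it.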