7.13. Extracting the Query from a URL
Problem
You want to extract the query from a string that holds a URL. For
example, you want to extract param=value from
http://www.regexcookbook.com?param=value or from
/index.html?param=value.
Solution
^[^?#]+\?([^#]+)
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
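As a minimal sketch of how this regex might be applied, the following Python snippet runs it against the two URLs from the problem statement. The function name extract_query is hypothetical and chosen only for illustration; the query is read from capturing group 1.

```python
import re

# The recipe's regex; capturing group 1 holds the query.
QUERY_RE = re.compile(r"^[^?#]+\?([^#]+)")

def extract_query(url):
    """Return the query part of url, or None if the URL has no query.

    extract_query is a hypothetical helper name, not part of the recipe.
    """
    match = QUERY_RE.match(url)
    return match.group(1) if match else None

print(extract_query("http://www.regexcookbook.com?param=value"))  # param=value
print(extract_query("/index.html?param=value"))                   # param=value
print(extract_query("/index.html"))                               # None
```

Returning None for a URL without a query keeps "no query" distinct from any matched text, since the regex requires at least one character after the question mark.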
Discussion
Extracting the query from a URL is trivial if you know that your
subject text is a valid URL. The query is delimited from the part of
the URL before it with a question mark. No question mark is allowed
anywhere in a URL before that one. Thus, we can easily skip ahead to
the first question mark with ‹^[^?#]+\?›. The question mark is a
metacharacter outside character classes but not inside them, so we
escape only the literal question mark outside the character class.
The first ‹^› is an anchor (Recipe 2.5), whereas the second ‹^›
negates the character class (Recipe 2.3).
Question marks can appear in URLs as part of the (optional)
fragment after the query. So we do need to use ‹^[^?#]+\?›, rather
than just ‹\?›, to make sure we have the first question mark in the
URL, and make sure that it isn’t part of the fragment in a URL
without a query.
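To illustrate why the anchored prefix matters, the sketch below compares the recipe's regex with a bare ‹\?› followed by the same capturing group. The test URL, whose only question mark sits inside the fragment, and the names ANCHORED, UNANCHORED, and query_with are assumptions made up for this example.

```python
import re

ANCHORED = re.compile(r"^[^?#]+\?([^#]+)")  # the recipe's regex
UNANCHORED = re.compile(r"\?([^#]+)")       # question mark alone, no anchored prefix

def query_with(pattern, url):
    """Search url with pattern and return group 1, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(1) if match else None

# Hypothetical URL with a fragment but no query.
url = "http://www.regexcookbook.com/page#top?notquery"

print(query_with(UNANCHORED, url))  # notquery (wrongly taken from the fragment)
print(query_with(ANCHORED, url))    # None (correct: this URL has no query)
```

Because ‹[^?#]+› cannot cross the hash sign, the anchored regex has no way to reach a question mark that lies inside the fragment.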
The query runs until the start of the fragment, or the end of
the URL if there is no fragment. The fragment is delimited from the
rest of the URL with a hash sign. Since hash signs are not permitted
anywhere except in the fragment, ‹[^#]+› is all we need to match the
query. The negated character class matches ...
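The following sketch shows where the captured query ends, with and without a fragment. The sample URLs beyond the one in the problem statement and the helper name query_of are assumptions made for illustration.

```python
import re

# The recipe's regex; ([^#]+) stops at the first hash sign or at the end.
QUERY_RE = re.compile(r"^[^?#]+\?([^#]+)")

def query_of(url):
    """Return the query of url, or None if there is none (hypothetical helper)."""
    match = QUERY_RE.match(url)
    return match.group(1) if match else None

print(query_of("/index.html?param=value"))      # param=value (runs to end of URL)
print(query_of("/index.html?param=value#top"))  # param=value (stops before #top)
print(query_of("/index.html?a=1&b=2#sec?x"))    # a=1&b=2 (fragment's ? is ignored)
```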