Chapter 4. Designing URIs
URIs are identifiers of resources that work across the Web. A URI consists of a scheme (such as http and https), a host (such as www.example.org), a port number followed by a path with one or more segments (such as /users/1234), and a query string. In this chapter, our focus is on designing URIs for RESTful web services:
- Recipe 4.1
Use this recipe to learn some commonly practiced URI design conventions.
- Recipe 4.2
Use this recipe to learn some dos and don’ts to keep URIs as opaque identifiers.
- Recipe 4.3
Treating URIs as opaque identifiers helps decouple clients from servers. This recipe shows techniques that the server can employ to help clients treat URIs as opaque.
- Recipe 4.4
Since URIs are a key part of the interface between clients and servers, it is important to keep them “cool,” i.e., stable and permanent. Use this recipe to learn some practices to help keep URIs cool.
4.1. How to Design URIs
URIs are opaque resource identifiers. In most cases, clients need not be concerned with how a server designs its URIs. However, following common conventions when designing URIs has several advantages:
- URIs that follow conventions are usually easy to debug and manage.
- Servers can centralize code to extract data from request URIs.
- You can avoid spending valuable design and implementation time inventing new conventions and rules for processing URIs.
- Partitioning the server’s URIs across domains, subdomains, and paths gives you operational flexibility for load distribution, monitoring, routing, and security.
Problem
You want to know the best practices to design URIs for resources.
Solution
- Use domains and subdomains to logically group or partition resources for localization, distribution, or to enforce various monitoring or security policies.
- Use the forward-slash separator (/) in the path portion of the URI to indicate a hierarchical relationship between resources.
- Use the comma (,) and semicolon (;) to indicate nonhierarchical elements in the path portion of the URI.
- Use the hyphen (-) and underscore (_) characters to improve the readability of names in long path segments.
- Use the ampersand (&) to separate parameters in the query portion of the URI.
- Avoid including file extensions (such as .php, .aspx, and .jsp) in URIs.
Discussion
URI design is just one aspect of implementing RESTful applications. Here are some conventions to consider when designing URIs.
Warning
As important as URI design is to the success of your web service, it is just as important to keep the time spent in URI design to a minimum. Focus on consistency of URIs instead.
Domains and subdomains
A logical partition of URIs into domains and subdomains provides several operational benefits for server administration. Make sure to use logical names for subdomains while partitioning URIs. For example, the server could offer localized representations via different subdomains, as in the following:
http://en.example.org/book/1234
http://da.example.org/book/1234
http://fr.example.org/book/1234
Another example is partitioning based on the class of clients:
http://www.example.org/book/1234
http://api.example.org/book/1234
In this example, the server offers two subdomains, one for browsers and the other for custom clients. Such partitioning may let the server allocate different hardware or apply different routing, monitoring, or security policies for HTML and non-HTML representations.
Forward-slash separator
By convention, the forward slash (/) character is used to convey hierarchical relationships. This is not a hard and fast rule, but most users assume this when they scan URIs. In fact, the forward slash is the only character mentioned in RFC 3986 as typically indicating a hierarchical relationship. For example, all the following URIs convey a hierarchical association between path segments:
http://www.example.org/messages/msg123
http://www.example.org/customer/orders/order1
http://www.example.org/earth/north-america/canada/manitoba
Some web services may use a trailing forward slash for collection resources. Use such conventions with care since some development frameworks may incorrectly remove such slashes or add trailing slashes during URI normalization.
Underscore and hyphen
If you want to make your URIs easy for humans to scan and interpret, use the underscore (_) or hyphen (-) character:
http://www.example.org/blog/this-is-my-first-post
http://www.example.org/my_photos/our_summer_vacation/first_day/setting_up_camp/
There is no reason to favor one over the other. Pick one and use it consistently.
Ampersand
Use the ampersand character (&) to separate parameters in the query portion of the URI:
http://www.example.org/print?draftmode&landscape
http://www.example.org/search?word=Antarctica&limit=30
In the first URI shown, the parameters are draftmode and landscape. The second URI has the parameters word=Antarctica and limit=30.
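As a rough illustration of how a server might read both styles of parameter, the following Python sketch uses the standard library's `urllib.parse`. The helper name `query_params` is an assumption made for this example; it is not part of the recipe.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(uri: str) -> dict[str, list[str]]:
    """Return the query parameters of a URI as a dict of value lists."""
    query = urlsplit(uri).query
    # keep_blank_values=True keeps flag-style parameters such as
    # "draftmode" that have no "=value" part; they map to "".
    return parse_qs(query, keep_blank_values=True)


flags = query_params("http://www.example.org/print?draftmode&landscape")
search = query_params("http://www.example.org/search?word=Antarctica&limit=30")
```

Without `keep_blank_values=True`, `parse_qs` silently drops parameters that have no value, so flag-style parameters such as draftmode would be lost.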
Comma and semicolon
Use the comma (,) and semicolon (;) characters to indicate nonhierarchical portions of the URI. The semicolon convention is used to identify matrix parameters:
http://www.example.org/co-ordinates;w=39.001409,z=-84.578201
http://www.example.org/axis;x=0,y=9
These characters are valid in the path and query portions of URIs, but not all code libraries recognize the comma and semicolon as separators, so you may need custom code to extract these parameters.
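The custom code involved can be small. The following Python sketch assumes the layout used in the examples above: a segment name, a semicolon, and then comma-separated name=value pairs. The function name `parse_matrix_segment` is hypothetical, and other services may lay out matrix parameters differently.

```python
from urllib.parse import unquote


def parse_matrix_segment(segment: str) -> tuple[str, dict[str, str]]:
    """Split 'name;k1=v1,k2=v2' into the name and a dict of parameters."""
    name, _, params = segment.partition(";")
    pairs: dict[str, str] = {}
    for item in params.split(","):
        if not item:
            continue  # no parameters, or a stray comma
        key, _, value = item.partition("=")
        # Percent-decode after splitting so encoded separators stay intact.
        pairs[unquote(key)] = unquote(value)
    return unquote(name), pairs


axis = parse_matrix_segment("axis;x=0,y=9")
coords = parse_matrix_segment("co-ordinates;w=39.001409,z=-84.578201")
```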
Full stop, or period
Apart from its use in domain names, the full stop (.), or period, is used to separate the document and file extension portions of the URI:
http://www.example.org/my-photos/flowers.png
http://www.example.org/index.html
http://www.example.org/api/recent-messages.xml
http://www.example.org/blog/this.is.my.next.post.html
The last example in the previous list is valid but might introduce confusion. Since some code libraries use the period to signal the start of the file extension portion of the URI path, URIs with multiple periods can return unexpected results or might cause a parsing error.
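To make the ambiguity concrete, this Python sketch contrasts two ways a library might split the last path segment. The helper `split_extension` and the "naive" split are illustrative assumptions, not the behavior of any particular framework.

```python
import posixpath
from urllib.parse import urlsplit


def split_extension(uri: str) -> tuple[str, str]:
    """Split the last path segment at its final period."""
    last_segment = posixpath.basename(urlsplit(uri).path)
    return posixpath.splitext(last_segment)


uri = "http://www.example.org/blog/this.is.my.next.post.html"
by_last_period = split_extension(uri)

# A parser that splits at the FIRST period sees a very different "extension".
by_first_period = tuple("this.is.my.next.post.html".split(".", 1))
```

Here `by_last_period` is `("this.is.my.next.post", ".html")`, while `by_first_period` is `("this", "is.my.next.post.html")`.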
Except for legacy reasons, there is no reason to use this character in URIs. Clients should use the media type of the representation to learn how to process the representation. “Sniffing” the media type from extensions can lead to security vulnerabilities. For instance, various versions of Internet Explorer are prone to security vulnerabilities because of their implementation of media type sniffing (http://msdn.microsoft.com/en-us/library/ms775148(VS.85).aspx).
Implementation-specific file extensions
Consider the following URIs:
http://www.example.org/report-summary.xml
http://www.example.org/report-summary.jsp
http://www.example.org/report-summary.aspx
In all three cases, the data is the same and the representation format may be the same, but the file extension indicates the technology used to generate the resource representation. These URIs will need to change if the technology used needs to change.
Spaces and capital letters
Spaces cannot appear literally in URIs; according to RFC 3986, the space character must be percent-encoded as %20. However, the application/x-www-form-urlencoded media type (used by HTML form elements) encodes the space character as the plus sign (+). Consider the following HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
  <head>
    <title>Search</title>
  </head>
  <body>
    <form method="GET" action="http://www.example.org/search"
          enctype="application/x-www-form-urlencoded">
      <label for="phrase">Enter a search phrase</label>
      <input type="text" name="phrase" value=""/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html>
When a user submits the search phrase “Hadron Supercollider,” the resulting URI (using application/x-www-form-urlencoded rules) would be as follows:
http://www.example.org/search?phrase=Hadron+Supercollider
Code that is not aware of how the URI was generated will interpret the URI using RFC 3986 and treat the value of the search phrase as “Hadron+Supercollider.”
This inconsistency can cause encoding errors for web services that are not prepared to accept URIs encoded using the application/x-www-form-urlencoded media type. This is not just a problem with common web browsers. Some code libraries also apply these rules inconsistently.
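Python's standard library exposes both sets of rules side by side, which makes the mismatch easy to see. This is only a sketch of the two encodings; the variable names are chosen for this example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

phrase = "Hadron Supercollider"

# RFC 3986 style: the space becomes %20.
rfc3986_encoded = quote(phrase)

# application/x-www-form-urlencoded style: the space becomes "+".
form_encoded = quote_plus(phrase)

# Decoding a form-encoded value with RFC 3986 rules leaves the "+" in place.
decoded_as_rfc3986 = unquote(form_encoded)

# Decoding it with form rules restores the space.
decoded_as_form = unquote_plus(form_encoded)
```

A service that receives form submissions needs to decode query values with the form rules (`unquote_plus` here), while path segments should be decoded with the RFC 3986 rules (`unquote`).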
Capital letters in URIs may also cause problems. RFC 3986 defines URIs as case sensitive except for the scheme and host parts. For example, http://www.example.org/my-folder/doc.txt and HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt are the same, but http://www.example.org/My-Folder/doc.txt is not. However, Windows-based web servers treat these URIs as the same when the resource is served from the filesystem. This case insensitivity does not apply to characters in the query portion. For these reasons, avoid using uppercase characters in URIs.
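The comparison rule can be sketched in Python by lowercasing only the scheme and host before comparing. The function `normalize_case` is a hypothetical helper for this example; it assumes the URI carries no user information in the authority part, since that part is case sensitive and would also be lowercased here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lowercase the scheme and host; leave path, query, fragment alone."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path, parts.query, parts.fragment)
    )


a = normalize_case("http://www.example.org/my-folder/doc.txt")
b = normalize_case("HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt")
c = normalize_case("http://www.example.org/My-Folder/doc.txt")
```

After normalization, `a` and `b` are equal, while `c` still differs because the path is compared case sensitively.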
4.2. How to Use URIs As Opaque Identifiers
Treating URIs as opaque identifiers is, in most cases, trivial. It only requires you to make sure that each resource has a distinct URI. However, some practices illustrated in this recipe can lead to overloading URIs. In such cases, URIs may become generic gateways for unspecified information and actions. This can result in improperly cached responses, possibly even the leakage of secure data that should not be shared without appropriate authentication.
Problem
You want to know how to avoid situations that prevent URIs from being used as unique identifiers.
Solution
Use only the URI to determine which resource processes a request.
Do not tunnel repeated state changes over POST using the same URI or use custom headers to overload URIs. Use custom headers for informational purposes only.
Discussion
Designating URIs as unique resource identifiers is a straightforward exercise except when you overload some HTTP methods or use something other than the URI to determine how to process a request.
Here is an example that uses a custom HTTP header to determine what to return:
# Request
GET /news HTTP/1.1
Host: www.example.org
X-Filter: science;sports;weather

# Response
HTTP/1.1 200 OK
Content-Type: application/xml;charset=UTF-8

... message body ...
In this example, the URI http://www.example.org/news is overloaded by the contents of the X-Filter header. If another client makes a similar request but with a different value in this custom header (e.g., politics;economy;healthcare), the server will return the representation of a different resource.
Such practices are easy to avoid. In this example, the server should offer different URIs for different news filters.
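One way to do this is to move the filter into the URI itself. The Python sketch below assumes a repeated query parameter named `filter`; that name and the `news_uri` helper are inventions for this example, and a path-based design would work equally well. Sorting the filter names means the same set of filters always yields the same URI, which helps caches.

```python
from urllib.parse import urlencode

NEWS_URI = "http://www.example.org/news"


def news_uri(filters: list[str]) -> str:
    """Build a distinct, canonical URI for each set of news filters."""
    # Deduplicate and sort so equivalent filter sets share one URI.
    names = sorted(set(filters))
    query = urlencode([("filter", name) for name in names])
    return f"{NEWS_URI}?{query}" if query else NEWS_URI


first = news_uri(["science", "sports", "weather"])
second = news_uri(["politics", "economy", "healthcare"])
```

Each filter combination now has its own identifier, so responses can be cached and bookmarked independently.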
Another common practice that uses URIs as gateways and not as unique identifiers is tunneling repeated state changes using POST. This is the default practice in several web frameworks including ASP.NET, JavaServer Pages, and some Ajax toolkits:
# Request
POST /ajax-endpoint HTTP/1.1
Host: www.example.org

<request>
  <filter>science</filter>
  <filter>sports</filter>
  <filter>weather</filter>
</request>

# Response
HTTP/1.1 200 OK
Content-Type: application/xml;charset=UTF-8

... message body ...

# Request
POST /ajax-endpoint HTTP/1.1
Host: www.example.org

<request>
  <filter>politics</filter>
  <filter>economy</filter>
  <filter>healthcare</filter>
</request>

# Response
HTTP/1.1 200 OK
Content-Type: application/xml;charset=UTF-8

... message body ...
Such practices are usually a result of treating HTTP as a transport protocol. As long as you avoid such practices, treating URIs as unique identifiers should be relatively easy.
4.3. How to Let Clients Treat URIs As Opaque Identifiers
No matter how you design your URIs, it is important that web services make it possible for clients to treat them as opaque identifiers to the extent possible. Clients should be able to use server-provided URIs to make additional requests without having to understand how the server’s URIs are structured.
Problem
You want to know how to ensure clients treat URIs as opaque.
Solution
Whenever possible, provide URIs at runtime using links in the body of representations (see Recipes 5.1 and 5.2) or headers (see Recipe 5.3).
When it is not reasonable to provide a complete set of possible URIs, consider using URI templates (see Recipe 5.7), or establish out-of-band rules to let clients construct URIs programmatically.
Discussion
Neither the architectural constraints of REST nor HTTP require that clients treat URIs as opaque. But doing so reduces coupling between servers and clients. A server expecting clients to construct URIs from bits of information returned in representations or offline knowledge (e.g., documentation or reverse-engineering) indicates tight coupling. This coupling can break existing clients when the web service makes changes to the way it creates new URIs.
In most cases, the process of creating URIs belongs to the server, not the client. For example, consider a photo-sharing web service, returning a list of photos uploaded recently to the server.
<?xml version="1.0" encoding="utf-8" ?>
<photos>
  <photo>
    <id>nj1-1234</id>
    <user-id>987</user-id>
    <server-id>east-nj1</server-id>
  </photo>
  <photo>
    <id>nj4-1235</id>
    <user-id>988</user-id>
    <server-id>east-nj4</server-id>
  </photo>
  ...
</photos>
Since no URIs are provided in this representation, anyone implementing a client for this web service must read documentation and write client code to programmatically create URIs to each photo.
http://east-nj1.photos.example.org/987/nj1-1234
http://east-nj4.photos.example.org/988/nj4-1235
These URIs contain implementation-level data such as server names, photo IDs, and user IDs. If the server makes architectural changes that result in changes in URIs for all new photos, clients will have to make changes in the way they create URIs.
Tip
When your web service requires clients to create URIs based on the implementation details of your web service, those details will become part of your web service’s public interface. Avoid or minimize leaking such implementation details to clients.
To decouple the client from these implementation details, the server can provide links in the representation.
<?xml version="1.0" encoding="utf-8" ?>
<photos xmlns:atom="http://www.w3.org/2005/Atom">
  <photo>
    <atom:link href="http://east-nj1.photos.example.org/987/nj1-1234"
               rel="alternate" title="Sunset view from our backyard"/>
    <atom:link href="http://east-nj1.photos.example.org/987"
               rel="http://www.example.org/rels/owner"/>
    <id>nj1-1234</id>
  </photo>
  <photo>
    <atom:link href="http://east-nj4.photos.example.org/988/nj4-1235"
               rel="alternate"/>
    <atom:link href="http://east-nj1.photos.example.org/988"
               rel="http://www.example.org/rels/owner"/>
    <id>nj4-1235</id>
  </photo>
  ...
</photos>
This representation uses links to encode implementation details into URIs directly. Each photo in this representation has a link with a URI to fetch the image file and another link to fetch the owner resource of each photo. To tell which link is which, clients do not have to know how to manufacture URIs. They just need to understand the meaning of the values of the rel attribute.
Warning
Note that requiring clients to treat URIs as opaque may require you to make a trade-off against performance. URIs are usually longer than database identifiers, and hence transporting URIs over the network increases the message size. This may matter when the representation needs to convey a large number of URIs.
In cases where it is impractical for web services to supply the client with a list of all the possible URIs in the representation (e.g., supporting ad hoc searching), use “semi-opaque” URI templates (see Recipe 5.7). You will also need to loosen/ignore opacity if you want to protect against request tampering by using digitally signed URIs (see Recipe 12.5) or to encrypt parts of the URI to shield sensitive information. For this purpose, clients and servers will need to exchange details of how to sign URIs out of band.
4.4. How to Keep URIs Cool
URIs should be designed to last a long time. Clients may store URIs in databases and configuration files, or may even hard-code them in code. In fact, the Web works under the assumption that URIs are permanent. This design principle is referred to with the axiom “Cool URIs don’t change” (http://www.w3.org/Provider/Style/URI). When a server decides to change its URIs, clients will fail to function. Cool URIs are those that never change.
The effect of URI changes may seem insignificant when your web service is operating in a private and controlled network. However, URIs make up a vital part of the interface between clients and servers, and changes to URIs are bound to be disruptive. This recipe shows you how to keep URIs permanent.
Problem
You want to know how to support the axiom “Cool URIs don’t change.”
Solution
Design URIs based on stable concepts, identifiers, and information. Use rewrite rules on the server to shield clients from implementation-level changes. In cases where URIs must change (e.g., when merging two applications, major redesign, etc.), honor old URIs and issue redirects to clients with the new URI using 301 (Moved Permanently) responses or, in rare cases, by issuing a 410 (Gone) for URIs that are no longer valid.
URIs cannot be permanent if the concepts or identifiers used for URIs cannot be permanent for business, technical, or security reasons. See Recipe 5.6 for ways to deal with such cases.
Discussion
The permanence of URIs depends on stability and the permanence of concepts and identifiers used to create URIs. For example, the URI http://www.example.org/2009/11/my_trip_report for a document titled “My Trip Report” is stable as long as the server treats the title as unchangeable once the document has been published. Usually, unique identifiers used to store data of resources help design stable URIs. Such identifiers rarely change.
Even when the concepts/identifiers used to create URIs change, it may be possible to hide such changes by employing rewrite rules supported by web servers such as Apache mod_rewrite (http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html) and Internet Information Services (IIS) server’s URLRewrite (http://www.iis.net/extensions/URLRewrite). You can use these web server extensions to hide URI changes that may be caused by merging server applications, changing paths, etc.
If you are not able to hide URI changes, respond to all requests to the old URI with a 301 (Moved Permanently) and the new URI in the Location header:
# Request
GET /users/1 HTTP/1.1
Host: www.example.org
Accept: application/json

# Response
HTTP/1.1 301 Moved Permanently
Location: http://www.example2.org/users/1
When a client receives the 301 (Moved Permanently) response, it should remove any copies of the old URI from the client’s local storage and replace them with the new URI. This will reduce the number of redirects the client needs to follow.
Warning
Do not disable support for redirects in client applications. Instead, consider a sensible limit on the number of redirects a client can follow. Also verify that the Location URI maps to a trusted domain or IP address. Disabling redirects altogether will break the client when the server decides to change URIs.
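The two safeguards in this warning can be sketched in Python as follows. To keep the example independent of any HTTP library, the caller passes in a `fetch` function that returns a status code and Location value; that function, the limit of five redirects, and the trusted host list are all assumptions for illustration.

```python
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 5  # assumed limit; tune for your client
TRUSTED_HOSTS = {"www.example.org", "www.example2.org"}  # assumed allowlist
REDIRECT_STATUSES = {301, 302, 303, 307, 308}

Fetch = Callable[[str], tuple[int, Optional[str]]]


class RedirectError(Exception):
    """Raised when a redirect chain is too long or leaves trusted hosts."""


def follow_redirects(uri: str, fetch: Fetch, limit: int = MAX_REDIRECTS) -> str:
    """Follow redirects from uri and return the final URI."""
    followed = 0
    while True:
        status, location = fetch(uri)
        if status not in REDIRECT_STATUSES or not location:
            return uri
        if followed == limit:
            raise RedirectError(f"more than {limit} redirects")
        # Location may be relative, so resolve it against the current URI.
        target = urljoin(uri, location)
        if urlsplit(target).hostname not in TRUSTED_HOSTS:
            raise RedirectError(f"untrusted redirect target: {target}")
        uri = target
        followed += 1
```

A client would call `follow_redirects` with a `fetch` implementation built on its HTTP library of choice, with that library's automatic redirect handling turned off so the checks above are actually applied.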
Once you set up redirection, monitor request traffic on the server for the old URIs. Maintain redirection services for old URIs until you are confident the majority of clients have updated their stored links to point to the new URI. When you cannot monitor the old URIs, establish and communicate an appropriate end-of-life policy for old URIs.
Once the traffic has fallen off or the preset time interval has passed, convert the 301 (Moved Permanently) responses to 410 (Gone) or 404 (Not Found). Also include a message body to indicate where the new (or related) resources may be found.
# Request
GET /users/ HTTP/1.1
Host: www.example.org
Accept: application/xml;charset=UTF-8

# Response
HTTP/1.1 410 Gone
Content-Type: application/xml;charset=UTF-8
Expires: Sat, 01 Jan 2011 00:00:00 GMT

<error xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="help" href="http://www.example2.org"/>
  <message xml:lang="en-US">This resource no longer exists. Related information may be found at http://www.example2.org</message>
</error>
Note that the previous example shows the 410 (Gone) response is marked with an Expires header value far into the future. For more on caching responses, see Chapter 9.