Chapter 4. Metadata Design
HTTP Headers
Various forms of metadata may be conveyed through the entity headers contained within HTTP’s request and response messages. HTTP defines a set of standard headers, some of which provide information about a requested resource. Other headers indicate something about the representation carried by the message. Finally, a few headers serve as directives to control intermediary caches.
This brief chapter suggests a set of rules to help REST API designers work with HTTP’s standard headers.
Rule: Content-Type must be used
The Content-Type
header names
the type of data found within a request or response
message’s body. The value of this header is a specially formatted text
string known as a media type, which is the subject
of Media Types. Clients and servers rely on
this header’s value to tell them how to process the sequence of bytes in
a message’s body.
Rule: Content-Length should be used
The Content-Length
header gives
the size of the entity-body in bytes. In responses, this header is
important for two reasons. First, a client can know whether it has read
the correct number of bytes from the connection. Second, a client can
make a HEAD
request to find out how
large the entity-body is, without downloading it.
Rule: Last-Modified should be used in responses
The Last-Modified
header
applies to response messages only. The value of this response header is
a timestamp that indicates the last time that something happened to
alter the representational state of the resource. Clients and cache
intermediaries may rely on this header to determine the freshness of
their local copies of a resource’s state representation. This header
should always be supplied in response to GET
requests.
Rule: ETag should be used in responses
The value of ETag
is an opaque
string that identifies a specific “version” of the representational
state contained in the response’s entity. The
entity is the HTTP message’s payload, which is composed of a message’s
headers and body. The entity tag may be any string value, so long as it
changes along with the resource’s representation. This header should
always be sent in response to GET
requests.
Clients may choose to save an ETag
header’s value for use in future GET
requests, as the value of the conditional
If-None-Match
request header. If the
REST API concludes that the entity tag hasn’t changed, then it can save
time and bandwidth by not sending the representation again.
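The exchange described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the function names make_etag and conditional_get are hypothetical, the entity tag is assumed to be derived from a hash of the representation's bytes, and weak validators (W/"...") are ignored. When the supplied tag still matches, the sketch answers with HTTP's standard 304 ("Not Modified") status and an empty body.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an entity tag from the representation's bytes.

    Assumption: hashing the content keeps the tag identical on every host.
    """
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of the current representation."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client's copy is still current
    return 200, headers, body
```

A client would store the ETag value from the first response and send it back as If-None-Match on its next GET of the same URI.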
Warning
Generating an ETag
from a
machine-specific value is a bad idea.
Specifically, don’t generate ETag
values from an inconsistent source, like a host-specific notion of a
file’s last modified time. Doing so may result in different ETag
values being attributed to the same
representation, which is likely to confuse the API’s clients and
intermediaries.
Rule: Stores must support conditional PUT requests
A store resource uses the PUT
method for both insert and update, which means it is difficult for a
REST API to know the true intent of a client’s
PUT
request. Through headers, HTTP
provides the necessary support to help an API resolve any potential
ambiguity. A REST API must rely on the client to include the If-Unmodified-Since
and/or If-Match
request headers to express its
intent. The If-Unmodified-Since
request header asks the API to proceed with the operation if, and only
if, the resource’s state representation hasn’t changed since the time
indicated by the header’s supplied timestamp value. The If-Match
header’s value is an entity tag,
which the client remembers from an earlier response’s ETag
header value. The If-Match
header makes the request conditional,
based upon an exact match of the header’s supplied entity tag value and
the representational state’s current entity tag value, as stored or
computed by the REST API.
The following example illustrates how a REST API can support
conditional PUT
requests using these two headers.
Two client programs, client#1 and client#2, use a REST API’s
/objects store resource to share some information
between them. Client#1 sends a PUT
request in order to store some new data that it identifies with a URI
path of /objects/2113. This is a new URI that the
REST API has never seen before, meaning that it does not map to any
previously stored resource. Therefore, the REST API interprets the
request as an insert and creates a new resource
based on the client’s provided state representation and then it returns
a 201
(“Created”) response.
Some time later, client#2 decides to share some data and it
requests the exact same storage URI
(/objects/2113). Now the REST API
is able to map this URI to an existing resource,
which leaves the intent of the client’s request unclear. The REST API
has not been given enough information to decide whether or not it should
overwrite client#1’s stored
resource state with the new data from client#2. In this scenario, the
API is forced to return a 409
(“Conflict”) response to client#2’s request. The API should also provide
some additional information about the error in the response’s
body.
If client#2 decides to update the stored data, it may retry its
request, this time including the If-Match
header. However, if the supplied header value does not match the
current entity tag value, the REST API must return
error code 412
(“Precondition
Failed”). If the supplied condition does match, the REST API must update
the stored resource’s state, and return a 200
(“OK”) or 204
(“No Content”) response. If the response
does include an updated representation of the resource’s state, the API
must include values for the Last-Modified
and ETag
headers that reflect the update.
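The walkthrough above can be condensed into a small decision function. The following is a minimal sketch under stated assumptions: the store is an in-memory dictionary, put_to_store and make_etag are hypothetical names, entity tags are computed from a hash of the stored bytes, and only If-Match (not If-Unmodified-Since) is handled. The case of an If-Match header sent for a URI that does not exist yet is not part of the example above; the sketch rejects it with 412, following HTTP's precondition semantics.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Entity tag computed from the stored bytes (an assumption of this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def put_to_store(
    store: Dict[str, bytes],
    path: str,
    body: bytes,
    if_match: Optional[str] = None,
) -> Tuple[int, Dict[str, str]]:
    """Handle a PUT against a store resource and return (status, headers)."""
    current = store.get(path)
    if current is None:
        if if_match is not None:
            return 412, {}  # precondition names a resource that does not exist
        store[path] = body  # unknown URI: interpret the request as an insert
        return 201, {"ETag": make_etag(body)}
    if if_match is None:
        return 409, {}  # existing resource, intent unclear
    if if_match != make_etag(current):
        return 412, {}  # client's entity tag is stale
    store[path] = body  # precondition holds: update
    return 204, {"ETag": make_etag(body)}
```

Returning 204 rather than 200 is a choice made here to keep the sketch short; an API that echoes the updated representation would return 200 along with the Last-Modified and ETag headers.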
Note
HTTP supports conditional requests with the GET, POST, and DELETE
methods in the same fashion that is
illustrated by the example above. This pattern is the key that allows
writable REST APIs to support collaboration
between their clients.
Rule: Location must be used to specify the URI of a newly created resource
The Location
response header’s
value is a URI that identifies a resource that may be of interest to the
client. In response to the successful creation of a resource within a
collection or store, a REST API must include the Location
header to designate the URI of the
newly created resource.
In a 202
(“Accepted”) response,
this header may be used to direct clients to the operational status of
an asynchronous controller resource.
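As a brief illustration of the creation case, the sketch below adds a resource to an in-memory collection and reports its URI through the Location header. The function name create_in_collection, the counter-based identifiers, and the /players path used in the usage note are assumptions made for the example only.

```python
import itertools
from typing import Dict, Tuple

# Hypothetical identifier source; a real API would use its own scheme.
_next_id = itertools.count(1)


def create_in_collection(
    collection: Dict[str, bytes], base_uri: str, body: bytes
) -> Tuple[int, Dict[str, str]]:
    """Store a new resource and return (status, headers) for the response."""
    new_uri = f"{base_uri}/{next(_next_id)}"
    collection[new_uri] = body
    # 201 ("Created") plus the URI of the newly created resource.
    return 201, {"Location": new_uri}
```

Calling create_in_collection(players, "/players", body) returns a 201 status with a Location value such as /players/1, which the client can then use for subsequent requests.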
Rule: Cache-Control, Expires, and Date response headers should be used to encourage caching
Caching is one of the most useful features built on top of HTTP. You can take advantage of caching to reduce client-perceived latency, to increase reliability, and to reduce the load on an API’s servers. Caches can be anywhere. They can be in the API’s server network, content delivery networks (CDNs), or the client’s network.
When serving a representation, include a Cache-Control
header with a max-age value (in
seconds) equal to the freshness lifetime. For example:
Cache-Control: max-age=60, must-revalidate
To support legacy HTTP 1.0 caches, a REST API should include an
Expires
header with the expiration
date-time. The value is a time at which the API generated the
representation plus the freshness lifetime. REST APIs should also
include a Date
header with a
date-time of the time at which the API returned the response. Including
this header helps clients compute the freshness lifetime as the
difference between the values of the Expires
and Date
headers. For example:
Date: Tue, 15 Nov 1994 08:12:31 GMT
Expires: Thu, 01 Dec 1994 16:00:00 GMT
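The relationship between these three headers can be sketched with Python's standard library. This is a simplified illustration: caching_headers is a hypothetical helper, and it assumes the representation is generated and returned at the same instant, so Date and the base of Expires coincide.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def caching_headers(generated_at: datetime, freshness_seconds: int) -> Dict[str, str]:
    """Build caching headers for a representation.

    `generated_at` must be timezone-aware and in UTC, because HTTP dates
    are always expressed in GMT.
    """
    expires_at = generated_at + timedelta(seconds=freshness_seconds)
    return {
        "Cache-Control": f"max-age={freshness_seconds}",
        "Date": format_datetime(generated_at, usegmt=True),
        "Expires": format_datetime(expires_at, usegmt=True),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for name, value in caching_headers(now, 60).items():
        print(f"{name}: {value}")
```

With this arrangement, the difference between Expires and Date equals the max-age value, so HTTP 1.0 and HTTP 1.1 caches arrive at the same freshness lifetime.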
Rule: Cache-Control, Expires, and Pragma response headers may be used to discourage caching
If a REST API’s response must not be cached, add a Cache-Control
header with the values no-cache and no-store. In this case, also add the
Pragma: no-cache and Expires: 0 header values to interoperate with
legacy HTTP 1.0 caches.
Rule: Caching should be encouraged
The no-cache directive prevents any cache from serving a stored
response without first revalidating it with the API. REST APIs should not do
this unless absolutely necessary. Using a small max-age value, as opposed
to adding a no-cache directive, helps clients fetch cached
copies for at least a short while without significantly impacting
freshness.
Rule: Expiration caching headers should be used with 200 (“OK”) responses
Set expiration caching headers in responses to successful GET
and HEAD
requests. Although POST
is cacheable, most caches treat this
method as non-cacheable. You need not set expiration headers on other
methods.
Rule: Expiration caching headers may optionally be used with 3xx and 4xx responses
In addition to successful responses with the 200
(“OK”) response code, consider adding
caching headers to 3xx
and 4xx
responses. Known as negative
caching, this helps reduce the amount of redirecting and
error-triggering load on a REST API.
Rule: Custom HTTP headers must not be used to change the behavior of HTTP methods
You can optionally use custom headers for informational purposes only. Implement clients and servers such that they do not fail when they do not find expected custom headers.
If the information you are conveying through a custom HTTP header is important for the correct interpretation of the request or response, include that information in the body of the request or response or the URI used for the request. Avoid custom headers for such usages.
Media Types
To identify the form of the data contained within a request or
response message body, the Content-Type
header’s value references a media type.[25]
Media Type Syntax
Media types have the following syntax:
type "/" subtype *( ";" parameter )
The type value may be one of: application, audio, image, message,
model, multipart, text, or video. A typical REST API will most often work
with media types that fall under the application type. In a hierarchical
fashion, the media type’s subtype value is subordinate to its type.
Note that parameters may follow the
type/subtype in the form of attribute=value
pairs that are separated by a
leading semi-colon (;) character. A media type’s specification may
designate parameters as either required or optional. Parameter names are
case-insensitive. Parameter values are normally case-sensitive and may
be enclosed in double quote (“ ”) characters. When more than one
parameter is specified, their ordering is insignificant.
The two examples below demonstrate a Content-Type
header value that references a
media type with a single charset parameter:
Content-type: text/html; charset=ISO-8859-4
Content-type: text/plain; charset="us-ascii"
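To make the syntax concrete, here is a minimal parsing sketch. The function parse_media_type is a hypothetical helper, not part of any library, and it makes a simplifying assumption: parameter values never contain a semicolon inside their quotes. Type, subtype, and parameter names are lowercased because they are case-insensitive; parameter values are left untouched apart from removing surrounding quotes.

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a media type string into (type, subtype, parameters)."""
    parts = [part.strip() for part in value.split(";")]
    type_, _, subtype = parts[0].partition("/")
    params: Dict[str, str] = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, raw = part.partition("=")
        # Names are case-insensitive; values keep their case.
        params[name.strip().lower()] = raw.strip().strip('"')
    return type_.lower(), subtype.lower(), params


if __name__ == "__main__":
    print(parse_media_type("text/html; charset=ISO-8859-4"))
    print(parse_media_type('text/plain; charset="us-ascii"'))
```

Because parameters are collected into a dictionary, their ordering in the header has no effect on the result, which matches the rule that parameter order is insignificant.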
Registered Media Types
The Internet Assigned Numbers Authority[26] (IANA) governs the set of registered media types and provides links to each type’s published specification (RFC). The IANA allows anyone to propose a new media type by filling out the “Application for Media Type” form found at http://www.iana.org/cgi-bin/mediatypes.pl.
Some commonly used registered media types are listed below:
- text/plain
A plain text format with no specific content structure or markup.[27]
- text/html
Content that is formatted using the HyperText Markup Language (HTML).[28]
- image/jpeg
An image compression method that was standardized by the Joint Photographic Experts Group (JPEG).[29]
- application/xml
Content that is structured using the Extensible Markup Language (XML).[30]
- application/atom+xml
Content that uses the Atom Syndication Format (Atom), which is an XML-based format that structures data into lists known as feeds.[31]
- application/javascript
Source code written in the JavaScript programming language.[32]
- application/json
The JavaScript Object Notation (JSON) text-based format that is often used by programs to exchange structured data.[33]
Vendor-Specific Media Types
Media types use the subtype prefix “vnd” to indicate that they are owned or controlled by a “vendor.” Vendor-specific media types convey a clear description of a message’s content to the programs that understand their meaning. Unlike their more common counterparts, vendor-specific media types impart application-specific metadata that makes a message more meaningful to the web component that receives it.
Vendor-specific media types may also be registered with the IANA. For example, the following vendor-specific types are among the many listed in the IANA’s registry (http://www.iana.org/assignments/media-types):
application/vnd.ms-excel
application/vnd.lotus-notes
text/vnd.sun.j2me.app-descriptor
Media Type Design
Client developers are encouraged to rely on the self-descriptive features of a REST API. In other words, client programs should hardcode as few API-specific details as possible. This goal influences many aspects of a REST API’s design, including opaque URIs, hypermedia-based actions with resource state awareness, and descriptive media types.
Rule: Application-specific media types should be used
REST APIs treat the body of an HTTP request or response as part of an application-specific interaction. While the body may be formatted using languages such as JSON or XML, it usually has semantics that require special processing beyond simply parsing the language’s syntax.
As an example, consider a REST API URI such as
http://api.soccer.restapi.org/players/2113 that
responds to GET
requests with a
representation of a player resource that is formatted using JSON. If the
Content-Type
header field value
declares that the response’s media type is
application/json, it has accurately conveyed the
body content’s syntax but has disregarded the semantics and structure of
the player representation. The response’s Content-Type
header simply tells a client that
it should expect some JSON-formatted text.
Alternatively, the response’s Content-Type
header field should communicate
that the body contains a representation of a player document that is
formatted with JSON. To help achieve this goal, the WRML framework,
which was introduced in the section WRML, uses a
descriptive media type: application/wrml. The
example below shows WRML’s media type used to describe a player form
that is formatted using JSON:
# NOTE: the line breaks below are for the sake of visual clarity.
application/wrml;
  format="http://api.formats.wrml.org/application/json";
  schema="http://api.schemas.wrml.org/soccer/Player"
The WRML media type.[34]
The required format parameter’s value identifies a document resource that describes the JSON format itself.
The required schema parameter’s value identifies a separate document that details the
Player
resource type’s form, which is independent of the media type’s format parameter’s value.
This media type may appear excessive when compared to simpler ones like application/json. However, this is a worthwhile trade-off since this media type communicates—directly to clients—distinct and complementary bits of information regarding the content of a message. The application/wrml media type’s self-descriptive and pluggable design reduces the need for information to be communicated out-of-band and then hardcoded by client developers.
Note
See Media Type Representation, which describes how this media type’s format and schema documents should be represented.
Media Type Format Design
Most media types identify a format using a simple string, like
application/json. Instead, by using a
format
parameter with a URI value, the WRML media
type directs client programs to a cacheable
document that provides links to other documents related to the format.
In the example above, the representation of the document referenced by
the format
parameter
(http://api.formats.wrml.org/application/json)
contains links to related web resources, such as
http://www.json.org and
http://www.rfc-editor.org/rfc/rfc4627.txt.
More importantly, by leveraging REST’s code-on-demand constraint, the format document’s representation can provide links to formatting and parsing code, which clients can download and execute to serialize and deserialize an HTTP message body’s content. By providing this code, available for various programming languages and runtime environments, an API can programmatically teach its clients how to interoperate with its representation formats. The future-proof nature of this design may prove especially useful when a REST API wishes to adopt a new format that is not yet widely supported by its clients.
The section Rule: A consistent form should be used to represent media type formats, outlines the structure of a format document’s representation.
Media Type Schema Design
As discussed next in Chapter 5, a resource’s state representation consists of fields and links. For a given “class” of resource, the set of expected fields and context-sensitive links can be described by a schema document. The WRML media type’s schema parameter references a cacheable schema document, which describes a resource type’s fields and links independently of any specific representational format. This separation of concerns allows multiple representation formats to be negotiated by clients and supported by REST APIs with relative ease. With a set of standard primitive types, outlined in Field Representation, a schema document can describe a resource representation’s fields in a format-independent manner.
The section Rule: A consistent form should be used to represent media type schemas, details the structure of a schema document’s representation.
Media Type Schema Versioning
The different versions of a given schema
should be organized as different schema documents, with distinct URIs.
This design is borrowed from the approach traditionally used by the
W3C[35] and IETF[36] for versioning the URIs of
Internet Drafts on their way to becoming approved
standards. The example below shows the URI of a schema document that
details the fields and links of a soccer Player
resource type:
http://api.schemas.wrml.org/soccer/Player-2
The -2
suffix designates the
version number of the Player
resource type’s schema. As a rule, the current version of the resource
type’s schema should always be made available through a separate
resource identifier, without a numeric suffix. The example below
demonstrates the design of the Player
resource type’s current schema
URI:
http://api.schemas.wrml.org/soccer/Player
The URI of a resource type’s current schema version always identifies the concept of the most recent version. A schema document URI that ends with a number permanently identifies a specific version of the schema. Therefore the latest version of a schema is always modeled by two separate resources which conceptually overlap while the numbered version is also the current one. This overlap results in the two distinct resources, with two separate URIs, consistently having the same state representation.
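The overlap between the numbered URI and the current URI can be illustrated with a small in-memory registry. This is a sketch only: publish_schema and the dictionary-based registry are hypothetical, and versions are assumed to be published in increasing order so that the most recently published one is always the current one.

```python
from typing import Dict

BASE = "http://api.schemas.wrml.org/soccer"


def publish_schema(
    documents: Dict[str, dict], name: str, version: int, schema: dict
) -> None:
    """Register a schema under its permanent and its current URI."""
    # The numbered URI permanently identifies this specific version.
    documents[f"{BASE}/{name}-{version}"] = schema
    # The unnumbered URI always identifies the most recent version.
    documents[f"{BASE}/{name}"] = schema


if __name__ == "__main__":
    docs: Dict[str, dict] = {}
    publish_schema(docs, "Player", 1, {"fields": ["name"]})
    publish_schema(docs, "Player", 2, {"fields": ["name", "number"]})
    print(docs[f"{BASE}/Player"] is docs[f"{BASE}/Player-2"])
```

After the second call, the Player and Player-2 URIs resolve to the same state, while Player-1 continues to identify the earlier version.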
Rule: Media type negotiation should be supported when multiple representations are available
Allow clients to negotiate for a given format and schema by
submitting an Accept
header with the
desired media type. For example:
# NOTE: the line breaks below are for the sake of visual clarity.
Accept: application/wrml;
  format="http://api.formats.wrml.org/text/html";
  schema="http://api.schemas.wrml.org/soccer/Team"
Additionally, to facilitate browser-based viewing and debugging of a REST API’s responses, consider supporting raw media types as shown in the example below:
Accept: application/json
This will allow web browser add-ons such as JSONView to render a REST API’s responses as JSON.
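Server-side negotiation over raw media types might look like the following sketch. It is deliberately simplified: parse_accept and negotiate are hypothetical helpers, only the q weight is read from each media range, and other parameters (such as WRML's format and schema) are ignored. A fuller implementation would also compare those parameters.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media range, q) pairs, highest preference first."""
    ranges = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as not acceptable
        ranges.append((parts[0].lower(), q))
    # sorted() is stable, so equal weights keep the client's order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def _matches(media_range: str, candidate: str) -> bool:
    if media_range in ("*/*", candidate):
        return True
    range_type, _, range_subtype = media_range.partition("/")
    return range_subtype == "*" and candidate.startswith(range_type + "/")


def negotiate(accept_header: str, supported: List[str]) -> Optional[str]:
    """Pick the first supported media type the client accepts, or None."""
    for media_range, q in parse_accept(accept_header):
        if q == 0:
            continue
        for candidate in supported:
            if _matches(media_range, candidate):
                return candidate
    return None
```

When negotiate returns None, the API has no acceptable representation to offer; how it responds in that case (for example, with a 406 status) is left to the caller.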
Rule: Media type selection using a query parameter may be supported
To enable simple links and easy debugging, REST APIs may support
media type selection via a query parameter named
accept with a value format that mirrors that of the
Accept
HTTP request header. For
example:
GET /bookmarks/mikemassedotcom?accept=application/xml
This is a more precise and generic approach to media type identification that should be preferred over the common alternative of appending a virtual file extension like .xml to the URI’s path. The virtual file extension approach binds the resource and its representation together, implying that they are one and the same.
Warning
Media type selection (or negotiation) via a query parameter is a
form of tunneling that conveys metadata in the
URI rather than in HTTP’s intended slot: the Accept
header. Therefore it should be used
with careful consideration.
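A sketch of how a server might honor the accept query parameter follows. The helper name requested_media_type is hypothetical, and giving the query parameter precedence over the Accept header is an assumption of this example rather than something the rule prescribes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def requested_media_type(
    request_target: str, accept_header: Optional[str] = None
) -> Optional[str]:
    """Return the media type requested via ?accept=..., else the Accept header."""
    query = parse_qs(urlsplit(request_target).query)
    values = query.get("accept")
    if values:
        return values[0]  # assumption: the query parameter wins
    return accept_header


if __name__ == "__main__":
    print(requested_media_type("/bookmarks/mikemassedotcom?accept=application/xml"))
```

One practical caveat: a plus sign in a query string decodes to a space, so a link that selects a type such as application/atom+xml needs the plus sign percent-encoded as %2B.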
Recap
This chapter covered the design rules for a REST API’s metadata conveyed through HTTP headers and media types. Table 4-1 summarizes the vocabulary terms that were used in this chapter.
Term | Description |
Atom Syndication Format (Atom) | An XML-based format that structures data into lists known as “feeds.” |
Conditional request | A client-initiated interaction with a precondition that the server is expected to honor. |
Entity | An HTTP request or response payload, which is metadata in header fields and content in a body. |
Entity tag | An opaque string value that designates the “version” of a given HTTP response message’s headers and body. |
Extensible Markup Language (XML) | A standardized application profile of SGML that is used by many applications to exchange data. |
Internet Assigned Numbers Authority (IANA) | The entity with many governance-related duties, which include overseeing global IP address allocation and media type registration. |
Media type negotiation | A client-initiated process that selects the form of a response message’s representation. |
Media type schema | A Web-oriented description of a form that is composed of fields and links. |
Negative caching | Directing intermediaries to serve copies of redirect (3xx) and error (4xx) responses. |
Vendor-specific media type | A form descriptor that is owned and controlled by a specific organization. |
Table 4-2 recaps a REST API’s use of the HTTP headers.
Code | Purpose |
Content-Type | Identifies the entity body’s media type |
Content-Length | The size (in bytes) of the entity body |
Last-Modified | The date-time of the resource representation’s last change |
ETag | Indicates the version of the response message’s entity |
Cache-Control | A TTL-based caching value (in seconds) |
Location | Provides the URI of a resource |
[25] Media types were originally known as “MIME types,” which stood for Multipurpose Internet Mail Extensions.
[27] text/plain
[29] image/jpeg
[34] The application/wrml media type’s IANA registration is pending, see http://www.wrml.org for the most up-to-date information.
[35] World Wide Web Consortium (W3C), http://www.w3.org.
[36] The Internet Engineering Task Force (IETF), http://www.ietf.org.