Chapter 3. What Makes RESTful Services Different?
I pulled a kind of bait-and-switch on you earlier, and it’s time to make things right. Though this is a book about RESTful web services, most of the real services I’ve shown you are REST-RPC hybrids like the del.icio.us API: services that don’t quite work like the rest of the Web. This is because right now, there just aren’t many well-known RESTful services that work like the Web. In previous chapters I wanted to show you clients for real services you might have heard of, so I had to take what I could get.
The del.icio.us and Flickr APIs are good examples of hybrid services. They work like the Web when you’re fetching data, but they’re RPC-style services when it comes time to modify the data. The various Yahoo! search services are very RESTful, but they’re so simple that they don’t make good examples. The Amazon E-Commerce Service (seen in Example 1-2) is also quite simple, and defects to the RPC style on a few obscure but important points.
These services are all useful. I think the RPC style is the wrong one for web services, but that never prevents me from writing an RPC-style client if there’s interesting data on the other side. I can’t use Flickr or the del.icio.us API as examples of how to design RESTful web services, though. That’s why I covered them early in the book, when the only thing I was trying to show was what’s on the programmable web and how to write HTTP clients. Now that we’re approaching a heavy design chapter, I need to show you what a service looks like when it’s RESTful and resource-oriented.
Introducing the Simple Storage Service
Two popular web services can answer this call: the Atom Publishing Protocol (APP), and Amazon’s Simple Storage Service (S3). (Appendix A lists some publicly deployed RESTful web services, many of which you may not have heard of.) The APP is less an actual service than a set of instructions for building a service, so I’m going to start with S3, which actually exists at a specific place on the Web. In Chapter 9 I discuss the APP, Atom, and related topics like Google’s GData. For much of the rest of this chapter, I’ll explore S3.
S3 is a way of storing any data you like, structured however you like. You can keep your data private, or make it accessible by anyone with a web browser or BitTorrent client. Amazon hosts the storage and the bandwidth, and charges you by the gigabyte for both. To use the example S3 code in this chapter, you’ll need to sign up for the S3 service by going to http://aws.amazon.com/s3. The S3 technical documentation is at http://docs.amazonwebservices.com/AmazonS3/2006-03-01/.
There are two main uses for S3, as a:
- Backup server
You store your data through S3 and don’t give anyone else access to it. Rather than buying your own backup disks, you’re renting disk space from Amazon.
- Data host
You store your data on S3 and give others access to it. Amazon serves your data through HTTP or BitTorrent. Rather than paying an ISP for bandwidth, you’re paying Amazon. Depending on your existing bandwidth costs this can save you a lot of money. Many of today’s web startups use S3 to serve data files.
Unlike the services I’ve shown so far, S3 is not inspired by any existing web site. The del.icio.us API is based on the del.icio.us web site, and the Yahoo! search services are based on corresponding web sites, but there’s no web page on amazon.com where you fill out HTML forms to upload your files to S3. S3 is intended only for programmatic use. (Of course, if you use S3 as a data host, people will use it through their web browsers, without even knowing they’re making a web service call. It’ll act like a normal web site.)
Amazon provides sample libraries for Ruby, Python, Java, C#, and Perl (see http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=47). There are also third-party libraries, like Ruby’s AWS::S3, which includes the s3sh shell I demonstrated back in Example 1-4.
Object-Oriented Design of S3
S3 is based on two concepts: S3 “buckets” and S3 “objects.” An object is a named piece of data with some accompanying metadata. A bucket is a named container for objects. A bucket is analogous to the filesystem on your hard drive, and an object to one of the files on that filesystem. It’s tempting to compare a bucket to a directory on a filesystem, but filesystem directories can be nested and buckets can’t. If you want a directory structure inside your bucket, you need to simulate one by giving your objects names like “directory/subdirectory/file-object.”
A Few Words About Buckets
A bucket has one piece of information associated with it: the name. A bucket name can only contain the characters A through Z, a through z, 0 through 9, underscore, period, and dash. I recommend staying away from uppercase letters in bucket names.
As I mentioned above, buckets cannot contain other buckets: only objects. Each S3 user is limited to 100 buckets, and your bucket name cannot conflict with anyone else’s. I recommend you either keep everything in one bucket, or name each bucket after one of your projects or domain names.
A Few Words About Objects
An object has four parts to it:

- A reference to the parent bucket.
- A key: the name of the object, which uniquely identifies it within its bucket.
- A value: the data itself, an arbitrary string of bytes.
- A set of metadata key-value pairs associated with the object. This is mostly custom metadata, but it may also include values for the standard HTTP headers Content-Type and Content-Disposition.
If I wanted to host the O’Reilly web site on S3, I’d create a bucket called “oreilly.com,” and fill it with objects whose keys were “” (the empty string), “catalog,” “catalog/9780596529260,” and so on. These objects correspond to the URIs http://oreilly.com/, http://oreilly.com/catalog, and so on. The objects’ values would be the HTML contents of O’Reilly’s web pages. These S3 objects would have their Content-Type metadata value set to text/html, so that people browsing the site would be served these objects as HTML documents, as opposed to XML or plain text.
What If S3 Was a Standalone Library?
If S3 was implemented as an object-oriented code library instead of a web service, you’d have two classes: S3Bucket and S3Object. They’d have getter and setter methods for their data members: S3Bucket#name, S3Object#value=, S3Bucket#addObject, and the like. The S3Bucket class would have an instance method S3Bucket#getObjects that returned a list of S3Object instances, and a class method S3Bucket.getBuckets that returned all of your buckets. Example 3-1 shows what the Ruby code for these classes might look like.
class S3Bucket
  # A class method to fetch all of your buckets.
  def self.getBuckets
  end

  # An instance method to fetch the objects in a bucket.
  def getObjects
  end

  ...
end

class S3Object
  # Fetch the data associated with this object.
  def data
  end

  # Set the data associated with this object.
  def data=(new_value)
  end

  ...
end
Resources
Amazon exposes S3 as two different web services: a RESTful service based on plain HTTP envelopes, and an RPC-style service based on SOAP envelopes. The RPC-style service exposes functions much like the methods in Example 3-1’s hypothetical Ruby library: ListAllMyBuckets, CreateBucket, and so on. Indeed, many RPC-style web services are automatically generated from their implementation methods, and expose the same interfaces as the programming-language code they call behind the scenes. This works because most modern programming (including object-oriented programming) is procedural.
The RESTful S3 service exposes all the functionality of the RPC-style service, but instead of doing it with custom-named functions, it exposes standard HTTP objects called resources. Instead of responding to custom method names like getObjects, a resource responds to one or more of the six standard HTTP methods: GET, HEAD, POST, PUT, DELETE, and OPTIONS.
The RESTful S3 service provides three types of resources. Here they are, with sample URIs for each:

- The list of your buckets (https://s3.amazonaws.com/). There’s only one resource of this type.
- A particular bucket (https://s3.amazonaws.com/{name-of-bucket}). There can be up to 100 resources of this type.
- A particular S3 object inside a bucket (https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}). There can be infinitely many resources of this type.
Each method from my hypothetical object-oriented S3 library corresponds to one of the six standard methods on one of these three types of resources. The getter method S3Object#value corresponds to a GET request on an “S3 object” resource, and the setter method S3Object#value= corresponds to a PUT request on the same resource. Factory methods like S3Bucket.getBuckets and relational methods like S3Bucket#getObjects correspond to GET methods on the “bucket list” and “bucket” resources.
Every resource exposes the same interface and works the same way. To get an object’s value you send a GET request to that object’s URI. To get only the metadata for an object you send a HEAD request to the same URI. To create a bucket, you send a PUT request to a URI that incorporates the name of the bucket. To add an object to a bucket, you send PUT to a URI that incorporates the bucket name and object name. To delete a bucket or an object, you send a DELETE request to its URI.
The S3 designers didn’t just make this up. According to the HTTP standard this is what GET, HEAD, PUT, and DELETE are for. These four methods (plus POST and OPTIONS, which S3 doesn’t use) suffice to describe all interaction with resources on the Web. To expose your programs as web services, you don’t need to invent new vocabularies or smuggle method names into URIs, or do anything except think carefully about your resource design. Every REST web service, no matter how complex, supports the same basic operations. All the complexity lives in the resources.
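To make the uniform interface concrete, here is a minimal sketch using rest-open-uri, the HTTP client extension used throughout this book. The bucket and object names are hypothetical, and I’m ignoring authentication for now; as I show later in this chapter, real S3 requests must be signed.

require 'rubygems'
require 'rest-open-uri'

host = 'https://s3.amazonaws.com/'
open(host)                                          # GET the bucket list
open(host + 'mybucket', :method => :put)            # create a bucket
open(host + 'mybucket/mykey', :method => :put,
     :body => 'my data')                            # create an object
open(host + 'mybucket/mykey')                       # fetch the object's value
open(host + 'mybucket/mykey', :method => :delete)   # delete the object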
Table 3-1 shows what happens when you send an HTTP request to the URI of an S3 resource.
|  | GET | HEAD | PUT | DELETE |
| The bucket list (/) | List your buckets | - | - | - |
| A bucket (/{name-of-bucket}) | List the bucket’s objects | - | Create the bucket | Delete the bucket |
| An object (/{name-of-bucket}/{name-of-object}) | Get the object’s value and metadata | Get the object’s metadata | Set the object’s value and metadata | Delete the object |
That table looks kind of ridiculous. Why did I take up valuable space by printing it? Everything just does what it says. And that is why I printed it. In a well-designed RESTful service, everything does what it says.
You may well be skeptical of this claim, given the evidence so far. S3 is a pretty generic service. If all you’re doing is sticking data into named slots, then of course you can implement the service using only generic verbs like GET and PUT. In Chapter 5 and Chapter 6 I’ll show you strategies for mapping any kind of action to the uniform interface. As a preview, note that I was able to get rid of S3Bucket.getBuckets by defining a new resource as “the list of buckets,” which responds only to GET. Also note that S3Bucket#addObject simply disappeared as a natural consequence of the resource design, which requires that every object be associated with some bucket.
Compare this to S3’s RPC-style SOAP interface. To get the bucket list through SOAP, the method name is ListAllMyBuckets. To get the contents of a bucket, the method name is ListBucket. With the RESTful interface, it’s always GET. In a RESTful service, the URI designates an object (in the object-oriented sense) and the method names are standardized. The same few methods work the same way across resources and services.
HTTP Response Codes
Another defining feature of a RESTful architecture is its use of HTTP response codes. If you send a request to S3, and S3 handles it with no problem, you’ll probably get back an HTTP response code of 200 (“OK”), just like when you successfully fetch a web page in your browser. If something goes wrong, the response code will be in the 4xx or 5xx range: for instance, 500 (“Internal Server Error”). An error response code is a signal to the client that the metadata and entity-body should not be interpreted as a response to the request. It’s not what the client asked for: it’s the server’s attempt to tell the client about a problem. Since the response code isn’t part of the document or the metadata, the client can see whether or not an error occurred just by looking at the numeric code on the first line of the response.
Example 3-2 shows a sample error response. I made an HTTP request for an object that didn’t exist (https://s3.amazonaws.com/crummy.com/nonexistent/object). The response code is 404 (“Not Found”).
404 Not Found
Content-Type: application/xml
Date: Fri, 10 Nov 2006 20:04:45 GMT
Server: AmazonS3
Transfer-Encoding: chunked
X-amz-id-2: /sBIPQxHJCsyRXJwGWNzxuL5P+K96/Wvx4FhvVACbjRfNbhbDyBH5RC511sIz0w0
X-amz-request-id: ED2168503ABB7BF4

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>nonexistent/object</Key>
  <RequestId>ED2168503ABB7BF4</RequestId>
  <HostId>/sBIPQxHJCsyRXJwGWNzxuL5P+K96/Wvx4FhvVACbjRfNbhbDyBH5RC511sIz0w0</HostId>
</Error>
HTTP response codes are underused on the human web. Your browser doesn’t show you the HTTP response code when you request a page, because who wants to look at a numeric code when you can just look at the document to see whether something went wrong? When an error occurs in a web application, most web applications send 200 (“OK”) along with a human-readable document that talks about the error. There’s very little chance a human will mistake the error document for the document they requested.
On the programmable web, it’s just the opposite. Computer programs are good at taking different paths based on the value of a numeric variable, and very bad at figuring out what a document “means.” In the absence of prearranged rules, there’s no way for a program to tell whether an XML document contains data or describes an error. HTTP response codes are the rules: rough conventions about how the client should approach an HTTP response. Because they’re not part of the entity-body or metadata, a client can understand what happened even if it has no clue how to read the response.
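Here’s a minimal sketch of a client that branches on the response code instead of trying to interpret the document. It requests the nonexistent object from Example 3-2, and I’m ignoring request signing:

require 'rubygems'
require 'rest-open-uri'

begin
  # A successful response: the entity-body is the representation we asked for.
  data = open('https://s3.amazonaws.com/crummy.com/nonexistent/object').read
rescue OpenURI::HTTPError => e
  # An error response: the status line tells us what went wrong.
  puts "The server signaled a problem: response code #{e.io.status[0]}"
end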
S3 uses a variety of response codes in addition to 200 (“OK”) and 404 (“Not Found”). The most common is probably 403 (“Forbidden”), used when the client makes a request without providing the right credentials. S3 also uses a few others, including 400 (“Bad Request”), which indicates that the server couldn’t understand the data the client sent; and 409 (“Conflict”), sent when the client tries to delete a bucket that’s not empty. For a full list, see the S3 technical documentation under “The REST Error Response.” I describe every HTTP response code in Appendix B, with a focus on their application to web services. There are 41 official HTTP response codes, but only about 10 are important in everyday use.
An S3 Client
The Amazon sample libraries, and the third-party contributions like AWS::S3, eliminate much of the need for custom S3 client libraries. But I’m not telling you about S3 just so you’ll know about a useful web service. I want to use it to illustrate the theory behind REST. So I’m going to write a Ruby S3 client of my own, and dissect it for you as I go along.
Just to show it can be done, my library will implement an object-oriented interface, like the one from Example 3-1, on top of the S3 service. The result will look like ActiveRecord or some other object-relational mapper. Instead of making SQL calls under the covers to store data in a database, though, it’ll make HTTP requests under the covers to store data on the S3 service. Rather than give my methods resource-specific names like getBuckets and getObjects, I’ll try to use names that reflect the underlying RESTful interface: get, put, and so on.
The first thing I need is an interface to Amazon’s rather unusual web service authorization mechanism. But that’s not as interesting as seeing the web service in action, so I’m going to skip it for now. I’m going to create a very small Ruby module called S3::Authorized, just so my other S3 classes can include it. I’ll come back to it at the end, and fill in the details.
Example 3-3 shows a bit of throat-clearing code.
#!/usr/bin/ruby -w
# S3lib.rb

# Libraries necessary for making HTTP requests and parsing responses.
require 'rubygems'
require 'rest-open-uri'
require 'rexml/document'

# Libraries necessary for request signing
require 'openssl'
require 'digest/sha1'
require 'base64'
require 'uri'

module S3 # This is the beginning of a big, all-encompassing module.

module Authorized
  # Enter your public key (Amazon calls it an "Access Key ID") and
  # your private key (Amazon calls it a "Secret Access Key"). This is
  # so you can sign your S3 requests and Amazon will know who to
  # charge.
  @@public_key = ''
  @@private_key = ''

  if @@public_key.empty? or @@private_key.empty?
    raise "You need to set your S3 keys."
  end

  # You shouldn't need to change this unless you're using an S3 clone like
  # Park Place.
  HOST = 'https://s3.amazonaws.com/'
end
The only interesting aspect of this bare-bones S3::Authorized is that it’s where you should plug in the two cryptographic keys associated with your Amazon Web Services account. Every S3 request you make includes your public key (Amazon calls it an “Access Key ID”) so that Amazon can identify you. Every request you make must be cryptographically signed with your private key (Amazon calls it a “Secret Access Key”) so that Amazon knows it’s really you. I’m using the standard cryptographic terms, even though your “private key” is not totally private—Amazon knows it too. It is private in the sense that you should never reveal it to anyone else. If you do, the person you reveal it to will be able to make S3 requests and have Amazon charge you for it.
The Bucket List
Example 3-4 shows an object-oriented class for my first resource, the list of buckets. I’ll call the class for this resource S3::BucketList.
# The bucket list.
class BucketList
  include Authorized

  # Fetch all the buckets this user has defined.
  def get
    buckets = []
    # GET the bucket list URI and read an XML document from it.
    doc = REXML::Document.new(open(HOST).read)

    # For every bucket...
    REXML::XPath.each(doc, "//Bucket/Name") do |e|
      # ...create a new Bucket object and add it to the list.
      buckets << Bucket.new(e.text) if e.text
    end
    return buckets
  end
end
Now my file is a real web service client. If I call S3::BucketList#get I make a secure HTTP GET request to https://s3.amazonaws.com/, which happens to be the URI of the resource “a list of your buckets.” The S3 service sends back an XML document that looks something like Example 3-5. This is a representation (as I’ll start calling it in the next chapter) of the resource “a list of your buckets.” It’s just some information about the current state of that list. The Owner tag makes it clear whose bucket list it is (my AWS account name is evidently “leonardr28”), and the Buckets tag contains a number of Bucket tags describing my buckets (in this case, there’s one Bucket tag and one bucket).
<?xml version='1.0' encoding='UTF-8'?>
<ListAllMyBucketsResult xmlns='http://s3.amazonaws.com/doc/2006-03-01/'>
  <Owner>
    <ID>c0363f7260f2f5fcf38d48039f4fb5cab21b060577817310be5170e7774aad70</ID>
    <DisplayName>leonardr28</DisplayName>
  </Owner>
  <Buckets>
    <Bucket>
      <Name>crummy.com</Name>
      <CreationDate>2006-10-26T18:46:45.000Z</CreationDate>
    </Bucket>
  </Buckets>
</ListAllMyBucketsResult>
For purposes of this small client application, the Name is the only aspect of a bucket I’m interested in. The XPath expression //Bucket/Name gives me the name of every bucket, which is all I need to create Bucket objects.
As we’ll see, one thing that’s missing from this XML document is links. The document gives the name of every bucket, but says nothing about where the buckets can be found on the Web. In terms of the REST design criteria, this is the major shortcoming of Amazon S3. Fortunately, it’s not too difficult to program a client to calculate a URI from the bucket name. I just follow the rule I gave earlier: https://s3.amazonaws.com/{name-of-bucket}.
The Bucket
Now, let’s write the S3::Bucket class, so that S3::BucketList#get will have something to instantiate (Example 3-6).
# A bucket that you've stored (or will store) on the S3 application.
class Bucket
  include Authorized
  attr_accessor :name

  def initialize(name)
    @name = name
  end

  # The URI to a bucket is the service root plus the bucket name.
  def uri
    HOST + URI.escape(name)
  end

  # Stores this bucket on S3. Analogous to ActiveRecord::Base#save,
  # which stores an object in the database. See below in the
  # book text for a discussion of acl_policy.
  def put(acl_policy=nil)
    # Set the HTTP method as an argument to open(). Also set the S3
    # access policy for this bucket, if one was provided.
    args = {:method => :put}
    args["x-amz-acl"] = acl_policy if acl_policy

    # Send a PUT request to this bucket's URI.
    open(uri, args)
    return self
  end

  # Deletes this bucket. This will fail with HTTP status code 409
  # ("Conflict") unless the bucket is empty.
  def delete
    # Send a DELETE request to this bucket's URI.
    open(uri, :method => :delete)
  end
Here are two more web service methods: S3::Bucket#put and S3::Bucket#delete. Since the URI to a bucket uniquely identifies the bucket, deletion is simple: you send a DELETE request to the bucket URI, and it’s gone. Since a bucket’s name goes into its URI, and a bucket has no other settable properties, it’s also easy to create a bucket: just send a PUT request to its URI. As I’ll show when I write S3::Object, a PUT request is more complicated when not all the data can be stored in the URI.
Earlier I compared my S3:: classes to ActiveRecord classes, but S3::Bucket#put works a little differently from an ActiveRecord implementation of save. A row in an ActiveRecord-controlled database table has a numeric unique ID. If you take an ActiveRecord object with ID 23 and change its name, your change is reflected as a change to the database record with ID 23:

SET name="newname" WHERE id=23
The permanent ID of an S3 bucket is its URI, and the URI includes the name. If you change the name of a bucket and call put, the client doesn’t rename the old bucket on S3: it creates a new, empty bucket at a new URI with the new name. This is a result of design decisions made by the S3 programmers. It doesn’t have to be this way. The Ruby on Rails framework has a different design: when it exposes database rows through a RESTful web service, the URI to a row incorporates its numeric database ID. If S3 was a Rails service you’d see buckets at URIs like /buckets/23. Renaming the bucket wouldn’t change the URI.
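Here’s a hedged sketch of that pitfall, using the S3::Bucket class defined above (the bucket names are hypothetical):

bucket = S3::Bucket.new("old-name")
bucket.put                    # PUT /old-name: creates the bucket

bucket.name = "new-name"
bucket.put                    # PUT /new-name: creates a second, empty bucket;
                              # "old-name" and its contents still exist on S3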
Now comes the last method of S3::Bucket, which I’ve called get. Like S3::BucketList#get, this method makes a GET request to the URI of a resource (in this case, a “bucket” resource), fetches an XML document, and parses it into new instances of a Ruby class (see Example 3-7). This method supports a variety of ways to filter the contents of S3 buckets. For instance, you can use :Prefix to retrieve only objects whose keys start with a certain string. I won’t cover these filtering options in detail. If you’re interested in them, see the S3 technical documentation on “Listing Keys.”
# Get the objects in this bucket: all of them, or some subset.
#
# If S3 decides not to return the whole bucket/subset, the second
# return value will be set to true. To get the rest of the objects,
# you'll need to manipulate the subset options (not covered in the
# book text).
#
# The subset options are :Prefix, :Marker, :Delimiter, :MaxKeys.
# For details, see the S3 docs on "Listing Keys".
def get(options={})
  # Get the base URI to this bucket, and append any subset options
  # onto the query string.
  uri = uri()
  suffix = '?'

  # For every option the user provided...
  options.each do |param, value|
    # ...if it's one of the S3 subset options...
    if [:Prefix, :Marker, :Delimiter, :MaxKeys].member? param
      # ...add it to the URI.
      uri << suffix << param.to_s << '=' << URI.escape(value)
      suffix = '&'
    end
  end

  # Now we've built up our URI. Make a GET request to that URI and
  # read an XML document that lists objects in the bucket.
  doc = REXML::Document.new(open(uri).read)
  there_are_more = REXML::XPath.first(doc, "//IsTruncated").text == "true"

  # Build a list of S3::Object objects.
  objects = []
  # For every object in the bucket...
  REXML::XPath.each(doc, "//Contents/Key") do |e|
    # ...build an S3::Object object and append it to the list.
    objects << Object.new(self, e.text) if e.text
  end
  return objects, there_are_more
end
end
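As a usage sketch (the bucket name and key prefix are hypothetical), here’s how a client might use :Prefix to list a single simulated “directory” of a bucket:

bucket = S3::Bucket.new("crummy.com")
objects, truncated = bucket.get(:Prefix => "reports/")
objects.each { |o| puts o.name }
puts "The listing was truncated; more objects remain." if truncated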
Make a GET request of the application’s root URI, and you get a representation of the resource “a list of your buckets.” Make a GET request to the URI of a “bucket” resource, and you get a representation of the bucket: an XML document like the one in Example 3-8, containing a Contents tag for every element of the bucket.
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>crummy.com</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>mydocument</Key>
    <LastModified>2006-10-27T16:01:19.000Z</LastModified>
    <ETag>"93bede57fd3818f93eedce0def329cc7"</ETag>
    <Size>22</Size>
    <Owner>
      <ID>c0363f7260f2f5fcf38d48039f4fb5cab21b060577817310be5170e7774aad70</ID>
      <DisplayName>leonardr28</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
In this case, the portion of the document I find interesting is the list of a bucket’s objects. An object is identified by its key, and I use the XPath expression “//Contents/Key” to fetch that information. I’m also interested in a certain Boolean variable (“//IsTruncated”): whether this document contains keys for every object in the bucket, or whether S3 decided there were too many to send in one document and truncated the list.
Again, the main thing missing from this representation is links. The document lists lots of information about the objects, but not their URIs. The client is expected to know how to turn an object name into that object’s URI. Fortunately, it’s not too hard to build an object’s URI, using the rule I already gave: https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}.
The S3 Object
Now we’re ready to implement an interface to the core of the S3 service: the object. Remember that an S3 object is just a data string that’s been given a name (a key) and a set of metadata key-value pairs (such as Content-Type="text/html"). When you send a GET request to the bucket list, or to a bucket, S3 serves an XML document that you have to parse. When you send a GET request to an object, S3 serves whatever data string you PUT there earlier—byte for byte.

Example 3-9 shows the beginning of S3::Object, which should be nothing new by now.
# An S3 object, associated with a bucket, containing a value and metadata.
class Object
  include Authorized

  # The client can see which Bucket this Object is in.
  attr_reader :bucket

  # The client can read and write the name of this Object.
  attr_accessor :name

  # The client can write this Object's metadata and value.
  # I'll define the corresponding "read" methods later.
  attr_writer :metadata, :value

  def initialize(bucket, name, value=nil, metadata=nil)
    @bucket, @name, @value, @metadata = bucket, name, value, metadata
  end

  # The URI to an Object is the URI to its Bucket, and then its name.
  def uri
    @bucket.uri + '/' + URI.escape(name)
  end
What comes next is my first implementation of an HTTP HEAD request. I use it to fetch an object’s metadata key-value pairs and populate the metadata hash with it (the actual implementation of store_metadata comes at the end of this class). Since I’m using rest-open-uri, the code to make the HEAD request looks the same as the code to make any other HTTP request (see Example 3-10).
# Retrieves the metadata hash for this Object, possibly fetching
# it from S3.
def metadata
  # If there's no metadata yet...
  unless @metadata
    # Make a HEAD request to this Object's URI, and read the metadata
    # from the HTTP headers in the response.
    begin
      store_metadata(open(uri, :method => :head).meta)
    rescue OpenURI::HTTPError => e
      if e.io.status == ["404", "Not Found"]
        # If the Object doesn't exist, there's no metadata and this is not
        # an error.
        @metadata = {}
      else
        # Otherwise, this is an error.
        raise e
      end
    end
  end
  return @metadata
end
The goal here is to fetch an object’s metadata without fetching the object itself. This is the difference between downloading a movie review and downloading the movie, and when you’re paying for the bandwidth it’s a big difference. This distinction between metadata and representation is not unique to S3, and the solution is general to all resource-oriented web services. The HEAD method gives any client a way of fetching the metadata for any resource, without also fetching its (possibly enormous) representation.
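As a usage sketch built on this method (the bucket and object names are hypothetical): checking a large movie file’s Content-Type costs one HEAD request and a few hundred bytes of headers, not the gigabytes a GET would transfer.

bucket = S3::Bucket.new("BobProductions")
movie = S3::Object.new(bucket, "KomodoDragon.avi")
puts movie.metadata['content-type']   # one HEAD request: headers, no entity-body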
Of course, sometimes you do want to download the movie, and for that you need a GET request. I’ve put the GET request in the accessor method S3::Object#value, in Example 3-11. Its structure mirrors that of S3::Object#metadata.
# Retrieves the value of this Object, possibly fetching it
# (along with the metadata) from S3.
def value
  # If there's no value yet...
  unless @value
    # Make a GET request to this Object's URI.
    response = open(uri)
    # Read the metadata from the HTTP headers in the response.
    store_metadata(response.meta) unless @metadata
    # Read the value from the entity-body
    @value = response.read
  end
  return @value
end
The client stores objects on the S3 service the same way it stores buckets: by sending a PUT request to a certain URI. The bucket PUT is trivial because a bucket has no distinguishing features other than its name, which goes into the URI of the PUT request. An object PUT is more complex. This is where the HTTP client specifies an object’s metadata (such as Content-Type) and value. This information will be made available on future HEAD and GET requests.
Fortunately, setting up the PUT request is not terribly complicated, because an object’s value is whatever the client says it is. I don’t have to wrap the object’s value in an XML document or anything. I just send the data as is, and set HTTP headers that correspond to the items of metadata in my metadata hash (see Example 3-12).
# Store this Object on S3.
def put(acl_policy=nil)
  # Start from a copy of the original metadata, or an empty hash if
  # there is no metadata yet.
  args = @metadata ? @metadata.clone : {}

  # Set the HTTP method, the entity-body, and some additional HTTP
  # headers.
  args[:method] = :put
  args["x-amz-acl"] = acl_policy if acl_policy
  if @value
    args["Content-Length"] = @value.size.to_s
    args[:body] = @value
  end

  # Make a PUT request to this Object's URI.
  open(uri, args)
  return self
end
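As a usage sketch, here’s how a client might use this put to store one page of the hypothetical “oreilly.com” site from earlier in the chapter:

bucket = S3::Bucket.new("oreilly.com")
object = S3::Object.new(bucket, "catalog")
object.metadata = {'Content-Type' => 'text/html'}
object.value = '<html>...the catalog page...</html>'
object.put    # PUT /oreilly.com/catalog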
The S3::Object#delete implementation (see Example 3-13) is identical to S3::Bucket#delete.
# Deletes this Object.
def delete
  # Make a DELETE request to this Object's URI.
  open(uri, :method => :delete)
end
And Example 3-14 shows the method for turning HTTP response headers into S3 object metadata. Except for Content-Type, you should prefix all the metadata headers you set with the string “x-amz-meta-”. Otherwise they won’t make the round trip to the S3 server and back to a web service client. S3 will think they’re quirks of your client software and discard them.
private

# Given a hash of headers from a HTTP response, picks out the
# headers that are relevant to an S3 Object, and stores them in the
# instance variable @metadata.
def store_metadata(new_metadata)
  @metadata = {}
  new_metadata.each do |h,v|
    if RELEVANT_HEADERS.member?(h) || h.index('x-amz-meta') == 0
      @metadata[h] = v
    end
  end
end

RELEVANT_HEADERS = ['content-type', 'content-disposition',
                    'content-range', 'x-amz-missing-meta']
end
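A quick illustration of the prefix rule, with hypothetical header names and values:

bucket = S3::Bucket.new("crummy.com")
object = S3::Object.new(bucket, "mydocument")
object.metadata = {
  'Content-Type'      => 'text/plain',  # survives: a relevant standard header
  'x-amz-meta-author' => 'leonardr',    # survives: has the x-amz-meta- prefix
  'author'            => 'leonardr'     # sent, but discarded by S3
}
object.put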
Request Signing and Access Control
I’ve put it off as long as I can, and now it’s time to deal with S3 authentication. If your main interest is in RESTful services in general, feel free to skip ahead to the section on using the S3 library in clients. But if the inner workings of S3 have piqued your interest, read on.
The code I’ve shown you so far makes HTTP requests all right, but S3 rejects them, because they don’t contain the all-important Authorization header. S3 has no proof that you’re the owner of your own buckets. Remember, Amazon charges you for the data stored on their servers and the bandwidth used in transferring that data. If S3 accepted requests to your buckets with no authorization, anyone could store data in your buckets and you’d get charged for it.
Most web services that require authentication use a standard HTTP mechanism to make sure you are who you claim to be. But S3’s needs are more complicated. With most web services you never want anyone else using your data. But one of the uses of S3 is as a hosting service. You might want to host a big movie file on S3, let anyone download it with their BitTorrent client, and have Amazon send you the bill.
Or you might be selling access to movie files stored on S3. Your e-commerce site takes payment from a customer and gives them an S3 URI they can use to download the movie. You’re delegating to someone else the right to make a particular web service call (a GET request) as you, and have it charged to your account.
The standard mechanisms for HTTP authentication can’t provide security for that kind of application. Normally, the person who’s sending the HTTP request needs to know the actual password. You can prevent someone from spying on your password, but you can’t say to someone else: “here’s my password, but you must promise only to use it to request this one URI.”
S3 solves this problem using a message authentication code (MAC). Every time you make an S3 request, you use your secret key (remember, the secret is shared between you and Amazon) to sign the important parts of the request. That’d be the URI, the HTTP method you’re using, and a few of the HTTP headers. Only someone who knows the secret can create these signatures for your requests, which is how Amazon knows it’s okay to charge you for the request. But once you’ve signed a request, you can send the signature to a third party without revealing the secret. The third party is then free to send an identical HTTP request to the one you signed, and have Amazon charge you for it. In short: someone else can make a specific request as you, for a limited time, without having to know your secret.
There is a simpler way to give anonymous access to your S3 objects, and I discuss it below. But there’s no way around signing your own requests, so even a simple library like this one must support request signing if it’s going to work. I’m reopening the S3::Authorized Ruby module now. I’m going to give it the ability to intercept calls to the open method, and sign HTTP requests before they’re made. Since S3::BucketList, S3::Bucket, and S3::Object have all included this module, they’ll inherit this ability as soon as I define it. Without the code I’m about to write, all those open calls I defined in the classes above will send unsigned HTTP requests that just bounce off S3 with response code 403 (“Forbidden”). With this code, you’ll be able to generate signed HTTP requests that pass through S3’s security measures (and cost you money). The code in Example 3-15 and the other examples that follow is heavily based on Amazon’s own example S3 library.
module Authorized
  # These are the standard HTTP headers that S3 considers interesting
  # for purposes of request signing.
  INTERESTING_HEADERS = ['content-type', 'content-md5', 'date']

  # This is the prefix for custom metadata headers. All such headers
  # are considered interesting for purposes of request signing.
  AMAZON_HEADER_PREFIX = 'x-amz-'

  # An S3-specific wrapper for rest-open-uri's implementation of
  # open(). This implementation sets some HTTP headers before making
  # the request. Most important of these is the Authorization header,
  # which contains the information Amazon will use to decide who to
  # charge for this request.
  def open(uri, headers_and_options={}, *args, &block)
    headers_and_options = headers_and_options.dup
    headers_and_options['Date'] ||= Time.now.httpdate
    headers_and_options['Content-Type'] ||= ''
    signed = signature(uri, headers_and_options[:method] || :get,
                       headers_and_options)
    headers_and_options['Authorization'] = "AWS #{@@public_key}:#{signed}"
    Kernel::open(uri, headers_and_options, *args, &block)
  end
The tough work here is in the signature method, not yet defined. This method needs to construct an encrypted string to go into a request’s Authorization header: a string that convinces the S3 service that it’s really you sending the request—or that you’ve authorized someone else to make the request at your expense (see Example 3-16).
# Builds the cryptographic signature for an HTTP request. This is
# the signature (signed with your secret key) of a "canonical
# string" containing all interesting information about the request.
def signature(uri, method=:get, headers={}, expires=nil)
  # Accept the URI either as a string, or as a Ruby URI object.
  if uri.respond_to? :path
    path = uri.path
  else
    uri = URI.parse(uri)
    path = uri.path + (uri.query ? "?" + uri.query : "")
  end

  # Build the canonical string, then sign it.
  signed_string = sign(canonical_string(method, path, headers, expires))
end
Well, this method passes the buck again, by calling sign on the result of canonical_string. Let’s look at those two methods, starting with canonical_string. It turns an HTTP request into a string that looks something like Example 3-17. That string contains everything interesting (from S3’s point of view) about an HTTP request, in a specific format. The interesting data is the HTTP method (PUT), the Content-Type (“text/plain”), a date, a few other HTTP headers (“x-amz-metadata”), and the path portion of the URI (“/crummy.com/myobject”). This is the string that sign will sign. Anyone can create this string, but only the S3 account holder and Amazon know how to produce the correct signature.
PUT

text/plain
Fri, 27 Oct 2006 21:22:41 GMT
x-amz-metadata:Here's some metadata for the myobject object.
/crummy.com/myobject
When Amazon’s server receives your HTTP request, it generates the canonical string, signs it (again, Amazon knows your secret key), and sees whether the two signatures match. That’s how S3 authentication works. If the signatures match, your request goes through. Otherwise, you get a response code of 403 (“Forbidden”).
Example 3-18 shows the code to generate the canonical string.
# Turns the elements of an HTTP request into a string that can be
# signed to prove a request comes from your web service account.
def canonical_string(method, path, headers, expires=nil)
  # Start out with default values for all the interesting headers.
  sign_headers = {}
  INTERESTING_HEADERS.each { |header| sign_headers[header] = '' }

  # Copy in any actual values, including values for custom S3
  # headers.
  headers.each do |header, value|
    if header.respond_to? :to_str
      header = header.downcase
      # If it's a custom header, or one Amazon thinks is interesting...
      if INTERESTING_HEADERS.member?(header) ||
          header.index(AMAZON_HEADER_PREFIX) == 0
        # Add it to the header hash.
        sign_headers[header] = value.to_s.strip
      end
    end
  end

  # This library eliminates the need for the x-amz-date header that
  # Amazon defines, but someone might set it anyway. If they do,
  # we'll do without HTTP's standard Date header.
  sign_headers['date'] = '' if sign_headers.has_key? 'x-amz-date'

  # If an expiration time was provided, it overrides any Date
  # header. This signature will be valid until the expiration time,
  # not only during the single second designated by the Date header.
  sign_headers['date'] = expires.to_s if expires

  # Now we start building the canonical string for this request. We
  # start with the HTTP method.
  canonical = method.to_s.upcase + "\n"

  # Sort the headers by name, and append them (or just their values)
  # to the string to be signed.
  sign_headers.sort_by { |h| h[0] }.each do |header, value|
    canonical << header << ":" if header.index(AMAZON_HEADER_PREFIX) == 0
    canonical << value << "\n"
  end

  # The final part of the string to be signed is the URI path. We
  # strip off the query string, and (if necessary) tack one of the
  # special S3 query parameters back on: 'acl', 'torrent', or
  # 'logging'.
  canonical << path.gsub(/\?.*$/, '')
  for param in ['acl', 'torrent', 'logging']
    if path =~ Regexp.new("[&?]#{param}($|&|=)")
      canonical << "?" << param
      break
    end
  end
  return canonical
end
The implementation of sign is just a bit of plumbing around Ruby’s standard cryptographic and encoding interfaces (see Example 3-19).
# Signs a string with the client's secret access key, and encodes the
# resulting binary string into plain ASCII with base64.
def sign(str)
  digest_generator = OpenSSL::Digest::Digest.new('sha1')
  digest = OpenSSL::HMAC.digest(digest_generator, @@private_key, str)
  return Base64.encode64(digest).strip
end
Signing a URI
My S3 library has one feature still to be implemented. I’ve mentioned a few times that S3 lets you sign an HTTP request and give the URI to someone else, letting them make that request as you. Here’s the method that lets you do this: signed_uri (see Example 3-20). Instead of making an HTTP request with open, you pass the open arguments into this method, and it gives you a signed URI that anyone can use as you. To limit abuse, a signed URI works only for a limited time. You can customize that time by passing a Time object in as the keyword argument :expires.
# Given information about an HTTP request, returns a URI you can
# give to anyone else, to let them make that particular HTTP
# request as you. The URI will be valid for 15 minutes, or until the
# Time passed in as the :expires option.
def signed_uri(headers_and_options={})
  expires = headers_and_options[:expires] || (Time.now.to_i + (15 * 60))
  expires = expires.to_i if expires.respond_to? :to_i
  headers_and_options.delete(:expires)
  signature = URI.escape(signature(uri, headers_and_options[:method],
                                   headers_and_options, nil))
  q = (uri.index("?")) ? "&" : "?"
  "#{uri}#{q}Signature=#{signature}&Expires=#{expires}&AWSAccessKeyId=#{@@public_key}"
end
end
end # Remember the all-encompassing S3 module? This is the end.
Here’s how it works. Suppose I want to give a customer access to my hosted file at https://s3.amazonaws.com/BobProductions/KomodoDragon.avi. I can run the code in Example 3-21 to generate a URI for my customer.
#!/usr/bin/ruby -w
# s3-signed-uri.rb

require 'S3lib'

bucket = S3::Bucket.new("BobProductions")
object = S3::Object.new(bucket, "KomodoDragon.avi")
puts object.signed_uri
# "https://s3.amazonaws.com/BobProductions/KomodoDragon.avi
#  ?Signature=J%2Fu6kxT3j0zHaFXjsLbowgpzExQ%3D
#  &Expires=1162156499&AWSAccessKeyId=0F9DBXKB5274JKTJ8DG2"
That URI will be valid for 15 minutes, the default for my signed_uri implementation. It incorporates my key ID (AWSAccessKeyId), the expiration time (Expires), and the cryptographic Signature. My customer can visit this URI and download the movie file KomodoDragon.avi. Amazon will charge me for my customer’s use of their bandwidth. If my customer modifies any part of the URI (maybe they try to download a second movie too), the S3 service will reject their request. An untrustworthy customer can send the URI to all of their friends, but it will stop working in 15 minutes.
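If 15 minutes is too short, pass a Time object as the :expires option. A usage sketch: a signed URI that stays valid for 24 hours.

object = S3::Object.new(S3::Bucket.new("BobProductions"), "KomodoDragon.avi")
puts object.signed_uri(:expires => Time.now + (24 * 60 * 60))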
You may have noticed a problem here. The canonical string usually includes the value of the Date header. When my customer visits the URI I signed, their web browser will surely send a different value for the Date header. That’s why, when you’re generating a canonical string to give to someone else, you set an expiration date instead of a request date. Look back to Example 3-18 and the implementation of canonical_string, where the expiration date (if provided) overwrites any value for the Date header.
Setting Access Policy
What if I want to make an object publicly accessible? I want to serve my files to the world and let Amazon deal with the headaches of server management. Well, I could set an expiration date very far in the future, and give out the enormous signed URI to everyone. But there’s an easier way to get the same results: allow anonymous access. You can do this by setting the access policy for a bucket or object, telling S3 to respond to unsigned requests for it. You do this by sending the x-amz-acl header along with the PUT request that creates the bucket or object.

That’s what the acl_policy argument to Bucket#put and Object#put does. If you want to make a bucket or object publicly readable or writable, you pass an appropriate value in for acl_policy. My client sends that value as part of the custom HTTP request header x-amz-acl. Amazon S3 reads this request header and sets the rules for bucket or object access appropriately.
The client in Example 3-22 creates an S3 object that anyone can read by visiting its URI at https://s3.amazonaws.com/BobProductions/KomodoDragon-Trailer.avi. In this scenario, I’m not selling my movies: just using Amazon as a hosting service so I don’t have to serve movies from my own web site.
#!/usr/bin/ruby -w
# s3-public-object.rb

require 'S3lib'

bucket = S3::Bucket.new("BobProductions")
object = S3::Object.new(bucket, "KomodoDragon-Trailer.avi")
object.put("public-read")
S3 understands four access policies:

- private
The default. Only requests signed with your secret key are accepted.
- public-read
Anyone can read the bucket or object with an unsigned request; only you can modify it.
- public-read-write
Anyone can read or modify the bucket or object.
- authenticated-read
Any registered Amazon Web Services user can read the bucket or object with a signed request; only you can modify it.
There are also fine-grained ways of granting access to a bucket or object, which I won’t cover. If you’re interested, see the section “Setting Access Policy with REST” in the S3 technical documentation. That section reveals a parallel universe of extra resources. Every bucket /{name-of-bucket} has a shadow resource /{name-of-bucket}?acl corresponding to that bucket’s access control rules, and every object /{name-of-bucket}/{name-of-object} has a shadow ACL resource /{name-of-bucket}/{name-of-object}?acl. By sending PUT requests to these URIs, and including XML representations of access control lists in the request entity-bodies, you can set specific permissions and limit access to particular S3 users.
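As a sketch of how my library could be extended (this acl method is hypothetical, not part of the code above), a client could fetch a bucket’s ACL document through the shadow resource:

module S3
  class Bucket
    # Returns the XML document describing this bucket's access control
    # list. The signed open() from Authorized handles the query string:
    # canonical_string strips it off, then tacks 'acl' back onto the
    # signed path, as Amazon requires.
    def acl
      open(uri + '?acl').read   # GET /{name-of-bucket}?acl
    end
  end
end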
Using the S3 Client Library
I’ve now shown you a Ruby client library that can access just about the full capabilities of Amazon’s S3 service. Of course, a library is useless without clients that use it. In the previous section I showed you a couple of small clients to demonstrate points about security, but now I’d like to show something a little more substantial.
Example 3-23 is a simple command-line S3 client that can create a bucket and an object, then list the contents of the bucket. This client should give you a high-level picture of how S3’s resources work together. I’ve annotated the lines of code that trigger HTTP requests, by describing the HTTP requests in comments off to the right.
#!/usr/bin/ruby -w
# s3-sample-client.rb

require 'S3lib'

# Gather command-line arguments
bucket_name, object_name, object_value = ARGV
unless bucket_name
  puts "Usage: #{$0} [bucket name] [object name] [object value]"
  exit
end

# Find or create the bucket.
buckets = S3::BucketList.new.get                          # GET /
bucket = buckets.detect { |b| b.name == bucket_name }
if bucket
  puts "Found bucket #{bucket_name}."
else
  puts "Could not find bucket #{bucket_name}, creating it."
  bucket = S3::Bucket.new(bucket_name)
  bucket.put                                              # PUT /{bucket}
end

# Create the object.
object = S3::Object.new(bucket, object_name)
object.metadata['content-type'] = 'text/plain'
object.value = object_value
object.put                                                # PUT /{bucket}/{object}

# For each object in the bucket...
bucket.get[0].each do |o|                                 # GET /{bucket}
  # ...print out information about the object.
  puts "Name: #{o.name}"
  puts "Value: #{o.value}"                                # GET /{bucket}/{object}
  puts "Metadata hash: #{o.metadata.inspect}"
  puts
end
Clients Made Transparent with ActiveResource
Since all RESTful web services expose basically the same simple interface, it’s not a big chore to write a custom client for every web service. It is a little wasteful, though, and there are two alternatives. You can describe a service with a WADL file (introduced in the previous chapter, and covered in more detail in Chapter 9), and then access it with a generic WADL client. There’s also a Ruby library called ActiveResource that makes it trivial to write clients for certain kinds of web services.
ActiveResource is designed to run against web services that expose the rows and tables of a relational database. WADL can describe almost any kind of web service, but ActiveResource only works as a client for web services that follow certain conventions. Right now, Ruby on Rails is the only framework that follows the conventions. But any web service can answer requests from an ActiveResource client: it just has to expose its database through the same RESTful interface as Rails.
As of the time of writing, there are few publicly available web services that can be used with an ActiveResource client (I list a couple in Appendix A). To show you an example I’m going to create a small Rails web service of my own. I’ll be able to drive my service with an ActiveResource client, without writing any HTTP client or XML parsing code.
Creating a Simple Service
My web service will be a simple notebook: a way of keeping timestamped notes to myself. I’ve got Rails 1.2 installed on my computer, so I can create the notebook service like this:
$ rails notebook
$ cd notebook
I create a database on my system called notebook_development, and edit the Rails file notebook/config/database.yml to give Rails the information it needs to connect to my database. Any general guide to Rails will have more detail on these initial steps.
Now I’ve created a Rails application, but it doesn’t do anything. I’m going to generate code for a simple, RESTful web service with the scaffold generator. I want my notes to contain a timestamp and a body of text, so I run the following command:
$ ruby script/generate scaffold note date:date body:text
create app/views/notes
create app/views/notes/index.rhtml
create app/views/notes/show.rhtml
create app/views/notes/new.rhtml
create app/views/notes/edit.rhtml
create app/views/layouts/notes.rhtml
create public/stylesheets/scaffold.css
create app/models/note.rb
create app/controllers/notes_controller.rb
create test/functional/notes_controller_test.rb
create app/helpers/notes_helper.rb
create test/unit/note_test.rb
create test/fixtures/notes.yml
create db/migrate
create db/migrate/001_create_notes.rb
route map.resources :notes
Rails has generated a complete set of web service code—model, view, and controller—for my “note” object. There’s code in db/migrate/001_create_notes.rb that creates a database table called notes with three fields: a unique ID, a date (date), and a piece of text (body).

The model code in app/models/note.rb provides an ActiveRecord interface to the database table. The controller code in app/controllers/notes_controller.rb exposes that interface to the world through HTTP, and the views in app/views/notes define the user interface. It adds up to a RESTful web service—not a very fancy one, but one that’s good enough for a demo or to use as a starting point.
Before starting the service I need to initialize the database:
$ rake db:migrate
== CreateNotes: migrating =====================================================
-- create_table(:notes)
-> 0.0119s
== CreateNotes: migrated (0.0142s) ============================================
Now I can start the notebook application and start using my service:
$ script/server
=> Booting WEBrick...
=> Rails application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
An ActiveResource Client
The application I just generated is not much use except as a demo, but it demos some pretty impressive features. First, it’s both a web service and a web application. I can visit http://localhost:3000/notes in my web browser and create notes through the web interface. After a while the view of http://localhost:3000/notes might look like Figure 3-1.
If you’ve ever written a Rails application or seen a Rails demo, this should look familiar. But in Rails 1.2, the generated model and controller can also act as a RESTful web service. A programmed client can access it as easily as a web browser can.
Unfortunately, the ActiveResource client itself was not released along with Rails 1.2. As of the time of writing, it’s still being developed on the tip of the Rails development tree. To get the code I need to check it out from the Subversion version control repository:
$ svn co http://dev.rubyonrails.org/svn/rails/trunk activeresource_client
$ cd activeresource_client
Now I’m ready to write ActiveResource clients for the notebook’s web service. Example 3-24 is a client that creates a note, modifies it, lists the existing notes, and then deletes the note it just created.
#!/usr/bin/ruby -w
# activeresource-notebook-manipulation.rb

require 'activesupport/lib/active_support'
require 'activeresource/lib/active_resource'

# Define a model for the objects exposed by the site
class Note < ActiveResource::Base
  self.site = 'http://localhost:3000/'
end

def show_notes
  notes = Note.find :all           # GET /notes.xml
  puts "I see #{notes.size} note(s):"
  notes.each do |note|
    puts " #{note.date}: #{note.body}"
  end
end

new_note = Note.new(:date => Time.now, :body => "A test note")
new_note.save                      # POST /notes.xml

new_note.body = "This note has been modified."
new_note.save                      # PUT /notes/{id}.xml

show_notes

new_note.destroy                   # DELETE /notes/{id}.xml

puts
show_notes
Example 3-25 shows the output when I run that program:
I see 3 note(s):
 2006-06-05: What if I wrote a book about REST?
 2006-12-18: Pasta for lunch maybe?
 2006-12-18: This note has been modified.

I see 2 note(s):
 2006-06-05: What if I wrote a book about REST?
 2006-12-18: Pasta for lunch maybe?
If you’re familiar with ActiveRecord, the object-relational mapper that connects Rails to a database, you’ll notice that the ActiveResource interface looks almost exactly the same. Both libraries provide an object-oriented interface to a wide variety of objects, each of which exposes a uniform interface. With ActiveRecord, the objects live in a database and are exposed through SQL, with its SELECT, INSERT, UPDATE, and DELETE. With ActiveResource, they live in a Rails application and are exposed through HTTP, with its GET, POST, PUT, and DELETE.
Example 3-26 is an excerpt from the Rails server logs at the time I ran my ActiveResource client. The GET, POST, PUT, and DELETE requests correspond to the commented lines of code back in Example 3-24.
"POST /notes.xml HTTP/1.1" 201 "PUT /notes/5.xml HTTP/1.1" 200 "GET /notes.xml HTTP/1.1" 200 "DELETE /notes/5.xml HTTP/1.1" 200 "GET /notes.xml HTTP/1.1" 200
What’s going on in these requests? The same thing that’s going on in requests to S3: resource access through HTTP’s uniform interface. My notebook service exposes two kinds of resources:
- The list of notes (/notes.xml). Compare to an S3 bucket, which is a list of objects.
- A note (/notes/{id}.xml). Compare to an S3 object.
These resources expose GET, PUT, and DELETE, just like the S3 resources do. The list of notes also supports POST to create a new note. That’s a little different from S3, where objects are created with PUT, but it’s just as RESTful.
When the client runs, XML documents are transferred invisibly between client and server. They look like the documents in Example 3-27 or 3-28: simple depictions of the underlying database rows.
<?xml version="1.0" encoding="UTF-8"?> <notes> <note> <body>What if I wrote a book about REST?</body> <date type="date">2006-06-05</date> <id type="integer">2</id> </note> <note> <body>Pasta for lunch maybe?</body> <date type="date">2006-12-18</date> <id type="integer">3</id> </note> </notes>
A Python Client for the Simple Service
Right now the only ActiveResource client library is the Ruby library, and Rails is the only framework that exposes ActiveResource-compatible services. But nothing’s happening here except HTTP requests that pass XML documents into certain URIs and get XML documents back. There’s no reason why a client in some other language couldn’t send those XML documents, or why some other framework couldn’t expose the same URIs.
Example 3-29 is a Python implementation of the client program from Example 3-24. It’s longer than the Ruby program, because it can’t rely on ActiveResource. It has to build its own XML documents and make its own HTTP requests, but its structure is almost exactly the same.
#!/usr/bin/python
# activeresource-notebook-manipulation.py

from elementtree.ElementTree import Element, SubElement, tostring
from elementtree import ElementTree
import httplib2
import time

BASE = "http://localhost:3000/"
client = httplib2.Http(".cache")

def showNotes():
    headers, xml = client.request(BASE + "notes.xml")
    doc = ElementTree.fromstring(xml)
    for note in doc.findall('note'):
        print "%s: %s" % (note.find('date').text, note.find('body').text)

newNote = Element("note")
date = SubElement(newNote, "date")
date.attrib['type'] = "date"
date.text = time.strftime("%Y-%m-%d", time.localtime())
body = SubElement(newNote, "body")
body.text = "A test note"

headers, ignore = client.request(BASE + "notes.xml", "POST",
                                 body=tostring(newNote),
                                 headers={'content-type': 'application/xml'})
newURI = headers['location']

modifiedBody = Element("note")
body = SubElement(modifiedBody, "body")
body.text = "This note has been modified"
client.request(newURI, "PUT", body=tostring(modifiedBody),
               headers={'content-type': 'application/xml'})

showNotes()

client.request(newURI, "DELETE")

print
showNotes()
Parting Words
Because RESTful web services have simple and well-defined interfaces, it’s not difficult to clone them or swap out one implementation for another. Park Place is a Ruby application that exposes the same HTTP interface as S3. You can use Park Place to host your own version of S3. S3 libraries and client programs will work against your Park Place server just as they now do against https://s3.amazonaws.com/.
It’s also possible to clone ActiveResource. No one has done this yet, but it shouldn’t be difficult to write a general ActiveResource client for Python or any other dynamic language. In the meantime, writing a one-off client for an ActiveResource-compatible service is no more difficult than writing a client for any other RESTful service.
By now you should feel comfortable with the prospect of writing a client for any RESTful or REST-RPC hybrid service, whether it serves XML, HTML, JSON, or some mixture. It’s all just HTTP requests and document parsing.
You should also be getting a feel for what differentiates RESTful web services like S3 and Yahoo!’s search services from RPC-style and hybrid services like the Flickr and del.icio.us APIs. This is not a judgement about the service’s content, only about its architecture. In woodworking it’s important to work with the grain of the wood. The Web, too, has a grain, and a RESTful web service is one that works with it.
In the coming chapters I’ll show how you can create web services that are more like S3 and less like the del.icio.us API. This culminates in Chapter 7, which reinvents del.icio.us as a RESTful web service.