Chapter 3. What Makes RESTful Services Different?
I pulled a kind of bait-and-switch on you earlier, and it’s time to make things right. Though this is a book about RESTful web services, most of the real services I’ve shown you are REST-RPC hybrids like the del.icio.us API: services that don’t quite work like the rest of the Web. This is because right now, there just aren’t many well-known RESTful services that work like the Web. In previous chapters I wanted to show you clients for real services you might have heard of, so I had to take what I could get.
The del.icio.us and Flickr APIs are good examples of hybrid services. They work like the Web when you’re fetching data, but they’re RPC-style services when it comes time to modify the data. The various Yahoo! search services are very RESTful, but they’re so simple that they don’t make good examples. The Amazon E-Commerce Service (seen in Example 1-2) is also quite simple, and defects to the RPC style on a few obscure but important points.
These services are all useful. I think the RPC style is the wrong one for web services, but that never prevents me from writing an RPC-style client if there’s interesting data on the other side. I can’t use Flickr or the del.icio.us API as examples of how to design RESTful web services, though. That’s why I covered them early in the book, when the only thing I was trying to show was what’s on the programmable web and how to write HTTP clients. Now that we’re approaching a heavy design chapter, I need to show you what a service looks like when it’s RESTful and resource-oriented.
Introducing the Simple Storage Service
Two popular web services can answer this call: the Atom Publishing Protocol (APP), and Amazon’s Simple Storage Service (S3). (Appendix A lists some publicly deployed RESTful web services, many of which you may not have heard of.) The APP is less an actual service than a set of instructions for building a service, so I’m going to start with S3, which actually exists at a specific place on the Web. In Chapter 9 I discuss the APP, Atom, and related topics like Google’s GData. For much of the rest of this chapter, I’ll explore S3.
S3 is a way of storing any data you like, structured however you like. You can keep your data private, or make it accessible by anyone with a web browser or BitTorrent client. Amazon hosts the storage and the bandwidth, and charges you by the gigabyte for both. To use the example S3 code in this chapter, you’ll need to sign up for the S3 service by going to http://aws.amazon.com/s3. The S3 technical documentation is at http://docs.amazonwebservices.com/AmazonS3/2006-03-01/.
There are two main uses for S3, as a:
- Backup server
You store your data through S3 and don’t give anyone else access to it. Rather than buying your own backup disks, you’re renting disk space from Amazon.
- Data host
You store your data on S3 and give others access to it. Amazon serves your data through HTTP or BitTorrent. Rather than paying an ISP for bandwidth, you’re paying Amazon. Depending on your existing bandwidth costs this can save you a lot of money. Many of today’s web startups use S3 to serve data files.
Unlike the services I’ve shown so far, S3 is not inspired by any existing web site. The del.icio.us API is based on the del.icio.us web site, and the Yahoo! search services are based on corresponding web sites, but there’s no web page on amazon.com where you fill out HTML forms to upload your files to S3. S3 is intended only for programmatic use. (Of course, if you use S3 as a data host, people will use it through their web browsers, without even knowing they’re making a web service call. It’ll act like a normal web site.)
Amazon provides sample libraries for Ruby, Python, Java, C#, and Perl (see http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=47). There are also third-party libraries, like Ruby’s AWS::S3, which includes the s3sh shell I demonstrated back in Example 1-4.
Object-Oriented Design of S3
S3 is based on two concepts: S3 “buckets” and S3 “objects.” An object is a named piece of data with some accompanying metadata. A bucket is a named container for objects. A bucket is analogous to the filesystem on your hard drive, and an object to one of the files on that filesystem. It’s tempting to compare a bucket to a directory on a filesystem, but filesystem directories can be nested and buckets can’t. If you want a directory structure inside your bucket, you need to simulate one by giving your objects names like “directory/subdirectory/file-object.”
A Few Words About Buckets
A bucket has one piece of information associated with it: the name. A bucket name can only contain the characters A through Z, a through z, 0 through 9, underscore, period, and dash. I recommend staying away from uppercase letters in bucket names.
As I mentioned above, buckets cannot contain other buckets: only objects. Each S3 user is limited to 100 buckets, and your bucket name cannot conflict with anyone else’s. I recommend you either keep everything in one bucket, or name each bucket after one of your projects or domain names.
A Few Words About Objects
An object has four parts to it:

- A reference to the parent bucket.
- A key: the name of the object, which uniquely identifies it within its bucket.
- A value: the data itself, an arbitrary string of bytes.
- A set of metadata key-value pairs associated with the object. This is mostly custom metadata, but it may also include values for the standard HTTP headers Content-Type and Content-Disposition.
If I wanted to host the O’Reilly web site on S3, I’d create a bucket called “oreilly.com,” and fill it with objects whose keys were “” (the empty string), “catalog,” “catalog/9780596529260,” and so on. These objects correspond to the URIs http://oreilly.com/, http://oreilly.com/catalog, and so on. The objects’ values would be the HTML contents of O’Reilly’s web pages. These S3 objects would have their Content-Type metadata value set to text/html, so that people browsing the site would be served these objects as HTML documents, as opposed to XML or plain text.
What If S3 Was a Standalone Library?
If S3 was implemented as an object-oriented code library instead of a web service, you’d have two classes: S3Bucket and S3Object. They’d have getter and setter methods for their data members: S3Bucket#name, S3Object#value=, S3Bucket#addObject, and the like. The S3Bucket class would have an instance method S3Bucket#getObjects that returned a list of S3Object instances, and a class method S3Bucket.getBuckets that returned all of your buckets. Example 3-1 shows what the Ruby code for these classes might look like.
class S3Bucket
  # A class method to fetch all of your buckets.
  def self.getBuckets
  end

  # An instance method to fetch the objects in a bucket.
  def getObjects
  end

  ...
end

class S3Object
  # Fetch the data associated with this object.
  def data
  end

  # Set the data associated with this object.
  def data=(new_value)
  end

  ...
end
Resources
Amazon exposes S3 as two different web services: a RESTful service based on plain HTTP envelopes, and an RPC-style service based on SOAP envelopes. The RPC-style service exposes functions much like the methods in Example 3-1’s hypothetical Ruby library: ListAllMyBuckets, CreateBucket, and so on. Indeed, many RPC-style web services are automatically generated from their implementation methods, and expose the same interfaces as the programming-language code they call behind the scenes. This works because most modern programming (including object-oriented programming) is procedural.
The RESTful S3 service exposes all the functionality of the RPC-style service, but instead of doing it with custom-named functions, it exposes standard HTTP objects called resources. Instead of responding to custom method names like getObjects, a resource responds to one or more of the six standard HTTP methods: GET, HEAD, POST, PUT, DELETE, and OPTIONS.
The RESTful S3 service provides three types of resources. Here they are, with sample URIs for each:

- The list of your buckets (https://s3.amazonaws.com/). There’s only one resource of this type.
- A particular bucket (https://s3.amazonaws.com/{name-of-bucket}). There can be up to 100 resources of this type.
- A particular S3 object inside a bucket (https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}). There can be infinitely many resources of this type.
Each method from my hypothetical object-oriented S3 library corresponds to one of the six standard methods on one of these three types of resources. The getter method S3Object#value corresponds to a GET request on an “S3 object” resource, and the setter method S3Object#value= corresponds to a PUT request on the same resource. Factory methods like S3Bucket.getBuckets and relational methods like S3Bucket#getObjects correspond to GET methods on the “bucket list” and “bucket” resources.
Every resource exposes the same interface and works the same way. To get an object’s value you send a GET request to that object’s URI. To get only the metadata for an object you send a HEAD request to the same URI. To create a bucket, you send a PUT request to a URI that incorporates the name of the bucket. To add an object to a bucket, you send PUT to a URI that incorporates the bucket name and object name. To delete a bucket or an object, you send a DELETE request to its URI.
The S3 designers didn’t just make this up. According to the HTTP standard this is what GET, HEAD, PUT, and DELETE are for. These four methods (plus POST and OPTIONS, which S3 doesn’t use) suffice to describe all interaction with resources on the Web. To expose your programs as web services, you don’t need to invent new vocabularies or smuggle method names into URIs, or do anything except think carefully about your resource design. Every REST web service, no matter how complex, supports the same basic operations. All the complexity lives in the resources.
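To make the uniform interface concrete, here is a minimal sketch using rest-open-uri, the HTTP client extension used throughout this book. The bucket and object names are hypothetical, and I’m ignoring authentication for now; as I show later in this chapter, real S3 requests must be signed.

require 'rubygems'
require 'rest-open-uri'

host = 'https://s3.amazonaws.com/'
open(host)                                          # GET the bucket list
open(host + 'mybucket', :method => :put)            # create a bucket
open(host + 'mybucket/mykey', :method => :put,
     :body => 'my data')                            # create an object
open(host + 'mybucket/mykey')                       # fetch the object's value
open(host + 'mybucket/mykey', :method => :delete)   # delete the object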
Table 3-1 shows what happens when you send an HTTP request to the URI of an S3 resource.
|  | GET | HEAD | PUT | DELETE |
| The bucket list (/) | List your buckets | - | - | - |
| A bucket (/{name-of-bucket}) | List the bucket’s objects | - | Create the bucket | Delete the bucket |
| An object (/{name-of-bucket}/{name-of-object}) | Get the object’s value and metadata | Get the object’s metadata | Set the object’s value and metadata | Delete the object |
That table looks kind of ridiculous. Why did I take up valuable space by printing it? Everything just does what it says. And that is why I printed it. In a well-designed RESTful service, everything does what it says.
You may well be skeptical of this claim, given the evidence so far. S3 is a pretty generic service. If all you’re doing is sticking data into named slots, then of course you can implement the service using only generic verbs like GET and PUT. In Chapter 5 and Chapter 6 I’ll show you strategies for mapping any kind of action to the uniform interface. As a preview, note that I was able to get rid of S3Bucket.getBuckets by defining a new resource as “the list of buckets,” which responds only to GET. Also note that S3Bucket#addObject simply disappeared as a natural consequence of the resource design, which requires that every object be associated with some bucket.
Compare this to S3’s RPC-style SOAP interface. To get the bucket list through SOAP, the method name is ListAllMyBuckets. To get the contents of a bucket, the method name is ListBucket. With the RESTful interface, it’s always GET. In a RESTful service, the URI designates an object (in the object-oriented sense) and the method names are standardized. The same few methods work the same way across resources and services.
HTTP Response Codes
Another defining feature of a RESTful architecture is its use of HTTP response codes. If you send a request to S3, and S3 handles it with no problem, you’ll probably get back an HTTP response code of 200 (“OK”), just like when you successfully fetch a web page in your browser. If something goes wrong, the response code will be in the 4xx or 5xx range: for instance, 500 (“Internal Server Error”). An error response code is a signal to the client that the metadata and entity-body should not be interpreted as a response to the request. It’s not what the client asked for: it’s the server’s attempt to tell the client about a problem. Since the response code isn’t part of the document or the metadata, the client can see whether or not an error occurred just by looking at the numeric code on the first line of the response.
Example 3-2 shows a sample error response. I made an HTTP request for an object that didn’t exist (https://s3.amazonaws.com/crummy.com/nonexistent/object). The response code is 404 (“Not Found”).
404 Not Found
Content-Type: application/xml
Date: Fri, 10 Nov 2006 20:04:45 GMT
Server: AmazonS3
Transfer-Encoding: chunked
X-amz-id-2: /sBIPQxHJCsyRXJwGWNzxuL5P+K96/Wvx4FhvVACbjRfNbhbDyBH5RC511sIz0w0
X-amz-request-id: ED2168503ABB7BF4

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>nonexistent/object</Key>
  <RequestId>ED2168503ABB7BF4</RequestId>
  <HostId>/sBIPQxHJCsyRXJwGWNzxuL5P+K96/Wvx4FhvVACbjRfNbhbDyBH5RC511sIz0w0</HostId>
</Error>
HTTP response codes are underused on the human web. Your browser doesn’t show you the HTTP response code when you request a page, because who wants to look at a numeric code when you can just look at the document to see whether something went wrong? When an error occurs in a web application, most web applications send 200 (“OK”) along with a human-readable document that talks about the error. There’s very little chance a human will mistake the error document for the document they requested.
On the programmable web, it’s just the opposite. Computer programs are good at taking different paths based on the value of a numeric variable, and very bad at figuring out what a document “means.” In the absence of prearranged rules, there’s no way for a program to tell whether an XML document contains data or describes an error. HTTP response codes are the rules: rough conventions about how the client should approach an HTTP response. Because they’re not part of the entity-body or metadata, a client can understand what happened even if it has no clue how to read the response.
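Here’s a minimal sketch of a client that branches on the response code instead of trying to interpret the document. It requests the nonexistent object from Example 3-2, and I’m ignoring request signing:

require 'rubygems'
require 'rest-open-uri'

begin
  # A successful response: the entity-body is the representation we asked for.
  data = open('https://s3.amazonaws.com/crummy.com/nonexistent/object').read
rescue OpenURI::HTTPError => e
  # An error response: the status line tells us what went wrong.
  puts "The server signaled a problem: response code #{e.io.status[0]}"
end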
S3 uses a variety of response codes in addition to 200 (“OK”) and 404 (“Not Found”). The most common is probably 403 (“Forbidden”), used when the client makes a request without providing the right credentials. S3 also uses a few others, including 400 (“Bad Request”), which indicates that the server couldn’t understand the data the client sent; and 409 (“Conflict”), sent when the client tries to delete a bucket that’s not empty. For a full list, see the S3 technical documentation under “The REST Error Response.” I describe every HTTP response code in Appendix B, with a focus on their application to web services. There are 41 official HTTP response codes, but only about 10 are important in everyday use.
An S3 Client
The Amazon sample libraries, and the third-party contributions like AWS::S3, eliminate much of the need for custom S3 client libraries. But I’m not telling you about S3 just so you’ll know about a useful web service. I want to use it to illustrate the theory behind REST. So I’m going to write a Ruby S3 client of my own, and dissect it for you as I go along.
Just to show it can be done, my library will implement an object-oriented interface, like the one from Example 3-1, on top of the S3 service. The result will look like ActiveRecord or some other object-relational mapper. Instead of making SQL calls under the covers to store data in a database, though, it’ll make HTTP requests under the covers to store data on the S3 service. Rather than give my methods resource-specific names like getBuckets and getObjects, I’ll try to use names that reflect the underlying RESTful interface: get, put, and so on.
The first thing I need is an interface to Amazon’s rather unusual web service authorization mechanism. But that’s not as interesting as seeing the web service in action, so I’m going to skip it for now. I’m going to create a very small Ruby module called S3::Authorized, just so my other S3 classes can include it. I’ll come back to it at the end, and fill in the details.
Example 3-3 shows a bit of throat-clearing code.
#!/usr/bin/ruby -w
# S3lib.rb

# Libraries necessary for making HTTP requests and parsing responses.
require 'rubygems'
require 'rest-open-uri'
require 'rexml/document'

# Libraries necessary for request signing
require 'openssl'
require 'digest/sha1'
require 'base64'
require 'uri'

module S3 # This is the beginning of a big, all-encompassing module.

module Authorized
  # Enter your public key (Amazon calls it an "Access Key ID") and
  # your private key (Amazon calls it a "Secret Access Key"). This is
  # so you can sign your S3 requests and Amazon will know who to
  # charge.
  @@public_key = ''
  @@private_key = ''

  if @@public_key.empty? or @@private_key.empty?
    raise "You need to set your S3 keys."
  end

  # You shouldn't need to change this unless you're using an S3 clone like
  # Park Place.
  HOST = 'https://s3.amazonaws.com/'
end
The only interesting aspect of this bare-bones S3::Authorized is that it’s where you should plug in the two cryptographic keys associated with your Amazon Web Services account. Every S3 request you make includes your public key (Amazon calls it an “Access Key ID”) so that Amazon can identify you. Every request you make must be cryptographically signed with your private key (Amazon calls it a “Secret Access Key”) so that Amazon knows it’s really you. I’m using the standard cryptographic terms, even though your “private key” is not totally private—Amazon knows it too. It is private in the sense that you should never reveal it to anyone else. If you do, the person you reveal it to will be able to make S3 requests and have Amazon charge you for it.
The Bucket List
Example 3-4 shows an object-oriented class for my first resource, the list of buckets. I’ll call the class for this resource S3::BucketList.
# The bucket list.
class BucketList
  include Authorized

  # Fetch all the buckets this user has defined.
  def get
    buckets = []
    # GET the bucket list URI and read an XML document from it.
    doc = REXML::Document.new(open(HOST).read)

    # For every bucket...
    REXML::XPath.each(doc, "//Bucket/Name") do |e|
      # ...create a new Bucket object and add it to the list.
      buckets << Bucket.new(e.text) if e.text
    end
    return buckets
  end
end
Now my file is a real web service client. If I call S3::BucketList#get I make a secure HTTP GET request to https://s3.amazonaws.com/, which happens to be the URI of the resource “a list of your buckets.” The S3 service sends back an XML document that looks something like Example 3-5. This is a representation (as I’ll start calling it in the next chapter) of the resource “a list of your buckets.” It’s just some information about the current state of that list. The Owner tag makes it clear whose bucket list it is (my AWS account name is evidently “leonardr28”), and the Buckets tag contains a number of Bucket tags describing my buckets (in this case, there’s one Bucket tag and one bucket).
<?xml version='1.0' encoding='UTF-8'?>
<ListAllMyBucketsResult xmlns='http://s3.amazonaws.com/doc/2006-03-01/'>
  <Owner>
    <ID>c0363f7260f2f5fcf38d48039f4fb5cab21b060577817310be5170e7774aad70</ID>
    <DisplayName>leonardr28</DisplayName>
  </Owner>
  <Buckets>
    <Bucket>
      <Name>crummy.com</Name>
      <CreationDate>2006-10-26T18:46:45.000Z</CreationDate>
    </Bucket>
  </Buckets>
</ListAllMyBucketsResult>
For purposes of this small client application, the Name is the only aspect of a bucket I’m interested in. The XPath expression //Bucket/Name gives me the name of every bucket, which is all I need to create Bucket objects.
As we’ll see, one thing that’s missing from this XML document is links. The document gives the name of every bucket, but says nothing about where the buckets can be found on the Web. In terms of the REST design criteria, this is the major shortcoming of Amazon S3. Fortunately, it’s not too difficult to program a client to calculate a URI from the bucket name. I just follow the rule I gave earlier: https://s3.amazonaws.com/{name-of-bucket}.
The Bucket
Now, let’s write the S3::Bucket class, so that S3::BucketList#get will have something to instantiate (Example 3-6).
# A bucket that you've stored (or will store) on the S3 application.
class Bucket
  include Authorized
  attr_accessor :name

  def initialize(name)
    @name = name
  end

  # The URI to a bucket is the service root plus the bucket name.
  def uri
    HOST + URI.escape(name)
  end

  # Stores this bucket on S3. Analogous to ActiveRecord::Base#save,
  # which stores an object in the database. See below in the
  # book text for a discussion of acl_policy.
  def put(acl_policy=nil)
    # Set the HTTP method as an argument to open(). Also set the S3
    # access policy for this bucket, if one was provided.
    args = {:method => :put}
    args["x-amz-acl"] = acl_policy if acl_policy

    # Send a PUT request to this bucket's URI.
    open(uri, args)
    return self
  end

  # Deletes this bucket. This will fail with HTTP status code 409
  # ("Conflict") unless the bucket is empty.
  def delete
    # Send a DELETE request to this bucket's URI.
    open(uri, :method => :delete)
  end
Here are two more web service methods: S3::Bucket#put and S3::Bucket#delete. Since the URI to a bucket uniquely identifies the bucket, deletion is simple: you send a DELETE request to the bucket URI, and it’s gone. Since a bucket’s name goes into its URI, and a bucket has no other settable properties, it’s also easy to create a bucket: just send a PUT request to its URI. As I’ll show when I write S3::Object, a PUT request is more complicated when not all the data can be stored in the URI.
Earlier I compared my S3:: classes to ActiveRecord classes, but S3::Bucket#put works a little differently from an ActiveRecord implementation of save. A row in an ActiveRecord-controlled database table has a numeric unique ID. If you take an ActiveRecord object with ID 23 and change its name, your change is reflected as a change to the database record with ID 23:

SET name="newname" WHERE id=23
The permanent ID of an S3 bucket is its URI, and the URI includes the name. If you change the name of a bucket and call put, the client doesn’t rename the old bucket on S3: it creates a new, empty bucket at a new URI with the new name. This is a result of design decisions made by the S3 programmers. It doesn’t have to be this way. The Ruby on Rails framework has a different design: when it exposes database rows through a RESTful web service, the URI to a row incorporates its numeric database ID. If S3 was a Rails service you’d see buckets at URIs like /buckets/23. Renaming the bucket wouldn’t change the URI.
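Here’s a hedged sketch of that pitfall, using the S3::Bucket class defined above (the bucket names are hypothetical):

bucket = S3::Bucket.new("old-name")
bucket.put                    # PUT /old-name: creates the bucket

bucket.name = "new-name"
bucket.put                    # PUT /new-name: creates a second, empty bucket;
                              # "old-name" and its contents still exist on S3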
Now comes the last method of S3::Bucket, which I’ve called get. Like S3::BucketList#get, this method makes a GET request to the URI of a resource (in this case, a “bucket” resource), fetches an XML document, and parses it into new instances of a Ruby class (see Example 3-7). This method supports a variety of ways to filter the contents of S3 buckets. For instance, you can use :Prefix to retrieve only objects whose keys start with a certain string. I won’t cover these filtering options in detail. If you’re interested in them, see the S3 technical documentation on “Listing Keys.”
# Get the objects in this bucket: all of them, or some subset.
#
# If S3 decides not to return the whole bucket/subset, the second
# return value will be set to true. To get the rest of the objects,
# you'll need to manipulate the subset options (not covered in the
# book text).
#
# The subset options are :Prefix, :Marker, :Delimiter, :MaxKeys.
# For details, see the S3 docs on "Listing Keys".
def get(options={})
  # Get the base URI to this bucket, and append any subset options
  # onto the query string.
  uri = uri()
  suffix = '?'

  # For every option the user provided...
  options.each do |param, value|
    # ...if it's one of the S3 subset options...
    if [:Prefix, :Marker, :Delimiter, :MaxKeys].member? param
      # ...add it to the URI.
      uri << suffix << param.to_s << '=' << URI.escape(value)
      suffix = '&'
    end
  end

  # Now we've built up our URI. Make a GET request to that URI and
  # read an XML document that lists objects in the bucket.
  doc = REXML::Document.new(open(uri).read)
  there_are_more = REXML::XPath.first(doc, "//IsTruncated").text == "true"

  # Build a list of S3::Object objects.
  objects = []
  # For every object in the bucket...
  REXML::XPath.each(doc, "//Contents/Key") do |e|
    # ...build an S3::Object object and append it to the list.
    objects << Object.new(self, e.text) if e.text
  end
  return objects, there_are_more
end
end
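As a usage sketch (the bucket name and key prefix are hypothetical), here’s how a client might use :Prefix to list a single simulated “directory” of a bucket:

bucket = S3::Bucket.new("crummy.com")
objects, truncated = bucket.get(:Prefix => "reports/")
objects.each { |o| puts o.name }
puts "The listing was truncated; more objects remain." if truncated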
Make a GET request of the application’s root URI, and you get a representation of the resource “a list of your buckets.” Make a GET request to the URI of a “bucket” resource, and you get a representation of the bucket: an XML document like the one in Example 3-8, containing a Contents tag for every element of the bucket.
<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>crummy.com</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>mydocument</Key>
    <LastModified>2006-10-27T16:01:19.000Z</LastModified>
    <ETag>"93bede57fd3818f93eedce0def329cc7"</ETag>
    <Size>22</Size>
    <Owner>
      <ID>c0363f7260f2f5fcf38d48039f4fb5cab21b060577817310be5170e7774aad70</ID>
      <DisplayName>leonardr28</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
In this case, the portion of the document I find interesting is the list of a bucket’s objects. An object is identified by its key, and I use the XPath expression “//Contents/Key” to fetch that information. I’m also interested in a certain Boolean variable (“//IsTruncated”): whether this document contains keys for every object in the bucket, or whether S3 decided there were too many to send in one document and truncated the list.
Again, the main thing missing from this representation is links. The document lists lots of information about the objects, but not their URIs. The client is expected to know how to turn an object name into that object’s URI. Fortunately, it’s not too hard to build an object’s URI, using the rule I already gave: https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}.
The S3 Object
Now we’re ready to implement an interface to the core of the S3 service: the object. Remember that an S3 object is just a data string that’s been given a name (a key) and a set of metadata key-value pairs (such as Content-Type="text/html"). When you send a GET request to the bucket list, or to a bucket, S3 serves an XML document that you have to parse. When you send a GET request to an object, S3 serves whatever data string you PUT there earlier—byte for byte.

Example 3-9 shows the beginning of S3::Object, which should be nothing new by now.
# An S3 object, associated with a bucket, containing a value and metadata.
class Object
  include Authorized

  # The client can see which Bucket this Object is in.
  attr_reader :bucket

  # The client can read and write the name of this Object.
  attr_accessor :name

  # The client can write this Object's metadata and value.
  # I'll define the corresponding "read" methods later.
  attr_writer :metadata, :value

  def initialize(bucket, name, value=nil, metadata=nil)
    @bucket, @name, @value, @metadata = bucket, name, value, metadata
  end

  # The URI to an Object is the URI to its Bucket, and then its name.
  def uri
    @bucket.uri + '/' + URI.escape(name)
  end
What comes next is my first implementation of an HTTP HEAD request. I use it to fetch an object’s metadata key-value pairs and populate the metadata hash with it (the actual implementation of store_metadata comes at the end of this class). Since I’m using rest-open-uri, the code to make the HEAD request looks the same as the code to make any other HTTP request (see Example 3-10).
# Retrieves the metadata hash for this Object, possibly fetching
# it from S3.
def metadata
  # If there's no metadata yet...
  unless @metadata
    # Make a HEAD request to this Object's URI, and read the metadata
    # from the HTTP headers in the response.
    begin
      store_metadata(open(uri, :method => :head).meta)
    rescue OpenURI::HTTPError => e
      if e.io.status == ["404", "Not Found"]
        # If the Object doesn't exist, there's no metadata and this is not
        # an error.
        @metadata = {}
      else
        # Otherwise, this is an error.
        raise e
      end
    end
  end
  return @metadata
end
The goal here is to fetch an object’s metadata without fetching the object itself. This is the difference between downloading a movie review and downloading the movie, and when you’re paying for the bandwidth it’s a big difference. This distinction between metadata and representation is not unique to S3, and the solution is general to all resource-oriented web services. The HEAD method gives any client a way of fetching the metadata for any resource, without also fetching its (possibly enormous) representation.
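As a usage sketch built on this method (the bucket and object names are hypothetical): checking a large movie file’s Content-Type costs one HEAD request and a few hundred bytes of headers, not the gigabytes a GET would transfer.

bucket = S3::Bucket.new("BobProductions")
movie = S3::Object.new(bucket, "KomodoDragon.avi")
puts movie.metadata['content-type']   # one HEAD request: headers, no entity-body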
Of course, sometimes you do want to download the movie, and for that you need a GET request. I’ve put the GET request in the accessor method S3::Object#value, in Example 3-11. Its structure mirrors that of S3::Object#metadata.
# Retrieves the value of this Object, possibly fetching it
# (along with the metadata) from S3.
def value
  # If there's no value yet...
  unless @value
    # Make a GET request to this Object's URI.
    response = open(uri)
    # Read the metadata from the HTTP headers in the response.
    store_metadata(response.meta) unless @metadata
    # Read the value from the entity-body
    @value = response.read
  end
  return @value
end
The client stores objects on the S3 service the same way it stores buckets: by sending a PUT request to a certain URI. The bucket PUT is trivial because a bucket has no distinguishing features other than its name, which goes into the URI of the PUT request. An object PUT is more complex. This is where the HTTP client specifies an object’s metadata (such as Content-Type) and value. This information will be made available on future HEAD and GET requests.
Fortunately, setting up the PUT request is not terribly complicated, because an object’s value is whatever the client says it is. I don’t have to wrap the object’s value in an XML document or anything. I just send the data as is, and set HTTP headers that correspond to the items of metadata in my metadata hash (see Example 3-12).
# Store this Object on S3.
def put(acl_policy=nil)
  # Start from a copy of the original metadata, or an empty hash if
  # there is no metadata yet.
  args = @metadata ? @metadata.clone : {}

  # Set the HTTP method, the entity-body, and some additional HTTP
  # headers.
  args[:method] = :put
  args["x-amz-acl"] = acl_policy if acl_policy
  if @value
    args["Content-Length"] = @value.size.to_s
    args[:body] = @value
  end

  # Make a PUT request to this Object's URI.
  open(uri, args)
  return self
end
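As a usage sketch, here’s how a client might use this put to store one page of the hypothetical “oreilly.com” site from earlier in the chapter:

bucket = S3::Bucket.new("oreilly.com")
object = S3::Object.new(bucket, "catalog")
object.metadata = {'Content-Type' => 'text/html'}
object.value = '<html>...the catalog page...</html>'
object.put    # PUT /oreilly.com/catalog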
The S3::Object#delete implementation (see Example 3-13) is identical to S3::Bucket#delete.
# Deletes this Object.
def delete
  # Make a DELETE request to this Object's URI.
  open(uri, :method => :delete)
end
And Example 3-14 shows the method for turning HTTP response headers into S3 object metadata. Except for Content-Type, you should prefix all the metadata headers you set with the string “x-amz-meta-”. Otherwise they won’t make the round trip to the S3 server and back to a web service client. S3 will think they’re quirks of your client software and discard them.
private

# Given a hash of headers from a HTTP response, picks out the
# headers that are relevant to an S3 Object, and stores them in the
# instance variable @metadata.
def store_metadata(new_metadata)
  @metadata = {}
  new_metadata.each do |h,v|
    if RELEVANT_HEADERS.member?(h) || h.index('x-amz-meta') == 0
      @metadata[h] = v
    end
  end
end

RELEVANT_HEADERS = ['content-type', 'content-disposition',
                    'content-range', 'x-amz-missing-meta']
end
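A quick illustration of the prefix rule, with hypothetical header names and values:

bucket = S3::Bucket.new("crummy.com")
object = S3::Object.new(bucket, "mydocument")
object.metadata = {
  'Content-Type'      => 'text/plain',  # survives: a relevant standard header
  'x-amz-meta-author' => 'leonardr',    # survives: has the x-amz-meta- prefix
  'author'            => 'leonardr'     # sent, but discarded by S3
}
object.put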
Request Signing and Access Control
I’ve put it off as long as I can, and now it’s time to deal with S3 authentication. If your main interest is in RESTful services in general, feel free to skip ahead to the section on using the S3 library in clients. But if the inner workings of S3 have piqued your interest, read on.
The code I’ve shown you so far makes HTTP requests all right, but S3 rejects them, because they don’t contain the all-important Authorization header. S3 has no proof that you’re the owner of your own buckets. Remember, Amazon charges you for the data stored on their servers and the bandwidth used in transferring that data. If S3 accepted requests to your buckets with no authorization, anyone could store data in your buckets and you’d get charged for it.
Most web services that require authentication use a standard HTTP mechanism to make sure you are who you claim to be. But S3’s needs are more complicated. With most web services you never want anyone else using your data. But one of the uses of S3 is as a hosting service. You might want to host a big movie file on S3, let anyone download it with their BitTorrent client, and have Amazon send you the bill.
Or you might be selling access to movie files stored on S3. Your e-commerce site takes payment from a customer and gives them an S3 URI they can use to download the movie. You’re delegating to someone else the right to make a particular web service call (a GET request) as you, and have it charged to your account.
The standard mechanisms for HTTP authentication can’t provide security for that kind of application. Normally, the person who’s sending the HTTP request needs to know the actual password. You can prevent someone from spying on your password, but you can’t say to someone else: “here’s my password, but you must promise only to use it to request this one URI.”
S3 solves this problem using a message authentication code (MAC). Every time you make an S3 request, you use your secret key (remember, the secret is shared between you and Amazon) to sign the important parts of the request. That’d be the URI, the HTTP method you’re using, and a few of the HTTP headers. Only someone who knows the secret can create these signatures for your requests, which is how Amazon knows it’s okay to charge you for the request. But once you’ve signed a request, you can send the signature to a third party without revealing the secret. The third party is then free to send an identical HTTP request to the one you signed, and have Amazon charge you for it. In short: someone else can make a specific request as you, for a limited time, without having to know your secret.
There is a simpler way to give anonymous access to your S3 objects, and I discuss it below. But there’s no way around signing your own requests, so even a simple library like this one must support request signing if it’s going to work. I’m reopening the S3::Authorized Ruby module now. I’m going to give it the ability to intercept calls to the open method, and sign HTTP requests before they’re made. Since S3::BucketList, S3::Bucket, and S3::Object have all included this module, they’ll inherit this ability as soon as I define it. Without the code I’m about to write, all those open calls I defined in the classes above will send unsigned HTTP requests that just bounce off S3 with response code 403 (“Forbidden”). With this code, you’ll be able to generate signed HTTP requests that pass through S3’s security measures (and cost you money). The code in Example 3-15 and the other examples that follow is heavily based on Amazon’s own example S3 library.
module Authorized
  # These are the standard HTTP headers that S3 considers interesting
  # for purposes of request signing.
  INTERESTING_HEADERS = ['content-type', 'content-md5', 'date']

  # This is the prefix for custom metadata headers. All such headers
  # are considered interesting for purposes of request signing.
  AMAZON_HEADER_PREFIX = 'x-amz-'

  # An S3-specific wrapper for rest-open-uri's implementation of
  # open(). This implementation sets some HTTP headers before making
  # the request. Most important of these is the Authorization header,
  # which contains the information Amazon will use to decide who to
  # charge for this request.
  def open(uri, headers_and_options={}, *args, &block)
    headers_and_options = headers_and_options.dup
    headers_and_options['Date'] ||= Time.now.httpdate
    headers_and_options['Content-Type'] ||= ''
    signed = signature(uri, headers_and_options[:method] || :get,
                       headers_and_options)
    headers_and_options['Authorization'] = "AWS #{@@public_key}:#{signed}"
    Kernel::open(uri, headers_and_options, *args, &block)
  end
The tough work here is in the signature method, not yet defined. This method needs to construct an encrypted string to go into a request’s Authorization header: a string that convinces the S3 service that it’s really you sending the request—or that you’ve authorized someone else to make the request at your expense (see Example 3-16).
# Builds the cryptographic signature for an HTTP request. This is
# the signature (signed with your secret key) of a "canonical
# string" containing all interesting information about the request.
def signature(uri, method=:get, headers={}, expires=nil)
  # Accept the URI either as a string, or as a Ruby URI object.
  if uri.respond_to? :path
    path = uri.path
  else
    uri = URI.parse(uri)
    path = uri.path + (uri.query ? "?" + uri.query : "")
  end

  # Build the canonical string, then sign it.
  signed_string = sign(canonical_string(method, path, headers, expires))
end
Well, this method passes the buck again, by calling sign on the result of canonical_string. Let’s look at those two methods, starting with canonical_string. It turns an HTTP request into a string that looks something like Example 3-17. That string contains everything interesting (from S3’s point of view) about an HTTP request, in a specific format. The interesting data is the HTTP method (PUT), the Content-Type (“text/plain”), a date, a few other HTTP headers (“x-amz-metadata”), and the path portion of the URI (“/crummy.com/myobject”). This is the string that sign will sign. Anyone can create this string, but only the S3 account holder and Amazon know how to produce the correct signature.
PUT

text/plain
Fri, 27 Oct 2006 21:22:41 GMT
x-amz-metadata:Here's some metadata for the myobject object.
/crummy.com/myobject
When Amazon’s server receives your HTTP request, it generates the canonical string, signs it (again, Amazon knows your secret key), and sees whether the two signatures match. That’s how S3 authentication works. If the signatures match, your request goes through. Otherwise, you get a response code of 403 (“Forbidden”).
Example 3-18 shows the code to generate the canonical string.
# Turns the elements of an HTTP request into a string that can be
# signed to prove a request comes from your web service account.
def canonical_string(method, path, headers, expires=nil)
  # Start out with default values for all the interesting headers.
  sign_headers = {}
  INTERESTING_HEADERS.each { |header| sign_headers[header] = '' }

  # Copy in any actual values, including values for custom S3
  # headers.
  headers.each do |header, value|
    if header.respond_to? :to_str
      header = header.downcase
      # If it's a custom header, or one Amazon thinks is interesting...
      if INTERESTING_HEADERS.member?(header) ||
          header.index(AMAZON_HEADER_PREFIX) == 0
        # Add it to the header hash.
        sign_headers[header] = value.to_s.strip
      end
    end
  end

  # This library eliminates the need for the x-amz-date header that
  # Amazon defines, but someone might set it anyway. If they do,
  # we'll do without HTTP's standard Date header.
  sign_headers['date'] = '' if sign_headers.has_key? 'x-amz-date'

  # If an expiration time was provided, it overrides any Date
  # header. This signature will be valid until the expiration time,
  # not only during the single second designated by the Date header.
  sign_headers['date'] = expires.to_s if expires

  # Now we start building the canonical string for this request. We
  # start with the HTTP method.
  canonical = method.to_s.upcase + "\n"

  # Sort the headers by name, and append them (or just their values)
  # to the string to be signed.
  sign_headers.sort_by { |h| h[0] }.each do |header, value|
    canonical << header << ":" if header.index(AMAZON_HEADER_PREFIX) == 0
    canonical << value << "\n"
  end

  # The final part of the string to be signed is the URI path. We
  # strip off the query string, and (if necessary) tack one of the
  # special S3 query parameters back on: 'acl', 'torrent', or
  # 'logging'.
  canonical << path.gsub(/\?.*$/, '')
  for param in ['acl', 'torrent', 'logging']
    if path =~ Regexp.new("[&?]#{param}($|&|=)")
      canonical << "?" << param
      break
    end
  end
  return canonical
end
The implementation of sign is just a bit of plumbing around Ruby’s standard cryptographic and encoding interfaces (see Example 3-19).
# Signs a string with the client's secret access key, and encodes the
# resulting binary string into plain ASCII with base64.
def sign(str)
  digest_generator = OpenSSL::Digest::Digest.new('sha1')
  digest = OpenSSL::HMAC.digest(digest_generator, @@private_key, str)
  return Base64.encode64(digest).strip
end
Signing a URI
My S3 library has one feature still to be implemented. I’ve mentioned a few times that S3 lets you sign an HTTP request and give the URI to someone else, letting them make that request as you. Here’s the method that lets you do this: signed_uri (see Example 3-20). Instead of making an HTTP request with open, you pass the open arguments into this method, and it gives you a signed URI that anyone can use as you. To limit abuse, a signed URI works only for a limited time. You can customize that time by passing a Time object in as the keyword argument :expires.
# Given information about an HTTP request, returns a URI you can
# give to anyone else, to let them make that particular HTTP
# request as you. The URI will be valid for 15 minutes, or until the
# Time passed in as the :expires option.
def signed_uri(headers_and_options={})
  expires = headers_and_options[:expires] || (Time.now.to_i + (15 * 60))
  expires = expires.to_i if expires.respond_to? :to_i
  headers_and_options.delete(:expires)
  signature = URI.escape(signature(uri, headers_and_options[:method],
                                   headers_and_options, nil))
  q = (uri.index("?")) ? "&" : "?"
  "#{uri}#{q}Signature=#{signature}&Expires=#{expires}&AWSAccessKeyId=#{@@public_key}"
end
end
end # Remember the all-encompassing S3 module? This is the end.
Here’s how it works. Suppose I want to give a customer access to my hosted file at https://s3.amazonaws.com/BobProductions/KomodoDragon.avi. I can run the code in Example 3-21 to generate a URI for my customer.
#!/usr/bin/ruby -w
# s3-signed-uri.rb

require 'S3lib'

bucket = S3::Bucket.new("BobProductions")
object = S3::Object.new(bucket, "KomodoDragon.avi")
puts object.signed_uri
# "https://s3.amazonaws.com/BobProductions/KomodoDragon.avi
#  ?Signature=J%2Fu6kxT3j0zHaFXjsLbowgpzExQ%3D
#  &Expires=1162156499&AWSAccessKeyId=0F9DBXKB5274JKTJ8DG2"
That URI will be valid for 15 minutes, the default for my signed_uri implementation. It incorporates my key ID (AWSAccessKeyId), the expiration time (Expires), and the cryptographic Signature. My customer can visit this URI and download the movie file KomodoDragon.avi. Amazon will charge me for my customer’s use of their bandwidth. If my customer modifies any part of the URI (maybe they try to download a second movie too), the S3 service will reject their request. An untrustworthy customer can send the URI to all of their friends, but it will stop working in 15 minutes.
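If 15 minutes is too short, pass a Time object as the :expires option. A usage sketch: a signed URI that stays valid for 24 hours.

object = S3::Object.new(S3::Bucket.new("BobProductions"), "KomodoDragon.avi")
puts object.signed_uri(:expires => Time.now + (24 * 60 * 60))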
You may have noticed a problem here. The canonical string usually includes the value of the Date header. When my customer visits the URI I signed, their web browser will surely send a different value for the Date header. That’s why, when you’re generating a canonical string to give to someone else, you set an expiration date instead of a request date. Look back to Example 3-18 and the implementation of canonical_string, where the expiration date (if provided) overwrites any value for the Date header.
Setting Access Policy
What if I want to make an object publicly accessible? I want to serve my files to the world and let Amazon deal with the headaches of server management. Well, I could set an expiration date very far in the future, and give out the enormous signed URI to everyone. But there’s an easier way to get the same results: allow anonymous access. You can do this by setting the access policy for a bucket or object, telling S3 to respond to unsigned requests for it. You do this by sending the x-amz-acl header along with the PUT request that creates the bucket or object.

That’s what the acl_policy argument to Bucket#put and Object#put does. If you want to make a bucket or object publicly readable or writable, you pass an appropriate value in for acl_policy. My client sends that value as part of the custom HTTP request header x-amz-acl. Amazon S3 reads this request header and sets the rules for bucket or object access appropriately.
The client in Example 3-22 creates an S3 object that anyone can read by visiting its URI at https://s3.amazonaws.com/BobProductions/KomodoDragon-Trailer.avi. In this scenario, I’m not selling my movies: just using Amazon as a hosting service so I don’t have to serve movies from my own web site.
#!/usr/bin/ruby -w
# s3-public-object.rb

require 'S3lib'

bucket = S3::Bucket.new("BobProductions")
object = S3::Object.new(bucket, "KomodoDragon-Trailer.avi")
object.put("public-read")
S3 understands four access policies:

- private
The default. Only requests signed with your secret key are accepted.
- public-read
Anyone can read the bucket or object with an unsigned request; only you can modify it.
- public-read-write
Anyone can read or modify the bucket or object.
- authenticated-read
Any registered Amazon Web Services user can read the bucket or object with a signed request; only you can modify it.
There are also fine-grained ways of granting access to a bucket or object, which I won’t cover. If you’re interested, see the section “Setting Access Policy with REST” in the S3 technical documentation. That section reveals a parallel universe of extra resources. Every bucket /{name-of-bucket} has a shadow resource /{name-of-bucket}?acl corresponding to that bucket’s access control rules, and every object /{name-of-bucket}/{name-of-object} has a shadow ACL resource /{name-of-bucket}/{name-of-object}?acl. By sending PUT requests to these URIs, and including XML representations of access control lists in the request entity-bodies, you can set specific permissions and limit access to particular S3 users.
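As a sketch of how my library could be extended (this acl method is hypothetical, not part of the code above), a client could fetch a bucket’s ACL document through the shadow resource:

module S3
  class Bucket
    # Returns the XML document describing this bucket's access control
    # list. The signed open() from Authorized handles the query string:
    # canonical_string strips it off, then tacks 'acl' back onto the
    # signed path, as Amazon requires.
    def acl
      open(uri + '?acl').read   # GET /{name-of-bucket}?acl
    end
  end
end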
Using the S3 Client Library
I’ve now shown you a Ruby client library that can access just about the full capabilities of Amazon’s S3 service. Of course, a library is useless without clients that use it. In the previous section I showed you a couple of small clients to demonstrate points about security, but now I’d like to show something a little more substantial.
Example 3-23 is a simple command-line S3 client that can create a bucket and an object, then list the contents of the bucket. This client should give you a high-level picture of how S3’s resources work together. I’ve annotated the lines of code that trigger HTTP requests, by describing the HTTP requests in comments off to the right.
#!/usr/bin/ruby -w
# s3-sample-client.rb

require 'S3lib'

# Gather command-line arguments
bucket_name, object_name, object_value = ARGV
unless bucket_name
  puts "Usage: #{$0} [bucket name] [object name] [object value]"
  exit
end

# Find or create the bucket.
buckets = S3::BucketList.new.get                          # GET /
bucket = buckets.detect { |b| b.name == bucket_name }
if bucket
  puts "Found bucket #{bucket_name}."
else
  puts "Could not find bucket #{bucket_name}, creating it."
  bucket = S3::Bucket.new(bucket_name)
  bucket.put                                              # PUT /{bucket}
end

# Create the object.
object = S3::Object.new(bucket, object_name)
object.metadata['content-type'] = 'text/plain'
object.value = object_value
object.put                                                # PUT /{bucket}/{object}

# For each object in the bucket...
bucket.get[0].each do |o|                                 # GET /{bucket}
  # ...print out information about the object.
  puts "Name: #{o.name}"
  puts "Value: #{o.value}"                                # GET /{bucket}/{object}
  puts "Metadata hash: #{o.metadata.inspect}"
  puts
end
Clients Made Transparent with ActiveResource
Since all RESTful web services expose basically the same simple interface, it’s not a big chore to write a custom client for every web service. It is a little wasteful, though, and there are two alternatives. You can describe a service with a WADL file (introduced in the previous chapter, and covered in more detail in Chapter 9), and then access it with a generic WADL client. There’s also a Ruby library called ActiveResource that makes it trivial to write clients for certain kinds of web services.
ActiveResource is designed to run against web services that expose the rows and tables of a relational database. WADL can describe almost any kind of web service, but ActiveResource only works as a client for web services that follow certain conventions. Right now, Ruby on Rails is the only framework that follows the conventions. But any web service can answer requests from an ActiveResource client: it just has to expose its database through the same RESTful interface as Rails.
As of the time of writing, there are few publicly available web services that can be used with an ActiveResource client (I list a couple in Appendix A). To show you an example I’m going to create a small Rails web service of my own. I’ll be able to drive my service with an ActiveResource client, without writing any HTTP client or XML parsing code.
Creating a Simple Service
My web service will be a simple notebook: a way of keeping timestamped notes to myself. I’ve got Rails 1.2 installed on my computer, so I can create the notebook service like this:
$ rails notebook
$ cd notebook
I create a database on my system called notebook_development, and edit the Rails file notebook/config/database.yml to give Rails the information it needs to connect to my database. Any general guide to Rails will have more detail on these initial steps.
Now I’ve created a Rails application, but it doesn’t do anything. I’m going to generate code for a simple, RESTful web service with the scaffold generator. I want my notes to contain a timestamp and a body of text, so I run the following command:
$ ruby script/generate scaffold note date:date body:text
create app/views/notes
create app/views/notes/index.rhtml
create app/views/notes/show.rhtml
create app/views/notes/new.rhtml
create app/views/notes/edit.rhtml
create app/views/layouts/notes.rhtml
create public/stylesheets/scaffold.css
create app/models/note.rb
create app/controllers/notes_controller.rb
create test/functional/notes_controller_test.rb
create app/helpers/notes_helper.rb
create test/unit/note_test.rb
create test/fixtures/notes.yml
create db/migrate
create db/migrate/001_create_notes.rb
route map.resources :notes
Rails has generated a complete set of web service code—model, view, and controller—for my “note” object. There’s code in db/migrate/001_create_notes.rb that creates a database table called notes with three fields: a unique ID, a date (date), and a piece of text (body).

The model code in app/models/note.rb provides an ActiveRecord interface to the database table. The controller code in app/controllers/notes_controller.rb exposes that interface to the world through HTTP, and the views in app/views/notes define the user interface. It adds up to a RESTful web service—not a very fancy one, but one that’s good enough for a demo or to use as a starting point.
Before starting the service I need to initialize the database:
$ rake db:migrate
== CreateNotes: migrating =====================================================
-- create_table(:notes)
-> 0.0119s
== CreateNotes: migrated (0.0142s) ============================================
Now I can start the notebook application and start using my service:
$ script/server
=> Booting WEBrick...
=> Rails application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
An ActiveResource Client
The application I just generated is not much use except as a demo, but it demos some pretty impressive features. First, it’s both a web service and a web application. I can visit http://localhost:3000/notes in my web browser and create notes through the web interface. After a while the view of http://localhost:3000/notes might look like Figure 3-1.
If you’ve ever written a Rails application or seen a Rails demo, this should look familiar. But in Rails 1.2, the generated model and controller can also act as a RESTful web service. A programmed client can access it as easily as a web browser can.
Unfortunately, the ActiveResource client itself was not released along with Rails 1.2. As of the time of writing, it’s still being developed on the tip of the Rails development tree. To get the code I need to check it out from the Subversion version control repository:
$ svn co http://dev.rubyonrails.org/svn/rails/trunk activeresource_client
$ cd activeresource_client
Now I’m ready to write ActiveResource clients for the notebook’s web service. Example 3-24 is a client that creates a note, modifies it, lists the existing notes, and then deletes the note it just created.
#!/usr/bin/ruby -w
# activeresource-notebook-manipulation.rb

require 'activesupport/lib/active_support'
require 'activeresource/lib/active_resource'

# Define a model for the objects exposed by the site
class Note < ActiveResource::Base
  self.site = 'http://localhost:3000/'
end

def show_notes
  notes = Note.find :all           # GET /notes.xml
  puts "I see #{notes.size} note(s):"
  notes.each do |note|
    puts " #{note.date}: #{note.body}"
  end
end

new_note = Note.new(:date => Time.now, :body => "A test note")
new_note.save                      # POST /notes.xml

new_note.body = "This note has been modified."
new_note.save                      # PUT /notes/{id}.xml

show_notes

new_note.destroy                   # DELETE /notes/{id}.xml

puts
show_notes
Example 3-25 shows the output when I run that program:
I see 3 note(s):
 2006-06-05: What if I wrote a book about REST?
 2006-12-18: Pasta for lunch maybe?
 2006-12-18: This note has been modified.

I see 2 note(s):
 2006-06-05: What if I wrote a book about REST?
 2006-12-18: Pasta for lunch maybe?
If you’re familiar with ActiveRecord, the object-relational mapper that connects Rails to a database, you’ll notice that the ActiveResource interface looks almost exactly the same. Both libraries provide an object-oriented interface to a wide variety of objects, each of which exposes a uniform interface. With ActiveRecord, the objects live in a database and are exposed through SQL, with its SELECT, INSERT, UPDATE, and DELETE. With ActiveResource, they live in a Rails application and are exposed through HTTP, with its GET, POST, PUT, and DELETE.
Example 3-26 is an excerpt from the Rails server logs at the time I ran my ActiveResource client. The GET, POST, PUT, and DELETE requests correspond to the commented lines of code back in Example 3-24.
"POST /notes.xml HTTP/1.1" 201 "PUT /notes/5.xml HTTP/1.1" 200 "GET /notes.xml HTTP/1.1" 200 "DELETE /notes/5.xml HTTP/1.1" 200 "GET /notes.xml HTTP/1.1" 200
What’s going on in these requests? The same thing that’s going on in requests to S3: resource access through HTTP’s uniform interface. My notebook service exposes two kinds of resources:
- The list of notes (/notes.xml). Compare to an S3 bucket, which is a list of objects.
- A note (/notes/{id}.xml). Compare to an S3 object.
These resources expose GET, PUT, and DELETE, just like the S3 resources do. The list of notes also supports POST to create a new note. That’s a little different from S3, where objects are created with PUT, but it’s just as RESTful.
When the client runs, XML documents are transferred invisibly between client and server. They look like the documents in Example 3-27 or 3-28: simple depictions of the underlying database rows.
<?xml version="1.0" encoding="UTF-8"?> <notes> <note> <body>What if I wrote a book about REST?</body> <date type="date">2006-06-05</date> <id type="integer">2</id> </note> <note> <body>Pasta for lunch maybe?</body> <date type="date">2006-12-18</date> <id type="integer">3</id> </note> </notes>
A Python Client for the Simple Service
Right now the only ActiveResource client library is the Ruby library, and Rails is the only framework that exposes ActiveResource-compatible services. But nothing’s happening here except HTTP requests that pass XML documents into certain URIs and get XML documents back. There’s no reason why a client in some other language couldn’t send those XML documents, or why some other framework couldn’t expose the same URIs.
Example 3-29 is a Python implementation of the client program from Example 3-24. It’s longer than the Ruby program, because it can’t rely on ActiveResource. It has to build its own XML documents and make its own HTTP requests, but its structure is almost exactly the same.
#!/usr/bin/python
# activeresource-notebook-manipulation.py

from elementtree.ElementTree import Element, SubElement, tostring
from elementtree import ElementTree
import httplib2
import time

BASE = "http://localhost:3000/"
client = httplib2.Http(".cache")

def showNotes():
    headers, xml = client.request(BASE + "notes.xml")
    doc = ElementTree.fromstring(xml)
    for note in doc.findall('note'):
        print "%s: %s" % (note.find('date').text, note.find('body').text)

newNote = Element("note")
date = SubElement(newNote, "date")
date.attrib['type'] = "date"
date.text = time.strftime("%Y-%m-%d", time.localtime())
body = SubElement(newNote, "body")
body.text = "A test note"

headers, ignore = client.request(BASE + "notes.xml", "POST",
                                 body=tostring(newNote),
                                 headers={'content-type': 'application/xml'})
newURI = headers['location']

modifiedBody = Element("note")
body = SubElement(modifiedBody, "body")
body.text = "This note has been modified"
client.request(newURI, "PUT", body=tostring(modifiedBody),
               headers={'content-type': 'application/xml'})

showNotes()

client.request(newURI, "DELETE")

print
showNotes()
Parting Words
Because RESTful web services have simple and well-defined interfaces, it’s not difficult to clone them or swap out one implementation for another. Park Place is a Ruby application that exposes the same HTTP interface as S3. You can use Park Place to host your own version of S3. S3 libraries and client programs will work against your Park Place server just as they now do against https://s3.amazonaws.com/.
It’s also possible to clone ActiveResource. No one has done this yet, but it shouldn’t be difficult to write a general ActiveResource client for Python or any other dynamic language. In the meantime, writing a one-off client for an ActiveResource-compatible service is no more difficult than writing a client for any other RESTful service.
By now you should feel comfortable with the prospect of writing a client for any RESTful or REST-RPC hybrid service, whether it serves XML, HTML, JSON, or some mixture. It’s all just HTTP requests and document parsing.
You should also be getting a feel for what differentiates RESTful web services like S3 and Yahoo!’s search services from RPC-style and hybrid services like the Flickr and del.icio.us APIs. This is not a judgement about the service’s content, only about its architecture. In woodworking it’s important to work with the grain of the wood. The Web, too, has a grain, and a RESTful web service is one that works with it.
In the coming chapters I’ll show how you can create web services that are more like S3 and less like the del.icio.us API. This culminates in Chapter 7, which reinvents del.icio.us as a RESTful web service.