Documents are CouchDB’s central data structure. To best understand and use CouchDB, you need to think in documents. This chapter walks you though the lifecycle of designing and saving a document. We’ll follow up by reading documents and aggregating and querying them with views. In the next section, you’ll see how CouchDB can also transform documents into other formats.
Documents are self-contained units of data. You might have heard the
term record to describe something similar. Your data is
usually made up of small native types such as integers and strings.
Documents are the first level of abstraction over these native types. They
provide some structure and logically group the primitive data. The height of
a person might be encoded as an integer (176
), but this
integer is usually part of a larger structure that contains a label
("height": 176
) and related data
({"name":"Chris", "height": 176}
).
How many data items you put into your documents depends on your application and a bit on how you want to use views (later), but generally, a document roughly corresponds to an object instance in your programming language. Are you running an online shop? You will have items and sales and comments for your items. They all make good candidates for objects and, subsequently, documents.
Documents differ subtly from garden-variety objects in that they
usually have authors and CRUD operations (create, read, update, delete).
Document-based software (like the word processors and spreadsheets of yore)
builds its storage model around saving documents so that authors get back
what they created. Similarly, in a CouchDB application you may find yourself
giving greater leeway to the presentation layer. If, instead of adding
timestamps to your data in a controller, you allow the user to control them,
you get draft status and the ability to publish articles in the future for
free (by viewing published documents using an endkey
of
now).
Validation functions are available so that you don’t have to worry about bad data causing errors in your system. Often in document-based software, the client application edits and manipulates the data, saving it back. As long as you give the user the document she asked you to save, she’ll be happy.
Say your users can comment on the item (“lovely book”); you have the option to store the comments as an array, on the item document. This makes it trivial to find the item’s comments, but, as they say, “it doesn’t scale.” A popular item could have tens of comments, or even hundreds or more.
Instead of storing a list on the item document, in this case it may be better to model comments into a collection of documents. There are patterns for accessing collections, which CouchDB makes easy. You likely want to show only 10 or 20 at a time and provide previous and next links. By handling comments as individual entities, you can group them with views. A group could be the entire collection or slices of 10 or 20, sorted by the item they apply to so that it’s easy to grab the set you need.
A rule of thumb: break up into documents everything that you will be handling separately in your application. Items are single, and comments are single, but you don’t need to break them into smaller pieces. Views are a convenient way to group your documents in meaningful ways.
Let’s go through building our example application to show you in practice how to work with documents.
The first step in designing any application (once you know what the program is for and have the user interaction nailed down) is deciding on the format it will use to represent and store data. Our example blog is written in JavaScript. A few lines back we said documents roughly represent your data objects. In this case, there is a an exact correspondence. CouchDB borrowed the JSON data format from JavaScript; this allows us to use documents directly as native objects when programming. This is really convenient and leads to fewer problems down the road (if you ever worked with an ORM system, you might know what we are hinting at).
Let’s draft a JSON format for blog posts. We know we’ll need each post to have an author, a title, and a body. We know we’d like to use document IDs to find documents so that URLs are search engine–friendly, and we’d also like to list them by creation date.
It should be pretty straightforward to see how JSON works. Curly
braces ({}
) wrap objects, and objects are key/value
lists. Keys are strings that are wrapped in double quotes
(""
). Finally, a value is a string, an integer, an
object, or an array ([]
). Keys and values are separated
by a colon (:
), and multiple keys and values by comma
(,
). That’s it. For a complete description of the JSON
format, see Appendix E.
Figure 12-1 shows a document that meets our requirements. The cool thing is we just made it up on the spot. We didn’t go and define a schema, and we didn’t define how things should look. We just created a document with whatever we needed. Now, requirements for objects change all the time during the development of an application. Coming up with a different document that meets new, evolved needs is just as easy.
Do I really look like a guy with a plan? You know what I am? I’m a dog chasing cars. I wouldn’t know what to do with one if I caught it. You know, I just do things. The mob has plans, the cops have plans, Gordon’s got plans. You know, they’re schemers. Schemers trying to control their little worlds. I’m not a schemer. I try to show the schemers how pathetic their attempts to control things really are.
—The Joker, The Dark Knight
Let’s examine the document in a little more detail. The first two
members (_id
and _rev
) are for
CouchDB’s housekeeping and act as identification for a particular instance of a document.
_id
is easy: if I store something in CouchDB, it
creates the _id
and returns it to me. I can use the
_id
to build the URL where I can get my something
back.
Note
Your document’s _id
defines the URL the
document can be found under. Say you have a database
movies
. All documents can be found somewhere under
the URL /movies
, but where exactly?
If you store a document with the _id
Jabberwocky
({"_id":"Jabberwocky"}
) into your
movies
database, it will be available under the URL
/movies/Jabberwocky
. So if you send a GET request to
/movies/Jabberwocky
, you will get back the JSON that
makes up your document
({"_id":"Jabberwocky"}
).
The _rev
(or revision ID)
describes a version of a document. Each change creates a new document
version (that again is self-contained) and updates the
_rev
. This becomes useful because, when saving a
document, you must provide an up-to-date _rev
so that
CouchDB knows you’ve been working against the latest document
version.
We touched on this in Chapter 2. The
revision ID acts as a gatekeeper for writes to a document in CouchDB’s
MVCC system. A document is a shared resource; many clients can read and
write them at the same time. To make sure two writing clients don’t step
on each other’s feet, each client must provide what it believes is the
latest revision ID of a document along with the proposed changes. If the
on-disk revision ID matches the provided _rev
, CouchDB
will accept the change. If it doesn’t, the update will be rejected. The
client should read the latest version, integrate the changes, and try
saving again.
This mechanism ensures two things: a client can only overwrite a version it knows, and it can’t trip over changes made by other clients. This works without CouchDB having to manage explicit locks on any document. This ensures that no client has to wait for another client to complete any work. Updates are serialized, so CouchDB will never attempt to write documents faster than your disk can spin, and it also means that two mutually conflicting writes can’t be written at the same time.
Now that you thoroughly understand the role of
_id
and _rev
on a document, let’s
look at everything else we’re storing.
{
"_id"
:
"Hello-Sofa"
,
"_rev"
:
"2-2143609722"
,
"type"
:
"post"
,
The first thing is the type of the document. Note that this is an
application-level parameter, not anything particular to CouchDB. The type
is just an arbitrarily named key/value pair as far as CouchDB is
concerned. For us, as we’re adding blog posts to Sofa, it has a little
deeper meaning. Sofa uses the type
field to determine
which validations to apply. It can then rely on documents of that type
being valid in the views and the user interface. This removes the need to
check for every field and nested JSON value before using it. This is
purely by convention, and you can make up your own or infer the type of a
document by its structure (“has an array with three elements”—a.k.a.
duck typing). We just thought this was easy to follow
and hope you agree.
"author"
:
"jchris"
,
"title"
:
"Hello Sofa"
,
The author
and title
fields
are set when the post is created. The title
field can
be changed, but the author
field is locked by the
validation function for security. Only the author may edit the
post.
"tags"
:
[
"example"
,
"blog post"
,
"json"
],
Sofa’s tag system just stores them as an array on the document. This kind of denormalization is a particularly good fit for CouchDB.
"format"
:
"markdown"
,
"body"
:
"some markdown text"
,
"html"
:
"<p>the html text</p>"
,
Blog posts are composed in the Markdown HTML
format to make them easy to author. The Markdown format as typed
by the user is stored in the body
field. Before the
blog post is saved, Sofa converts it to HTML in the client’s browser.
There is an interface for previewing the Markdown conversion, so you can
be sure it will display as you like.
"created_at"
:
"2009/05/25 06:10:40 +0000"
}
The created_at
field is used to order blog posts
in the Atom feed and on the HTML index page.
The first page we need to build in order to get one of these blog entries into our post is the interface for creating and editing posts.
Editing is more complex than just rendering posts for visitors to read, but that means once you’ve read this chapter, you’ll have seen most of the techniques we touch on in the other chapters.
The first thing to look at is the show function used to render the HTML page. If you haven’t already, read Chapter 8 to learn about the details of the API. We’ll just look at this code in the context of Sofa, so you can see how it all fits together.
function
(
doc
,
req
)
{
// !json templates.edit
// !json blog
// !code vendor/couchapp/path.js
// !code vendor/couchapp/template.js
Sofa’s edit page show function is very straightforward. In the
previous section, we showed the important templates and libraries we’ll
use. The important line is the !json
macro, which loads the
edit.html template from the
templates directory. These macros are run by
CouchApp, as Sofa is being deployed to CouchDB. For more information about
the macros, see Chapter 13.
// we only show html
return
template
(
templates
.
edit
,
{
doc
:
doc
,
docid
:
toJSON
((
doc
&&
doc
.
_id
)
||
null
),
blog
:
blog
,
assets
:
assetPath
(),
index
:
listPath
(
'index'
,
'recent-posts'
,{
descending
:
true
,
limit
:
8
})
});
}
The rest of the function is simple. We’re just rendering the HTML
template with data culled from the document. In the case where the
document does not yet exist, we make sure to set the
docid
to null
. This allows us to use
the same template both for creating new blog posts as well as editing
existing ones.
The only missing piece of this puzzle is the HTML that it takes to save a document like this.
In your browser, visit http://127.0.0.1:5984/blog/_design/sofa/_show/edit and, using your text editor, open the source file templates/edit.html (or view source in your browser). Everything is ready to go; all we have to do is wire up CouchDB using in-page JavaScript. See Figure 12-2.
Just like any web application, the important part of the HTML is the form for accepting edits. The edit form captures a few basic data items: the post title, the body (in Markdown format), and any tags the author would like to apply.
<!-- form to create a Post -->
<form
id=
"new-post"
action=
"new.html"
method=
"post"
>
<h1>
Create a new post</h1>
<p><label>
Title</label>
<input
type=
"text"
size=
"50"
name=
"title"
></p>
<p><label
for=
"body"
>
Body</label>
<textarea
name=
"body"
rows=
"28"
cols=
"80"
>
</textarea></p>
<p><input
id=
"preview"
type=
"button"
value=
"Preview"
/>
<input
type=
"submit"
value=
"Save →"
/></p>
</form>
We start with just a raw HTML document, containing a normal HTML form. We use JavaScript to convert user input into a JSON document and save it to CouchDB. In the spirit of focusing on CouchDB, we won’t dwell on the JavaScript here. It’s a combination of Sofa-specific application code, CouchApp’s JavaScript helpers, and jQuery for interface elements. The basic story is that it watches for the user to click “Save,” and then applies some callbacks to the document before sending it to CouchDB.
The JavaScript that drives blog post creation and editing centers
around the HTML form from Figure 12-2. The CouchApp
jQuery plug-in provides some abstraction, so we don’t have to concern
ourselves with the details of how the form is converted to a JSON document
when the user hits the submit button. $.CouchApp
also
ensures that the user is logged in and makes her information available to
the application. See Figure 12-3.
$
.
CouchApp
(
function
(
app
)
{
app
.
loggedInNow
(
function
(
login
)
{
The first thing we do is ask the CouchApp library to make sure the user is logged in. Assuming the answer is yes, we’ll proceed to set up the page as an editor. This means we apply a JavaScript event handler to the form and specify callbacks we’d like to run on the document, both when it is loaded and when it saved.
// w00t, we're logged in (according to the cookie)
$
(
"#header"
).
prepend
(
'<span id="login">'
+
login
+
'</span>'
);
// setup CouchApp document/form system, adding app-specific callbacks
var
B
=
new
Blog
(
app
);
Now that we know the user is logged in, we can render his username
at the top of the page. The variable B
is just a
shortcut to some of the Sofa-specific blog rendering code. It contains
methods for converting blog post bodies from Markdown to HTML, as well as
a few other odds and ends. We pulled these functions into
blog.js so we could keep them out of the way of main
code.
var
postForm
=
app
.
docForm
(
"form#new-post"
,
{
id
:
<%=
docid
%>
,
fields
:
[
"title"
,
"body"
,
"tags"
],
template
:
{
type
:
"post"
,
format
:
"markdown"
,
author
:
login
},
CouchApp’s app.docForm()
helper is a function to
set up and maintain a correspondence between a CouchDB document and an
HTML form. Let’s look at the first three arguments passed to it by Sofa.
The id
argument tells docForm()
where to save the document. This can be null in the case of a new
document. We set fields
to an array of form elements
that will correspond directly to JSON fields in the CouchDB document.
Finally, the template
argument is given a JavaScript
object that will be used as the starting point, in the case of a new
document. In this case, we ensure that the document has a type equal to
“post,” and that the default format is Markdown. We also set the author to
be the login name of the current user.
onLoad
:
function
(
doc
)
{
if
(
doc
.
_id
)
{
B
.
editing
(
doc
.
_id
);
$
(
'h1'
).
html
(
'Editing <a href="../post/'
+
doc
.
_id
+
'">'
+
doc
.
_id
+
'</a>'
);
$
(
'#preview'
).
before
(
'<input type="button" id="delete"
value="Delete Post"/> '
);
$
(
"#delete"
).
click
(
function
()
{
postForm
.
deleteDoc
({
success
:
function
(
resp
)
{
$
(
"h1"
).
text
(
"Deleted "
+
resp
.
id
);
$
(
'form#new-post input'
).
attr
(
'disabled'
,
true
);
}
});
return
false
;
});
}
$
(
'label[for=body]'
).
append
(
' <em>with '
+
(
doc
.
format
||
'html'
)
+
'</em>'
);
The onLoad
callback is run when the document is
loaded from CouchDB. It is useful for decorating the document before
passing it to the form, or for setting up other user interface elements.
In this case, we check to see if the document already has an ID. If it
does, that means it’s been saved, so we create a button for deleting it
and set up the callback to the delete function. It may look like a lot of
code, but it’s pretty standard for Ajax applications. If there is one
criticism to make of this section, it’s that the logic for creating the
delete button could be moved to the blog.js file so
we can keep more user-interface details out of the main flow.
},
beforeSave
:
function
(
doc
)
{
doc
.
html
=
B
.
formatBody
(
doc
.
body
,
doc
.
format
);
if
(
!
doc
.
created_at
)
{
doc
.
created_at
=
new
Date
();
}
if
(
!
doc
.
slug
)
{
doc
.
slug
=
app
.
slugifyString
(
doc
.
title
);
doc
.
_id
=
doc
.
slug
;
}
if
(
doc
.
tags
)
{
doc
.
tags
=
doc
.
tags
.
split
(
","
);
for
(
var
idx
in
doc
.
tags
)
{
doc
.
tags
[
idx
]
=
$
.
trim
(
doc
.
tags
[
idx
]);
}
}
},
The beforeSave()
callback to
docForm
is run after the user clicks the submit button.
In Sofa’s case, it manages setting the blog post’s timestamp, transforming
the title into an acceptable document ID (for prettier URLs), and
processing the document tags from a string into an array. It also runs the
Markdown-to-HTML conversion in the browser so that once the document is
saved, the rest of the application has direct access to the HTML.
success
:
function
(
resp
)
{
$
(
"#saved"
).
text
(
"Saved _rev: "
+
resp
.
rev
).
fadeIn
(
500
).
fadeOut
(
3000
);
B
.
editing
(
resp
.
id
);
}
});
The last callback we use in Sofa is the success
callback. It is fired when the document is successfully saved. In our
case, we use it to flash a message to the user that lets her know she’s
succeeded, as well as to add a link to the blog post so that when you
create a blog post for the first time, you can click through to see its
permalink page.
That’s it for the docForm()
callbacks.
$
(
"#preview"
).
click
(
function
()
{
var
doc
=
postForm
.
localDoc
();
var
html
=
B
.
formatBody
(
doc
.
body
,
doc
.
format
);
$
(
'#show-preview'
).
html
(
html
);
// scroll down
$
(
'body'
).
scrollTo
(
'#show-preview'
,
{
duration
:
500
});
});
Sofa has a function to preview blog posts before saving them. Since
this doesn’t affect how the document is saved, the code that watches for
events from the “preview” button is not applied within the
docForm()
callbacks.
},
function
()
{
app
.
go
(
'<%= assets %>/account.html#'
+
document
.
location
);
});
});
The last bit of code here is triggered when the user is not logged in. All it does is redirect him to the account page so that he can log in and try editing again.
Hopefully, you can see how the previous code will send a JSON document to CouchDB when the user clicks save. That’s great for creating a user interface, but it does nothing to protect the database from unwanted updates. This is where validation functions come into play. With a proper validation function, even a determined hacker cannot get unwanted documents into your database. Let’s look at how Sofa’s works. For more on validation functions, see Chapter 7.
function
(
newDoc
,
oldDoc
,
userCtx
)
{
// !code lib/validate.js
This line imports a library from Sofa that makes the rest of the
function much more readable. It is just a wrapper around the basic
ability to mark requests as either forbidden
or
unauthorized
. In this chapter, we’ve
concentrated on the business logic of the validation function. Just be
aware that unless you use Sofa’s validate.js,
you’ll need to work with the more primitive logic that the library
abstracts.
unchanged
(
"type"
);
unchanged
(
"author"
);
unchanged
(
"created_at"
);
These lines do just what they say. If the document’s
type
, author
, or
created_at
fields are changed, they throw an error
saying the update is forbidden. Note that these lines make no
assumptions about the content of these fields. They merely state that
updates must not change the content from one revision of the document to
the next.
if
(
newDoc
.
created_at
)
dateFormat
(
"created_at"
);
The dateFormat
helper makes sure that the date
(if one is provided) is in the format that Sofa’s views expect.
// docs with authors can only be saved by their author
// admin can author anything...
if
(
!
isAdmin
(
userCtx
)
&&
newDoc
.
author
&&
newDoc
.
author
!=
userCtx
.
name
)
{
unauthorized
(
"Only "
+
newDoc
.
author
+
" may edit this document."
);
}
If the person saving the document is an admin, let the edit proceed. Otherwise, make certain that the author and the person saving the document are the same. This ensures that authors may edit only their own posts.
// authors and admins can always delete
if
(
newDoc
.
_deleted
)
return
true
;
The next block of code will check the validity of various types of
documents. However, deletions will normally not be valid according to
those specifications, because their content is just _deleted:
true
, so we short-circut the validation function here.
if
(
newDoc
.
type
==
'post'
)
{
require
(
"created_at"
,
"author"
,
"body"
,
"html"
,
"format"
,
"title"
,
"slug"
);
assert
(
newDoc
.
slug
==
newDoc
.
_id
,
"Post slugs must be used as the _id."
)
}
}
Finally, we have the validation for the actual post document itself. Here we require the fields that are particular to the post document. Because we’ve validated that they are present, we can count on them in views and user interface code.
Let’s see how this all works together! Fill out the form with some practice data, and hit “save” to see a success response.
Figure 12-4 shows how JavaScript has used HTTP
to PUT the document to a URL constructed of the
database name plus the document ID. It also shows how the document is
just sent as a JSON string in the body of the PUT request. If you were
to GET the document URL, you’d see the same set of
JSON data, with the addition of the _rev
parameter as
applied by CouchDB.
To see the JSON version of the document you’ve saved, you can also browse to it in Futon. Visit http://127.0.0.1:5984/_utils/database.html?blog/_all_docs and you should see a document with an ID corresponding to the one you just saved. Click it to see what Sofa is sending to CouchDB.
Get CouchDB: The Definitive Guide now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.