Chapter 4. MongoDB

When it comes to NoSQL databases, it is hard to beat the ease of use offered by MongoDB. Not only is it well documented and supported by a large and helpful community, but it is friendly to developers coming from an SQL background—many queries and a great deal of relational thinking can be directly applied from SQL to MongoDB—making it an especially attractive system for newcomers to the NoSQL world.

In relational databases, a single entity is stored in a row with a series of columns. Because entities are defined in a strict schema, every row will have the same columns. Working with entities involves comparing columns with very little overhead: all of the data is the same by design. In MongoDB there is no strictly defined schema and there are no rows containing columns—instead, every entity is stored in a document with any number of fields.

Documents provide a lot of power; you can store much more related information about each entity inside the document, even putting lists of documents inside other documents. Instead of making multiple queries to the database to get a complete set of information (as you would have to do with an SQL database), you can load entire datasets in a single operation.
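A quick sketch shows what such an embedded document might look like; the reviews field here is illustrative, not part of this chapter's examples:

```javascript
// A hypothetical book document with related information embedded
// directly inside it.
const book = {
  title: "20 Recipes for Programming PhoneGap",
  author: "Jamie Munro",
  reviews: [  // a list of documents inside a document
    { reviewer: "A. Reader", rating: 5, comment: "Practical and concise." },
    { reviewer: "B. Critic", rating: 4, comment: "Covers a lot of ground." }
  ]
};

// One read returns everything; the embedded list is ordinary data,
// so no join or second query is needed to work with it.
const ratings = book.reviews.map(r => r.rating);
const average = ratings.reduce((a, b) => a + b, 0) / ratings.length;
console.log(average); // 4.5
```

In an SQL schema, the reviews would live in a separate table and require a join; here they travel with the book.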

Accessing Data

Not only is MongoDB friendly to developers coming from an SQL background, but its website goes out of its way to show how many SQL statements can be converted to MongoDB queries. In any database system, the end goal is always writing data—usually, persisting it to a disk—and reading it back out again.

Example 4-1 demonstrates a simple session in MongoDB. No special setup is required at any step of the way—all that was needed here was to install MongoDB and run it, then connect using the mongo client. Once connected to MongoDB, I immediately started using a database that I called newdb, but I didn’t have to do anything to set it up: all I needed to do was to start writing data.

Example 4-1. Basic MongoDB usage
> use newdb;
switched to db newdb
> book = { author: 'Jamie Munro', title: '20 Recipes for Programming PhoneGap',
published: new Date('04/03/2012') };
{
  "author" : "Jamie Munro",
  "title" : "20 Recipes for Programming PhoneGap",
  "published" : ISODate("2012-04-03T07:00:00Z")
}
> db.books.insert(book);
> db.books.find();
{ "_id" : ObjectId("5063d1d89e302eaf24b259a0"), "author" : "Jamie Munro",
  "title" : "20 Recipes for Programming PhoneGap",
  "published" : ISODate("2012-04-03T07:00:00Z") }

I created a variable named book to store information about a programming book I’ve been reading, including the author, title, and publication date. Then I created a books collection and inserted it using the db.books.insert command. Along the way I didn’t stop once to define a schema; at no point do I tell MongoDB what a book is, what data it should contain, or even what a collection of books is. MongoDB takes care of creating documents, maintaining lists, and even—as we will discuss later in this chapter—indexing and constraints.

Writing

As you’ve seen, writing data to MongoDB is extremely free form. You have a lot of flexibility because each record stored in the database is basically a JSON document and therefore parsable and usable in both a free and structured manner. You are not bound to a rigid set of columns per table as you would be in a traditional RDBMS. Building upon Example 4-1, you can add an additional book to the database without being bound to follow the structure that came before.

When I added a book in Example 4-2, I included a new field called keywords that was not present in the book added in Example 4-1. That doesn’t matter because when I later queried the list of books, both books were returned even though they didn’t have identical field names. MongoDB happily fetches “20 Recipes for Programming PhoneGap” along with “50 Tips and Tricks for MongoDB Developers” even though they aren’t structured exactly the same.

Example 4-2. Inserting a document
> book = { title: '50 Tips and Tricks for MongoDB Developers', 
author: 'Kristina Chodorow',
published: new Date('05/06/2011'),
keywords: ['design', 'implementation',
'optimization'] };
{
  "title" : "50 Tips and Tricks for MongoDB Developers",
  "author" : "Kristina Chodorow",
  "published" : ISODate("2011-05-06T07:00:00Z"),
  "keywords" : [
    "design",
    "implementation",
    "optimization"
  ]
}
> db.books.insert(book);
> db.books.find();
{ "_id" : ObjectId("5063d1d89e302eaf24b259a0"), "author" : "Jamie Munro",
  "title" : "20 Recipes for Programming PhoneGap",
  "published" : ISODate("2012-04-03T07:00:00Z") }
{ "_id" : ObjectId("5063d6909e302eaf24b259a1"),
  "title" : "50 Tips and Tricks for MongoDB Developers", 
  "author" : "Kristina Chodorow",
  "published" : ISODate("2011-05-06T07:00:00Z"),
  "keywords" : [ "design", "implementation", "optimization" ] }

As you can well imagine, there is a lot of power in being able to insert records in such a free-form manner. When you are building an application against MongoDB, your program code is in control of the structure of the data you will be using. Any time you need to add a new field, record type, or even database, you can do so by doing nothing more than declaring it and using it. But this flexibility and power comes with a management cost: once your application has been running for a while, it will need to handle old data formats as well as new ones. That means you must either be very guarded about making changes at all, or your application must be crafted to be resilient to changes in the data.

The simple scenario demonstrated in Example 4-2 is a perfect example of this. You’ve just added a keywords field to your book documents—every new book entered into the system from here on out will contain a keywords field that is exposed to a library terminal somewhere down the line. What happens when that terminal tries to read a book that is missing the keywords field? Hopefully the developer who built the interface thought of that and is able to display an empty list—or a special message—when no keywords are found.

Warning

You should always build your application logic to check for the presence of database fields before using them; otherwise, you could end up with a broken application even though your database is behaving exactly as expected.
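A minimal sketch of that defensive check; the keywordsOf helper and sample documents are illustrative, not part of any MongoDB API:

```javascript
// oldBook stands in for a document written before the field existed.
const oldBook = { title: "20 Recipes for Programming PhoneGap" };
const newBook = {
  title: "50 Tips and Tricks for MongoDB Developers",
  keywords: ["design", "implementation", "optimization"]
};

function keywordsOf(doc) {
  // Array.isArray also guards against a field present with the wrong type.
  return Array.isArray(doc.keywords) ? doc.keywords : [];
}

console.log(keywordsOf(oldBook)); // []
console.log(keywordsOf(newBook)); // [ 'design', 'implementation', 'optimization' ]
```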

If you want to make sure all your documents have a field called keywords, you could trigger an update across the entire collection, as shown in Example 4-3.

Example 4-3. Adding a field to all documents in a collection
> db.books.update({},{$set:{"keywords":[]}},false,true);

Example 4-3 demonstrates the use of MongoDB’s update command with all four of its parameters:

Search criteria

This parameter contains all of the search criteria that MongoDB should use to determine which records need to be modified. In this case, no criteria is given, meaning any record is a fair match for this function.

Update object

During normal operation, this parameter will contain an entire record just like the insert command in Examples 4-1 and 4-2. When presented with an object, MongoDB will save its contents over any document it found matching the search criteria from the first parameter. MongoDB also supports a number of special functions including the $set function shown here, which allows manipulation of part of a document, leaving the rest of the data intact.

Upsert

In most cases you will want to update an existing document with the update command, but there are many times when you will want to update a piece of information if it exists or create a new document if it does not. The upsert pattern means “update if possible, insert otherwise.” In Example 4-3, the goal is to set a keywords field on all of the existing documents but not create any new records; therefore, upsert is set to false.

Multiple update

MongoDB expects you to update one record at a time, so when you need to update more than one you need to set this parameter to true; otherwise, the database will stop updating after it operates on the first match. This is a useful safety valve to prevent you from accidentally trashing an entire collection because of a poorly thought out wildcard search pattern.
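The behavior of those four parameters can be sketched in plain JavaScript. This is a simplified in-memory model, not the real driver API, and only the $set operator is modeled:

```javascript
// update(collection, criteria, changes, upsert, multi): a toy model of
// MongoDB's update semantics described above.
function update(collection, criteria, changes, upsert, multi) {
  const matches = doc =>
    Object.keys(criteria).every(key => doc[key] === criteria[key]);
  let modified = 0;
  for (const doc of collection) {
    if (matches(doc)) {
      Object.assign(doc, changes.$set);  // apply the partial update
      modified++;
      if (!multi) break;                 // without multi, stop after one match
    }
  }
  if (modified === 0 && upsert) {
    collection.push({ ...criteria, ...changes.$set });  // insert otherwise
  }
  return modified;
}

const books = [{ title: "A" }, { title: "B" }];
// Empty criteria plus multi: every document gains a keywords field,
// mirroring Example 4-3.
update(books, {}, { $set: { keywords: [] } }, false, true);
console.log(books.every(b => Array.isArray(b.keywords))); // true
```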

Example 4-4 demonstrates another useful operator: the $push command. This command allows you to add a new item to the end of an array without modifying anything else in your document. As shown here, the keyword developer is added to the “50 Tips and Tricks for MongoDB Developers” book.

Example 4-4. Updating part of a document
> db.books.update( { author: "Kristina Chodorow" },
{ "$push": { "keywords": "developer" } } );
> db.books.find();
{ "_id" : ObjectId("5063d1d89e302eaf24b259a0"), "author" : "Jamie Munro",
  "title" : "20 Recipes for Programming PhoneGap",
"published" : ISODate("2012-04-03T07:00:00Z") }
{ "_id" : ObjectId("5063d6909e302eaf24b259a1"), "author" : "Kristina Chodorow",
  "keywords" : [ "design", "implementation", "optimization", "developer" ],
  "published" : ISODate("2011-05-06T07:00:00Z"),
  "title" : "50 Tips and Tricks for MongoDB Developers" }

Querying

Querying in MongoDB is analogous to SELECT in SQL. You can not only query across fields in collections of documents, you can also use custom JavaScript functions to perform more complicated filtering on your result sets.

The earliest example in this chapter, Example 4-1, contains an extremely basic query:

db.books.find();

The find command with no parameters instructs MongoDB to find documents in the books collection without applying conditions to the search. When there are no conditions to apply to the search, MongoDB responds by returning all of the documents in the collection. In this sense, the find command with no parameters is the same as saying “find all” to the database.

If we were to express this in SQL, the query would look like this:

SELECT * FROM books;

Example 4-5. A simple field search in MongoDB
> db.books.find({author: "Jamie Munro"});
{ "_id" : ObjectId("5063d1d89e302eaf24b259a0"),
  "author" : "Jamie Munro", "keywords" : [ ],
  "published" : ISODate("2012-04-03T07:00:00Z"),
  "title" : "20 Recipes for Programming PhoneGap" }

Example 4-5 uses the find command again, but this time a specific author is given in the criteria. MongoDB will search through the books collection and return all of the records whose author field matches the one in the find command. If we were to express this in SQL, the query would look like this:

SELECT * FROM books WHERE author = 'Jamie Munro';

Imagine for a moment that the average document in your collection contained dozens, or even hundreds, of fields. During regular use you would not always want to fetch every field from the database, especially if you are interested in only one or two pieces of information at a time.

The find command in Example 4-6 has two parameters: the search criteria (empty in this case) and a desired field map. Because no arguments are supplied in the search criteria, MongoDB will once again return all of the documents in the books collection. The desired field map includes the title field, and will cause MongoDB to return only the title field.

Example 4-6. Finding specific fields from a collection
> db.books.find({}, {title:1});
{ "_id" : ObjectId("5063d6909e302eaf24b259a1"),
  "title" : "50 Tips and Tricks for MongoDB Developers" }
{ "_id" : ObjectId("5063d1d89e302eaf24b259a0"),
  "title" : "20 Recipes for Programming PhoneGap" }

Wait a minute! Why is the _id field being returned? MongoDB assumes you will need the record’s ID field in most cases; without it, you would not be able to uniquely identify a document from your application code. If you wanted to show only the titles without the _id field, you could explicitly hide the _id field like this, using 0 to mean false:

db.books.find({}, {title:1, _id:0});

The find function as shown in Example 4-6 is analogous to this SQL query:

SELECT _id, title FROM books;
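The way an inclusive field map behaves can be sketched with a hypothetical project helper; this is an in-memory approximation, not MongoDB's actual projection engine:

```javascript
// project(doc, fields): keep only the requested fields, plus _id
// unless it is explicitly excluded with 0.
function project(doc, fields) {
  const out = {};
  for (const [name, keep] of Object.entries(fields)) {
    if (keep && name in doc) out[name] = doc[name];
  }
  // _id is included by default unless explicitly excluded.
  if (fields._id !== 0 && "_id" in doc) out._id = doc._id;
  return out;
}

const doc = {
  _id: "5063d1d89e302eaf24b259a0",
  title: "20 Recipes for Programming PhoneGap",
  author: "Jamie Munro"
};
console.log(project(doc, { title: 1 }));         // title plus _id
console.log(project(doc, { title: 1, _id: 0 })); // title only
```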

Indexes

It’s easy to be fooled into thinking your code is fast when you’re working on small datasets and querying against databases running on your own computer. Performance can suffer tremendously once your code hits a production workload and needs to serve a growing dataset to a large number of users. Although MongoDB is smart about where it looks for data, unless you set up indexes to keep search fields in memory, your database will be doing a lot more work than it needs to.

The explain function shows how MongoDB plans to execute a query. While going deeply into detail on explain would (and does!) fill an entire book, the first piece of information you should look for is in the cursor field. MongoDB uses either a BasicCursor or a BtreeCursor when scanning collections of documents; for a heavily used query, you want to avoid a BasicCursor because it scans through every document in the collection to find a result.

The books collection queried in Example 4-7 is tiny, containing only two records. But because a BasicCursor is used to perform the search, MongoDB has to examine both records before it can return a result set. You can see the number of scanned objects in the nscannedObjects field of the explain function’s output.

Example 4-7. Diagnosing slow queries
> db.books.find({author: "Jamie Munro"}).explain();
{
  "cursor" : "BasicCursor",
  "nscanned" : 2,
  "nscannedObjects" : 2,
  "n" : 1,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {}
}

You use the ensureIndex function to add an index to your collection. Indexed fields are tracked in memory and can be retrieved much more rapidly by MongoDB. More importantly, an index acts as a filter on queried data: the database only needs to examine records that it knows can match your search criteria, based upon its knowledge of the collection as held in the index.
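What an index buys can be sketched in plain JavaScript: a precomputed map from field value to document positions, so a lookup touches only matching documents. This is an in-memory analogy, not MongoDB's actual B-tree implementation:

```javascript
const books = [
  { author: "Jamie Munro", title: "20 Recipes for Programming PhoneGap" },
  { author: "Kristina Chodorow", title: "50 Tips and Tricks for MongoDB Developers" }
];

// Build an "index" over the author field: author -> positions.
const authorIndex = new Map();
books.forEach((doc, i) => {
  const positions = authorIndex.get(doc.author) || [];
  positions.push(i);
  authorIndex.set(doc.author, positions);
});

// A lookup now touches only the matching documents instead of
// scanning the whole collection.
const hits = (authorIndex.get("Jamie Munro") || []).map(i => books[i]);
console.log(hits.length);                              // 1
console.log((authorIndex.get("Nobody") || []).length); // 0: nothing scanned
```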

Notice how nscannedObjects dropped to just 1 in Example 4-8 after an index was added to the author field of the books collection. Because the author is now stored in memory and known to the database, MongoDB knows it only needs to look more closely at a single document. If you try to search for an author who doesn’t exist, MongoDB will not have to check any records at all.

Example 4-8. Adding an index to a collection
> db.books.ensureIndex({author:1});
> db.books.find({author: "Jamie Munro"}).explain();
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
    "author" : [
      [
        "Jamie Munro",
        "Jamie Munro"
      ]
    ]
  }
}

It can take some time to make sure you have all of the right indexes set up in your database. If done properly, it can mean the difference between queries that take minutes versus queries that take seconds or less.

MapReduce

MapReduce is used to batch process huge amounts of data, often across clusters of database servers. If you were to compare MongoDB to a traditional SQL-based server, MapReduce would fit in the space where you would normally use GROUP BY to collect aggregate results.

A MapReduce operation involves two phases: the map phase, which plucks out the relevant data into key/value pairs for aggregation, and the reduce phase, which collects all of the keys and performs math on their values. Take a counting operation for example: if you wanted to know how many books each author in the books collection has written, you would need to go through each document and count the number of times each author appeared in the collection.
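The two phases can be simulated in plain JavaScript, outside of MongoDB entirely; this sketch counts books per author the same way the map and reduce functions later in this section do:

```javascript
// Map emits (author, 1) pairs; reduce sums the values per key.
const books = [
  { author: "Jamie Munro" },
  { author: "Kristina Chodorow" },
  { author: "Jamie Munro" }
];

// Map phase: pluck out a key/value pair from each document.
const emitted = books.map(b => [b.author, 1]);

// Group the emitted pairs by key.
const grouped = {};
for (const [key, value] of emitted) {
  (grouped[key] = grouped[key] || []).push(value);
}

// Reduce phase: collapse each key's values into a single total.
const counts = {};
for (const [key, values] of Object.entries(grouped)) {
  counts[key] = values.reduce((total, v) => total + v, 0);
}
console.log(counts); // { 'Jamie Munro': 2, 'Kristina Chodorow': 1 }
```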

Since there have been only two records in the books collection so far, Example 4-9 begins by inserting several more.

Example 4-9. Performing a MapReduce query on the books collection
> db.books.insert({"author": "Kristina Chodorow", "title": "Scaling MongoDB",
... "published": new Date("03/02/2011")});
> db.books.insert({"author": "Stoyan Stefanov", "title": "JavaScript Patterns",
... "published": new Date("09/28/2010")});
> db.books.insert({"author": "Stoyan Stefanov", 
... "title": "JavaScript for PHP Developers",
... "published": new Date("10/22/2012")});
> db.books.insert({"author": "Stoyan Stefanov", "title": "Web Performance Daybook",
... "published": new Date("06/27/2012")});
> db.books.insert({"author": "Jamie Munro", 
... "title": "20 Recipes for Programming MVC 3",
... "published": new Date("10/11/2011")});


> map = function() { emit( this.author, 1 ); };
function () {
    emit(this.author, 1);
}
> reduce = function( key, values ) {
...   var total = 0;
...   values.forEach(function(value) { 
...     total+=value;
...   });
...   return total;
... }
function (key, values) {
    var total = 0;
    values.forEach(function (value) {total += value;});
    return total;
}
> db.books.mapReduce(map,reduce, { out: "bookoutput"});
{
  "result" : "bookoutput",
  "timeMillis" : 3,
  "counts" : {
    "input" : 7,
    "emit" : 7,
    "reduce" : 3,
    "output" : 3
  },
  "ok" : 1
}
> db.bookoutput.find();
{ "_id" : "Jamie Munro", "value" : 2 }
{ "_id" : "Kristina Chodorow", "value" : 2 }
{ "_id" : "Stoyan Stefanov", "value" : 3 }

With some data in the collection, create a map function that will emit an author name and the number 1 when given a document. emit is a MongoDB helper function that groups objects by keys; in this case, the key is the author name found in each document. I used the value 1 deliberately: each time MongoDB reads a book object, it counts as one book credit toward that author. The emitted value could be simplified to something empty, but most of the MapReduce functions you create will start with this key/value format and become more complex, so it is worth keeping.

Once all of the keys have been emitted, MongoDB collects the results and reduces them. Although a particular key may have been emitted multiple times (once for each book an author wrote), the final result contains only one value per author: the sum of that author’s books. The reduce function in Example 4-9 adds all of the values found for each author and returns that total.

When the mapReduce function is performed on the books collection, it is given three parameters: the user-defined map function, the user-defined reduce function, and the name of a new collection to contain the results. After the mapReduce function executes, it will save the author book counts into a new collection called bookoutput, which can then be queried just like any other collection. Example 4-9 concludes by querying it to reveal the number of books next to each author’s name.

Working with Node.js

The primary MongoDB driver supported for Node.js is the Node MongoDB Native Project, a pure-JavaScript driver that provides asynchronous I/O to MongoDB from Node. Because the driver can save your JavaScript objects directly into MongoDB, by all rights you could build out the full application with it.

The Mongoose project extends the native drivers by providing a means to define the database schema. If this seems to go against the NoSQL “schema-less” philosophy, don’t worry. Changes to the JavaScript schema definitions do not require special processing by MongoDB—the schema only exists to make your life easier as a developer trying to make a consistent application.

Mongoose also provides a powerful set of middleware designed to ease the process of working with serial and parallel requests in Node’s asynchronous environment. You can install it with npm:

npm install mongoose

Concurrent Access

Picture this: Adam and Greg access the same document and begin making changes. Since they are each working on their own computers, their changes are not being saved directly back to the database and refreshed in each other’s work—they can be said to be editing in “offline” mode. Adam finishes his work first and saves his complete changes back to the database. At some later point, Greg finishes his own edits and saves those into the database. Because Greg started working on the document before Adam’s changes went into effect, his edits do not include Adam’s; so when he saves his work back into the database, Adam’s work is effectively erased, as demonstrated in Figure 4-1.

What to do? While most of the examples in this book involve write-only transactions (meaning we will write but never modify certain data), there will inevitably be occasions where you will need to work on a shared document at the same time as someone else and want to prevent your users from losing their changes if they were unfortunate enough to post first.

Figure 4-1. What happens when two users change the same document

One way to accomplish this is by assigning a signature to every record and updating that signature every time you write to the database. If updating an existing document, only update the document whose ID and signature both match the values observed when the document was first read. This way, when two users try to save the same data, the first user’s update will cause the signature to change and the second user’s update will fail because the signature does not match.
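That compare-and-swap pattern can be sketched in memory; the saveIfUnchanged helper and the store are illustrative, not a MongoDB API:

```javascript
// The write succeeds only when the stored revision still matches the
// one the writer read.
function saveIfUnchanged(store, id, expectedRevision, changes, newRevision) {
  const doc = store[id];
  // Refuse the write when another writer already bumped the revision.
  if (!doc || doc.revision !== expectedRevision) return false;
  Object.assign(doc, changes, { revision: newRevision });
  return true;
}

const store = { a1: { title: "Jolly Roger", revision: "a5" } };

// Adam read revision a5 and saves first: the write succeeds.
const adam = saveIfUnchanged(store, "a1", "a5", { title: "Jolly Roger II" }, "b7");
// Greg also read a5 but saves second: his write is rejected.
const greg = saveIfUnchanged(store, "a1", "a5", { title: "Jolly Roger III" }, "c9");
console.log(adam, greg); // true false
```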

In Figure 4-2, users Greg and Adam both begin editing the same document, but when Adam saves his changes he updates the document’s version number from a5 to b7. Now, when Greg saves his changes, he specifies that he is updating version a5, which no longer exists in the database; instead of writing his changes, the update fails. From here, Greg can update his copy of the document using the changes submitted by Adam, or the software he is using can do it intelligently in the background and resubmit on his behalf.

Figure 4-2. Updating documents using findAndModify

MongoDB includes a function called findAndModify, which handles searching and updating in a single operation. This adds an extra layer of safety because it ensures the update happens atomically when a record is found, rather than adding round-trip time to process the record on the client side and save it back to the database, during which time someone else might change the record and cause the concurrent access problem described earlier.

Example 4-10 demonstrates how an article can be created and updated using the findAndModify method. Notice how the first time findAndModify is executed, it returns the contents of the article without the updated revision number; that is, it indicates the revision is a5 instead of the new value b7 provided in the command. If you wanted findAndModify to return the new, updated version of the article instead, you would include the new option in the command. In either case, when find is later issued, the new revision value is shown. Next, when the findAndModify command is rerun, MongoDB is unable to find the record because the revision is no longer a5, so it returns null.

Example 4-10. Using findAndModify from the console
> db.articles.save( {
... title: "Jolly Roger",
... published: "September 12, 2007",
... description: "A riveting tale of suspense and drama.",
... revision: "a5"
... } );

> db.articles.find();
{
  "_id" : ObjectId("505a9b4fd5f42989fe6d8015"),
  "title" : "Jolly Roger",
  "published" : "September 12, 2007",
  "description" : "A riveting tale of suspense and drama.",
  "revision" : "a5"
}

> db.articles.findAndModify({
... query: {"_id": ObjectId("505a9b4fd5f42989fe6d8015"), revision: "a5"},
... update: {$set: {revision: "b7"} }
... });
{
  "_id" : ObjectId("505a9b4fd5f42989fe6d8015"),
  "title" : "Jolly Roger",
  "published" : "September 12, 2007",
  "description" : "A riveting tale of suspense and drama.",
  "revision" : "a5"
}

> db.articles.find();
{
  "_id" : ObjectId("505a9b4fd5f42989fe6d8015"),
  "title" : "Jolly Roger",
  "published" : "September 12, 2007",
  "description" :
  "A riveting tale of suspense and drama.",
  "revision" : "b7"
}

> db.articles.findAndModify({
... query: {"_id": ObjectId("505a9b4fd5f42989fe6d8015"), revision: "a5"},
... update: {$set: {revision: "90jasv"} }
... });
null

> db.articles.find();
{
  "_id" : ObjectId("505a9b4fd5f42989fe6d8015"),
  "title" : "Jolly Roger",
  "published" : "September 12, 2007",
  "description" : "A riveting tale of suspense and drama.",
  "revision" : "b7"
}

>
