Chapter 3. The Cassandra Data Model

In this chapter, we’ll gain an understanding of Cassandra’s design goals, data model, and some general behavior characteristics.

For developers and administrators coming from the relational world, the Cassandra data model can be very difficult to understand initially. Some terms, such as “keyspace,” are completely new, and some, such as “column,” exist in both worlds but have different meanings. It can also be confusing if you’re trying to sort through the Dynamo or Bigtable source papers, because although Cassandra may be based on them, it has its own model.

So in this chapter we start from common ground and then work through the unfamiliar terms. Then, we do some actual modeling to help understand how to bridge the gap between the relational world and the world of Cassandra.

The Relational Data Model

In a relational database, we have the database itself, which is the outermost container that might correspond to a single application. The database contains tables. Tables have names and contain one or more columns, which also have names. When we add data to a table, we specify a value for every column defined; if we don’t have a value for a particular column, we use null. This new entry adds a row to the table, which we can later read if we know the row’s unique identifier (primary key), or by using a SQL statement that expresses some criteria that row might meet. If we want to update values in the table, we can update all of the rows or just some ...

Get Cassandra: The Definitive Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.