Object Store Pictures

Let’s look at how Git’s objects fit and work together to form the complete system.

The blob object is at the bottom of the data structure; it references nothing and is referenced only by tree objects. In the figures that follow, each blob is represented by a rectangle.

Tree objects point to blobs, and possibly to other trees as well. Any given tree object might be pointed at by many different commit objects. Each tree is represented by a triangle.

A circle represents a commit. A commit points to one particular tree that is introduced into the repository by the commit.

Each tag is represented by a parallelogram. Each tag can point to at most one commit.

The branch is not a fundamental Git object, yet it plays a crucial role in naming commits. Each branch is pictured as a rounded rectangle.

Figure 4-1 captures how all the pieces fit together. This diagram shows the state of a repository after a single, initial commit added two files. Both files are in the top-level directory. Both the master branch and a tag named V1.0 point to the commit with ID 8675309.

Git objects

Figure 4-1. Git objects

Now, let’s make things a bit more complicated. Let’s leave the original two files as is, adding a new subdirectory with one file in it. The resulting object store looks like Figure 4-2.

Git objects after second commit

Figure 4-2. Git objects after second commit

As in the previous picture, the new commit has added one associated tree object to represent the total state of directory and file structure. In this case, it is the tree object with ID cafed00d.

Since the top-level directory is changed by the addition of the new subdirectory, the content of the top-level tree object has changed as well, so Git introduces a new tree, cafed00d.

However, the blobs dead23 and feeb1e didn’t change from the first commit to the second. Git realizes that the IDs haven’t changed and thus can be directly referenced and shared by the new cafed00d tree.

Pay attention to the direction of the arrows between commits. The parent commit or commits come earlier in time. Therefore, in Git’s implementation, each commit points back to its parent or parents. Many people get confused because the state of a repository is conventionally portrayed in the opposite direction: as a dataflow from the parent commit to child commits.

In Chapter 6, we extend these pictures to show how the history of a repository is built up and manipulated by various commands.

Get Version Control with Git now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.