More Git Building Blocks

Last week we took a deep dive into the way Git stores commit information in its database. This week we will build on that by examining the effect that multiple commits and branches have on the repository, so that we can better understand how Git compares to other version control systems.

We'll start by creating new branches to simulate collaboration between multiple people on the same repository. Let's say that we want to translate the text file we created last week into other languages, and that two translators are each working on one of them: Spanish and German.

$ git branch german
$ git branch spanish

Let's look at what happened to our references and how Git reports that using the git branch command.

$ git branch
  german
* master
  spanish

$ find .git/refs -type f
.git/refs/heads/german
.git/refs/heads/master
.git/refs/heads/spanish

$ cat .git/refs/heads/master
9496b59087c06604c2e62f3a74f372e2840b2540

$ cat .git/refs/heads/german
9496b59087c06604c2e62f3a74f372e2840b2540

$ cat .git/refs/heads/spanish
9496b59087c06604c2e62f3a74f372e2840b2540

So, we have two new references corresponding to our new branches, and all three references point to the same commit object that "master" was already pointing to. The asterisk next to "master" in git branch's output means that "master" is the active (or checked out) branch. Any commits we create will update only that branch. To work in a different branch we first need to check it out.
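Reading the files under .git/refs works here, but Git may also store references in a single .git/packed-refs file, so asking Git to resolve the names is more robust. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the author identity and the contents of hello.txt are placeholders, so the hashes will not match the ones shown in this post.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"

# Creating a branch only adds a new reference; no commit is created.
git branch german
git branch spanish

# Resolve each branch name to the commit it points to.
master_id=$(git rev-parse master)
german_id=$(git rev-parse german)
spanish_id=$(git rev-parse spanish)

echo "master:  $master_id"
echo "german:  $german_id"
echo "spanish: $spanish_id"
```

All three lines should print the same commit ID, whichever way the references happen to be stored on disk.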

$ git checkout german
Switched to branch 'german'

$ git branch
* german
  master
  spanish

Don't lose your HEAD

So, how does Git keep track of the checked out branch? Via the HEAD reference.

$ cat .git/HEAD
ref: refs/heads/german
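The same information can be read through Git itself with git symbolic-ref, instead of opening the file. This is a small sketch under the same assumptions as before: a throwaway repository, a placeholder identity and placeholder file contents.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"

git branch german
git checkout -q german

# HEAD is a symbolic reference: it names a branch rather than a commit.
head_ref=$(git symbolic-ref HEAD)
head_short=$(git symbolic-ref --short HEAD)

echo "HEAD points to: $head_ref"
echo "Active branch:  $head_short"
echo "Raw file:       $(cat .git/HEAD)"
```

Checking out a branch rewrites HEAD; the branch references themselves stay where they were.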

Now we will create our translation and commit it.

$ echo "Hallo, Welt!" > hello-de.txt
$ git add hello-de.txt
$ git commit -m "Create German translation"
[german bec460c] Create German translation
1 file changed, 1 insertion(+)
create mode 100644 hello-de.txt 

Git lets us know that a new commit was created in the "german" branch, with a hash starting with "bec460c". It also shows the number of files changed and the number of lines changed in those files. Finally, it reminds us that the hello-de.txt file is new in the repository. Let's look at the database changes.

$ find .git/objects -type f
.git/objects/80/95a184a9b9ae8a14a0f0cde697c7f7cf1410e6
.git/objects/92/305da257dc470a104b213dddc7cd64952244b6
.git/objects/94/96b59087c06604c2e62f3a74f372e2840b2540
.git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
.git/objects/be/c460cdd4eef813c46d8a5438495a2282762d18
.git/objects/ec/947e3dd7a7752d078f1ed0cfde7457b21fef58

$ cat .git/refs/heads/german
bec460cdd4eef813c46d8a5438495a2282762d18

$ cat .git/refs/heads/master
9496b59087c06604c2e62f3a74f372e2840b2540

$ cat .git/refs/heads/spanish
9496b59087c06604c2e62f3a74f372e2840b2540

Three new Git objects were created (a blob, a tree and a commit), and the reference corresponding to the active branch was updated to point to one of them: the commit we just created. If we look at the object contents we'll see the following:

$ git cat-file -p bec460c
tree 92305da257dc470a104b213dddc7cd64952244b6
parent 9496b59087c06604c2e62f3a74f372e2840b2540
author ...
committer ...

Create German translation

We have a reference to the new tree, as well as an indication about the parent commit for the newly created commit. The parent is also the commit the "german" branch was pointing to prior to the changes. This is what creates the Git history for the repository. We'll look at it in a moment, but let's first look at the new tree and blob objects.

$ git cat-file -p 92305da
100644 blob 8095a184a9b9ae8a14a0f0cde697c7f7cf1410e6    hello-de.txt
100644 blob af5626b4a114abcb82d63db7c8082c3c4756e51b    hello.txt

$ git cat-file -p 8095a1
Hallo, Welt!

So, the tree object for this new commit now has two entries instead of one: one for each file in our repository. Also, it reuses the blob object for the hello.txt file from the previous commit, as we did not change it. We can represent the current repository state graphically as follows:

Repository state after second commit
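The blob reuse described above can be checked without reading hashes by eye. The sketch below resolves hello.txt in both commits with the "revision:path" syntax and compares the resulting blob IDs. It assumes a throwaway repository with a placeholder identity and placeholder contents for hello.txt, so the IDs will differ from the ones in this post.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"

git checkout -q -b german
echo "Hallo, Welt!" > hello-de.txt
git add hello-de.txt
git commit -q -m "Create German translation"

# "<revision>:<path>" resolves to the blob stored for that path in that commit.
old_blob=$(git rev-parse master:hello.txt)
new_blob=$(git rev-parse german:hello.txt)

echo "hello.txt in master: $old_blob"
echo "hello.txt in german: $new_blob"

# The new tree lists both files; the hello.txt entry is the same blob as before.
git ls-tree german
```

Because a blob's ID is derived from the file contents, an unchanged file resolves to the same object in both commits.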

Now comes the interesting bit, as we would like to integrate our translation into the main branch, "master".

$ git checkout master
Switched to branch 'master'

$ git merge german
Updating 9496b59..bec460c
Fast-forward
hello-de.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 hello-de.txt

Let's look at the results:

$ find .git/objects -type f
.git/objects/80/95a184a9b9ae8a14a0f0cde697c7f7cf1410e6
.git/objects/92/305da257dc470a104b213dddc7cd64952244b6
.git/objects/94/96b59087c06604c2e62f3a74f372e2840b2540
.git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
.git/objects/be/c460cdd4eef813c46d8a5438495a2282762d18
.git/objects/ec/947e3dd7a7752d078f1ed0cfde7457b21fef58

There are no new objects in the repository, but:

$ cat .git/refs/heads/master
bec460cdd4eef813c46d8a5438495a2282762d18

$ cat .git/refs/heads/german
bec460cdd4eef813c46d8a5438495a2282762d18

The "master" reference was simply updated to match "german". Since branches in Git are just pointers to commits, and the commits themselves store the history through their "parent" links, the simplest way to include all the changes from "german" in "master" was to move the "master" pointer forward. This was possible because the commit "master" was pointing to was a direct ancestor of the target commit: there was only one road from commit 9496b59 to commit bec460c.
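The ancestor check that makes a fast-forward possible can also be run by hand. The sketch below uses git merge-base --is-ancestor to ask the question, and git merge --ff-only to refuse anything other than a fast-forward. As before, it assumes a throwaway repository with a placeholder identity and placeholder file contents.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"

git checkout -q -b german
echo "Hallo, Welt!" > hello-de.txt
git add hello-de.txt
git commit -q -m "Create German translation"

git checkout -q master

# Exit status 0 means the first commit is an ancestor of the second.
if git merge-base --is-ancestor master german; then
    echo "master is an ancestor of german: a fast-forward is possible"
fi

# --ff-only makes the merge fail instead of creating a merge commit.
git merge -q --ff-only german

echo "master now points to: $(git rev-parse master)"
echo "german points to:     $(git rev-parse german)"
```

After the merge both names resolve to the same commit, and the history of "master" contains just the two commits we created.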

What happens when there are forks in the road? We'll see it when we deal with the Spanish translation.

Nobody expects the Spanish Inquisition

We'll take some shortcuts here and list just the commands and their results, since the explanations are similar to the ones seen above:

$ git checkout spanish
Switched to branch 'spanish'

$ echo "Hola, mundo!" > hello-es.txt

$ git add hello-es.txt
$ git commit -m "Add Spanish translation"
[spanish 93ee64a] Add Spanish translation
1 file changed, 1 insertion(+)
create mode 100644 hello-es.txt

$ find .git/objects -type f
.git/objects/26/5d673163af28fced74e1c278d40b528d938c0f
.git/objects/80/95a184a9b9ae8a14a0f0cde697c7f7cf1410e6
.git/objects/8c/d9832dafe5796dc9204e6a82103cf4e47ab1e8
.git/objects/92/305da257dc470a104b213dddc7cd64952244b6
.git/objects/93/ee64a4c5075c0e39929f6c91705c736ac9d714
.git/objects/94/96b59087c06604c2e62f3a74f372e2840b2540
.git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
.git/objects/be/c460cdd4eef813c46d8a5438495a2282762d18
.git/objects/ec/947e3dd7a7752d078f1ed0cfde7457b21fef58

$ git cat-file -p 93ee64a
tree 8cd9832dafe5796dc9204e6a82103cf4e47ab1e8
parent 9496b59087c06604c2e62f3a74f372e2840b2540
author ...
committer ...

Add Spanish translation

$ git cat-file -p 8cd9832
100644 blob 265d673163af28fced74e1c278d40b528d938c0f    hello-es.txt
100644 blob af5626b4a114abcb82d63db7c8082c3c4756e51b    hello.txt

$ git cat-file -p 265d673
Hola, mundo!

And the current repository status, seen graphically, would be:

Repository state after the third commit

Now, we will merge the Spanish translation into master and look at the results.

$ git checkout master
Switched to branch 'master'

$ git merge spanish
Merge made by the 'recursive' strategy.
hello-es.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 hello-es.txt

$ cat .git/refs/heads/master
c7af8843b8297ebb8c7f51a430cccf61e297f795

$ cat .git/refs/heads/spanish
93ee64a4c5075c0e39929f6c91705c736ac9d714

$ cat .git/refs/heads/german
bec460cdd4eef813c46d8a5438495a2282762d18

This time, the branch pointers are all pointing to different commits, and we can see two new objects in the database:

$ find .git/objects -type f
.git/objects/26/5d673163af28fced74e1c278d40b528d938c0f
.git/objects/80/95a184a9b9ae8a14a0f0cde697c7f7cf1410e6
.git/objects/88/69adc063ee32f8b926d65c40ba910e759c7231
.git/objects/8c/d9832dafe5796dc9204e6a82103cf4e47ab1e8
.git/objects/92/305da257dc470a104b213dddc7cd64952244b6
.git/objects/93/ee64a4c5075c0e39929f6c91705c736ac9d714
.git/objects/94/96b59087c06604c2e62f3a74f372e2840b2540
.git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
.git/objects/be/c460cdd4eef813c46d8a5438495a2282762d18
.git/objects/c7/af8843b8297ebb8c7f51a430cccf61e297f795
.git/objects/ec/947e3dd7a7752d078f1ed0cfde7457b21fef58

$ git cat-file -p c7af884
tree 8869adc063ee32f8b926d65c40ba910e759c7231
parent bec460cdd4eef813c46d8a5438495a2282762d18
parent 93ee64a4c5075c0e39929f6c91705c736ac9d714
author ...
committer ...

Merge branch 'spanish'

$ git cat-file -p 8869adc
100644 blob 8095a184a9b9ae8a14a0f0cde697c7f7cf1410e6    hello-de.txt
100644 blob 265d673163af28fced74e1c278d40b528d938c0f    hello-es.txt
100644 blob af5626b4a114abcb82d63db7c8082c3c4756e51b    hello.txt

We can see that master is pointing to a commit with two parents: what is known as a merge commit. We can also see that its tree reuses all the existing blobs for the file contents.
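The two parents of a merge commit can be addressed directly with the ^1 and ^2 suffixes. The sketch below replays both merges in a throwaway repository and checks which commit each parent is. The identity and the contents of hello.txt are placeholders, so the hashes will differ from the ones in this post.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"
git branch spanish

# German translation, merged into master as a fast-forward.
git checkout -q -b german
echo "Hallo, Welt!" > hello-de.txt
git add hello-de.txt
git commit -q -m "Create German translation"
git checkout -q master
git merge -q --ff-only german

# Spanish translation, which forks from the initial commit.
git checkout -q spanish
echo "Hola, mundo!" > hello-es.txt
git add hello-es.txt
git commit -q -m "Add Spanish translation"

# This merge cannot be a fast-forward, so Git creates a merge commit.
git checkout -q master
git merge -q --no-edit spanish

first_parent=$(git rev-parse master^1)    # where master was before the merge
second_parent=$(git rev-parse master^2)   # the tip of the merged branch

echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

The first parent is the commit "master" pointed to before the merge (the same commit "german" points to), and the second parent is the tip of "spanish".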

A merge commit is what Git creates when the branches being integrated have diverged, so the merge cannot be a simple fast-forward, where one branch is just a number of commits ahead of the other. It allows Git to have a story that only moves in one direction: the existing commits stay as they are, and the new commit records both lines of work as its parents. The current Git graph looks like this:

Final repository state after merge commit

Git looked at the differences between the source and target branches and, after identifying that there was no way to make this merge a fast-forward, simply created a commit reflecting a new state that included all the information from both branches. Since blobs are reused as much as possible and the changes involved different files, the only new objects were a new tree and the merge commit pointing to it, which is pretty fast to create.
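We can ask Git the same question it had to answer before this merge: is a fast-forward possible, and if not, where did the two branches fork? The sketch below stops just before the second merge and uses git merge-base for both checks. It assumes a throwaway repository with a placeholder identity and placeholder file contents.

```shell
set -eu

# Throwaway repository (hypothetical setup, not the one used in the post).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Hello, world!" > hello.txt           # placeholder contents
git add hello.txt
git commit -q -m "Initial commit"
initial=$(git rev-parse HEAD)
git branch spanish

# master moves ahead with the German translation.
echo "Hallo, Welt!" > hello-de.txt
git add hello-de.txt
git commit -q -m "Create German translation"

# spanish moves ahead independently from the initial commit.
git checkout -q spanish
echo "Hola, mundo!" > hello-es.txt
git add hello-es.txt
git commit -q -m "Add Spanish translation"
git checkout -q master

# A fast-forward needs master to be an ancestor of spanish.
if git merge-base --is-ancestor master spanish; then
    ff_possible=yes
else
    ff_possible=no
fi

# The commit where the two branches forked.
fork_point=$(git merge-base master spanish)

echo "fast-forward possible: $ff_possible"
echo "fork point:            $fork_point"
```

Here the German commit is made directly on "master" to keep the sketch short; the resulting graph has the same shape as the one in this post just before the second merge.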

Summary

This time we looked at what happens when branching and merging in Git, and identified two scenarios that might happen:

  1. A fast-forward merge, where the branch being merged in points to a commit that is a direct descendant of the commit the current branch points to, so Git only needs to move the current branch's pointer forward.
  2. A merge commit, where a new commit object needs to be created, because neither of the involved commits is an ancestor of the other.

We also discovered that commits build the history of the Git repository through these ancestorship ("parent") references.

Lastly, we found out that, for efficiency, Git reuses blob objects as much as possible. This minimizes the space used for the database and allows commit trees to be built faster.

Next week we'll look at the log files that Git creates to follow us around, and will learn about a powerful Git command which will enable us to recover information from Git's database, even if we deleted the branches that referenced those commits.
