How I learned to stop worrying and love the rebase

Sat 20 October 2018

I've been using Git for nearly ten years now. Ten years is a long time, and I've been able to try different approaches and evaluate how effective they are in my workflow. I've also had the opportunity to teach Git to others; both to colleagues in an informal environment, and to students in the more structured environment of the Casimir graduate school. This experience has given me the chance to reflect on the Git workflow and how best to use the tool.

There's one topic in particular that often comes up among people who have used Git for a while, and on which there never seems to be any real consensus: how to use git rebase properly.

What is rebase?

Let's start with a quick recap of what git rebase does for us. Let's say that we're developing a new feature on an aptly-named branch:

      ◯—◯ ← feature
     ╱
◯—◯—◯ ← master

Some new changes then arrive on master, so that the histories of the master and feature branches have now diverged:

      ◯—◯ ← feature
     ╱
◯—◯—◯—◯—◯ ← master

Now, if the changes made on master touch the same places in the same files as the changes on feature, then we know that when we finally merge our feature branch we're going to get conflicts. As a general rule, the longer you leave a branch unmerged, the more likely you are to get conflicts. So while we're developing on feature we'll want to incorporate the changes from master every so often, so that we don't have to deal with all the merge conflicts at once during the final merge. At this point we have two options for incorporating the changes from master:

      ◯—◯—◯ ← feature      ╮
     ╱   ╱                 │ merge
◯—◯—◯—◯—◯ ← master         ╯

          ◯—◯—◯ ← feature  ╮
         ╱                 │ rebase
◯—◯—◯—◯—◯ ← master         ╯

See what we did? Rebase allows us to "chop" the link attaching the base of the feature branch and re-attach it (re-base geddit?) to the commit where master is pointing now.
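To make the two options concrete, here is a minimal sketch that builds the history from the diagrams in a throwaway repository and then applies each option to its own copy of the feature branch. The branch names feature-merged and feature-rebased, the commit_file helper and the file names are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository, so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called "master"
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

# Three commits on master, two on feature, then two more on master.
commit_file m1; commit_file m2; commit_file m3
git checkout -q -b feature
commit_file f1; commit_file f2
git checkout -q master
commit_file m4; commit_file m5

# Option 1: merge master into (a copy of) feature.
git checkout -q -b feature-merged feature
git merge -q --no-edit master

# Option 2: rebase (a copy of) feature onto master.
git checkout -q -b feature-rebased feature
git rebase -q master

# Compare the two resulting shapes.
git log --oneline --graph feature-merged feature-rebased master
```

In the graph, feature-merged carries an extra merge commit, while feature-rebased sits directly on top of the commit that master points to.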

Then we add a couple more commits and merge:

      ◯—◯—◯—◯—◯ ← feature      ╮
     ╱   ╱     ╲               │ merge
◯—◯—◯—◯—◯———————◯ ← master     ╯

          ◯—◯—◯—◯—◯ ← feature  ╮
         ╱         ╲           │ rebase
◯—◯—◯—◯—◯———————————◯ ← master ╯

Using rebase in this way allows us to maintain an almost-linear history (i.e. we could always fast-forward when merging instead of creating an explicit merge commit), which makes it easier to understand what we've done.
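As a rough illustration of the "could always fast-forward" claim, the sketch below rebases a feature branch, adds one more commit, and then asks Git to merge only if a fast-forward is possible. The repository contents and the commit_file helper are invented for the example.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

commit_file m1
git checkout -q -b feature
commit_file f1
git checkout -q master
commit_file m2            # master and feature have now diverged

git checkout -q feature
git rebase -q master      # re-attach feature to the tip of master
commit_file f2            # carry on working

git checkout -q master
git merge -q --ff-only feature   # refuses to merge unless a fast-forward is possible
git log --oneline --graph master
```

Because feature was rebased first, the --ff-only merge succeeds and master ends up with a straight line of four commits and no merge commit.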

Interactive rebase

The above usage of rebase is pretty uncontentious; opinions start to divide when the conversation turns to interactive rebase, which allows us to rewrite history in more exotic ways. For example, we can use interactive rebase to re-order commits or squash them together:

          A B C D
          ◯—◯—◯—◯ ← feature
         ╱
◯—◯—◯—◯—◯ ← master

          C' B' A+D
          ◯——◯———◯ ← feature
         ╱
◯—◯—◯—◯—◯ ← master

Developing is an inherently iterative process; your understanding of a problem evolves as you work on the solution. This means that the logical separation of ideas may not become apparent until after the fact. Git rebase can help us express the logical set of changes, rather than the (convoluted) set of changes as they actually happened.
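Normally git rebase -i opens the list of commits in your editor and you rearrange it by hand. So that the A B C D example above can run unattended, this sketch writes the edited list to a file ahead of time and hands it to Git through the GIT_SEQUENCE_EDITOR environment variable; GIT_EDITOR=true simply accepts the default combined message for the squash. The commit_file helper and file names are made up, and this is only one way of scripting it.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

commit_file m1; commit_file m2
git checkout -q -b feature
commit_file A; commit_file B; commit_file C; commit_file D

A=$(git rev-parse feature~3)
B=$(git rev-parse feature~2)
C=$(git rev-parse feature~1)
D=$(git rev-parse feature)

# The todo list we would otherwise type in the editor:
# C first, then B, then A with D squashed into it.
todo=$(mktemp)
cat > "$todo" <<EOF
pick $C
pick $B
pick $A
squash $D
EOF

# "cp $todo <git's todo file>" stands in for the manual editing step.
GIT_SEQUENCE_EDITOR="cp $todo" GIT_EDITOR=true git rebase -q -i master

git log --oneline master..feature
```

The feature branch now holds three commits instead of four, with the changes from A and D living in the last one.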

So what's the problem?

Rebase rewrites history. Each Git commit contains a pointer to its parent commit(s), and that pointer is part of what gets hashed to produce the commit's ID. So when we rebase a set of commits onto a new parent they won't hash to the same values as they did before the rebase, even though the changeset may be the same.
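A small sketch of this effect, using a throwaway repository with invented contents: it records the commit ID and the patch ID (a hash of the diff alone, as computed by git patch-id) of a feature commit before and after a rebase.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

commit_file m1
git checkout -q -b feature
commit_file f1
git checkout -q master
commit_file m2

# Commit ID and patch ID of the feature commit before the rebase.
before=$(git rev-parse feature)
patch_before=$(git show feature | git patch-id | cut -d' ' -f1)

git checkout -q feature
git rebase -q master

# The same two IDs afterwards.
after=$(git rev-parse feature)
patch_after=$(git show feature | git patch-id | cut -d' ' -f1)

echo "commit before: $before"
echo "commit after:  $after"
echo "patch before:  $patch_before"
echo "patch after:   $patch_after"
```

The two commit IDs differ because the parent changed, while the two patch IDs match because the diff itself did not.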

This rewriting of history makes it problematic to use rebase on branches that are also being worked on by other people, and it's the generally accepted wisdom not to use rebase with any branch that you've pushed to a remote repository (i.e. made public).
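To see why this bites, the following sketch pushes a feature branch to a local bare repository standing in for the shared remote, rebases the branch, and then tries an ordinary push again. The directory layout, the remote and the commit_file helper are all assumptions made for the demonstration.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stands in for the shared remote
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

commit_file m1
git checkout -q -b feature
commit_file f1

git remote add origin "$work/remote.git"
git push -q origin master feature          # feature is now public
published=$(git rev-parse feature)

git checkout -q master
commit_file m2
git checkout -q feature
git rebase -q master                       # rewrites the already-published commit

# An ordinary push is no longer a fast-forward, so the remote refuses it.
if git push -q origin feature 2>/dev/null; then
    push_result=accepted
else
    push_result=rejected
fi
echo "plain push after rebase: $push_result"
```

The remote still holds the original commit, which anyone else may already have built on; getting the rebased branch published would mean overwriting it.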

My Git workflow

When conducting scientific experiments, one will typically keep a lab book, which contains notes, observations and key results as they occur. The goal of keeping a lab book is to make sure that you don't forget what you were doing. The goal of a lab book is, however, not to communicate results to a wider community. A lab book — despite being an accurate record — requires context to understand; it is messy, and does not present information in a way that someone without the relevant context can easily understand. A scientific article — on the other hand — is designed to disseminate information to a wide audience, and to give the necessary context to understand any conclusions. When doing science, both of these ways of working are necessary: an accurate recollection of what has been done, and then a reorganisation and reinterpretation of what was done.

In my daily work I use Git as both a lab book and a scientific article. When I am developing a new feature or fixing a bug I will create a new branch, and then start experimenting, committing whenever I make incremental progress towards my goal. This incremental progress will certainly include many dead-ends and false starts, and that's fine. By committing early and committing often I can ensure that any work I do won't be lost. However, when it's time to explain to other people what I've done, it's time to make sense of that history. This is when I'll go through my lab book of commits and use the power of rebase to sequence everything into logical changes. When my changes are reviewed there will typically be small fixups (refactoring, naming fixes etc.). During the review I make these changes as separate commits, which makes it easier for the reviewer to see that I have applied their suggestions. Once the reviewer is happy I do one final pass with interactive rebase to incorporate the changes into the commits where they make the most sense. I then rebase on top of the branch into which I'm merging and perform the merge using the --no-ff option (to ensure that an explicit merge commit is made).
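The tail end of that workflow might look something like the sketch below. It is one possible rendering, not necessarily the exact commands I type: it assumes the review fix is recorded with git commit --fixup and folded in with git rebase -i --autosquash, and GIT_SEQUENCE_EDITOR=true accepts the proposed todo list unedited so that the script runs unattended. The file names and the commit_file helper are made up.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one new file.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

commit_file m1
git checkout -q -b feature
commit_file parser
commit_file docs

# Review: the reviewer asks for a change to the parser commit.
# Record it as a separate commit so the reviewer can see it on its own.
echo "review fix" >> parser
git add parser
git commit -q --fixup=HEAD~1              # message becomes "fixup! Add parser"

# Meanwhile master has moved on.
git checkout -q master
commit_file m2
git checkout -q feature

# Final pass: fold the fixup into the commit it belongs to...
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$(git merge-base master feature)"
# ...then rebase on top of the branch we are merging into...
git rebase -q master
# ...and merge with an explicit merge commit.
git checkout -q master
git merge -q --no-ff --no-edit feature

git log --oneline --graph master
```

The resulting graph shows a single merge commit whose second parent is a short, clean run of two feature commits sitting directly on top of the previous master.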

Enforcing this strategy for merging in changes has a few nice features. Firstly, the history is essentially linear — any merges could have been "fast-forward" — which makes it easier to visualise in tools like tig or gitk. Secondly, preserving the individual commits from each merge means that anyone looking back in history can see the logical set of changes that went into implementing a particular feature or bugfix. Finally, cleaning up the commits (i.e. not merging the "lab book" into the master branch) means that anyone looking back in history will not have to sift through endless trivia to get to the meat of a changeset.