Please stop teaching people to git pull

Mon 23 October 2017

There are a lot of Git tutorials on the web that teach people to use git pull when first teaching them about working with remote repositories and collaboration. I would like to put forward the position that this is a Bad Idea (TM), and that it is more instructive to teach people to use git fetch followed by an explicit git merge.

I understand the temptation of teaching people to just git pull, because it's a single command (rather than two) and often it "just werks". On the other hand, I get the impression that teaching people only git pull reinforces an incorrect mental model, one that causes a ton of confusion when there are (as there inevitably are) conflicts with the remote repository. In addition, I've noticed that people often just want to see what their collaborators have done, without necessarily incorporating those changes into their own work. Teaching the two operations separately enables this workflow; without it you have to introduce git reset just so that people can get themselves back to their previous state!

Because working with a remote repository is essentially (pedants, please contain yourselves) working with multiple branches, I personally think that it is really useful to teach branches before remote repositories [1]. Once people have the concept of branches down, it's then a pretty small leap to "by the way, you can fetch the state of other people's branches with git fetch". You then explain that the branch shows up on your local machine as origin/whatever-branch-name, and that you shouldn't try to make commits directly on this branch because it's "owned" by origin. At this point it's probably a good idea to show what happens when the remote repository is updated by somebody else, so that there is a "fork" in the history:

      ◯—◯ ← origin/master
     ╱
◯—◯—◯—◯—◯ ← master
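If you want to reproduce a fork like this in front of a class, here is a minimal sketch that builds one in a throwaway directory. Everything named here is made up for the demo: the directories origin.git, mine and theirs, the files their-file and my-file, and the "Example" identity. It also makes only one commit on each side of the fork rather than the two drawn above.

```shell
set -eu

# Throwaway identity and directory, so nothing real is touched.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repository plays the part of origin.
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/master

# My repository: one commit on master, pushed to origin.
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/master
git -C mine remote add origin "$sandbox/origin.git"
echo "hello" > mine/hello-world
git -C mine add hello-world
git -C mine commit -q -m "Initial commit"
git -C mine push -q origin master

# A collaborator clones, commits, and pushes.
git clone -q "$sandbox/origin.git" theirs
echo "their notes" > theirs/their-file
git -C theirs add their-file
git -C theirs commit -q -m "Collaborator's commit"
git -C theirs push -q origin master

# Meanwhile I commit locally, then fetch to see what happened on origin.
echo "my notes" > mine/my-file
git -C mine add my-file
git -C mine commit -q -m "My commit"
git -C mine fetch -q origin

# Draw the history: two branch tips sharing the initial commit.
git -C mine log --oneline --graph master origin/master
```

The last command should draw the same shape as the picture, with master and origin/master on separate lines that meet at the initial commit.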

You can then say "ok, origin/master and master now contain different things; we need to incorporate the changes on origin/master into our own". With that you introduce git merge, and can show the updated history after that operation:

      ◯—◯ ← origin/master
     ╱   ╲
◯—◯—◯—◯—◯—◯ ← master

Then you can git push origin master and show what that does locally:

      ◯—◯
     ╱   ╲
◯—◯—◯—◯—◯—◯ ← master, origin/master
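The whole fetch, merge, push sequence can be sketched end to end in a sandbox. As before, this is only an illustration under assumed names (origin.git, mine, theirs, their-file, my-file and the "Example" identity are all invented), and the two sides touch different files so that the merge is clean.

```shell
set -eu

# Throwaway identity and directory, so nothing real is touched.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
sandbox=$(mktemp -d)
cd "$sandbox"

# Set up origin, my repository, and a collaborator's clone.
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/master
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/master
git -C mine remote add origin "$sandbox/origin.git"
echo "hello" > mine/hello-world
git -C mine add hello-world
git -C mine commit -q -m "Initial commit"
git -C mine push -q origin master
git clone -q "$sandbox/origin.git" theirs

# Create the fork: one commit on origin, one commit locally.
echo "their notes" > theirs/their-file
git -C theirs add their-file
git -C theirs commit -q -m "Collaborator's commit"
git -C theirs push -q origin master
echo "my notes" > mine/my-file
git -C mine add my-file
git -C mine commit -q -m "My commit"

# Step 1: fetch. Only origin/master moves.
git -C mine fetch -q origin

# Step 2: merge. This is the step that changes master.
git -C mine merge -q -m "Merge origin/master into master" origin/master

# Step 3: push. origin/master catches up with master.
git -C mine push -q origin master

git -C mine log --oneline --graph master origin/master
git -C mine rev-parse master origin/master
```

After the push, the two hashes printed by git rev-parse should be identical, which is the "master, origin/master" label in the last picture.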

Teaching this sequence of operations, it is abundantly clear that git fetch only updates origin/master; it will never affect what you are working on right now. It's the way that you see what other people are working on, while you also continue working on your own thing. It's also clear that git merge totally affects what you're working on right now, so you'd better get yourself into a place where you're ready to have your files modified as git magically incorporates all those sweet sweet changes that your buddy just pushed.
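To make the "git fetch never touches what you're working on" point concrete, here is a small sketch that fetches while there is an uncommitted edit sitting in the working tree. The repository names, the "Example" identity, the file their-file and the "work in progress" line are assumptions for the demo; hello-world is borrowed from the transcript further down.

```shell
set -eu

# Throwaway identity and directory, so nothing real is touched.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
sandbox=$(mktemp -d)
cd "$sandbox"

# Set up origin, my repository, and a collaborator's clone.
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/master
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/master
git -C mine remote add origin "$sandbox/origin.git"
echo "hello" > mine/hello-world
git -C mine add hello-world
git -C mine commit -q -m "Initial commit"
git -C mine push -q origin master
git clone -q "$sandbox/origin.git" theirs

# The collaborator pushes something new.
echo "their notes" > theirs/their-file
git -C theirs add their-file
git -C theirs commit -q -m "Collaborator's commit"
git -C theirs push -q origin master

# I am in the middle of an edit that I have not committed yet.
echo "work in progress" >> mine/hello-world

# Record where both branches point, fetch, and record again.
head_before=$(git -C mine rev-parse HEAD)
tracking_before=$(git -C mine rev-parse origin/master)
git -C mine fetch -q origin
head_after=$(git -C mine rev-parse HEAD)
tracking_after=$(git -C mine rev-parse origin/master)

echo "master:        $head_before -> $head_after"
echo "origin/master: $tracking_before -> $tracking_after"

# The uncommitted edit is still there, exactly as I left it.
git -C mine status --short
```

Only the origin/master line should show two different hashes, and git status should still report hello-world as modified.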

This workflow also mitigates the common pitfall of:

$ git push
    To git-example-origin
    ! [rejected]        master -> master (fetch first)
    error: failed to push some refs to 'git-example-origin'
$ git pull
    Auto-merging 
    CONFLICT (content): Merge conflict in hello-world
    Recorded preimage for 'hello-world'
    Automatic merge failed; fix conflicts and then commit the result.

So instead of "congratulations, your code is now full of conflict markers, have fun!" you get to inspect the changes that were introduced by the remote before you try to merge them in. This means you can anticipate if there will be any problems, and know what to expect when you try to merge.
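As a rough sketch of what that inspection might look like, the script below sets up the conflicting situation from the transcript (both sides edit hello-world) and then looks at the incoming changes without merging anything. The repository names, the "Example" identity and the file contents are invented; the range syntax (master..origin/master for "commits on origin that I don't have", master...origin/master in git diff for "what changed on origin since we diverged") is standard Git.

```shell
set -eu

# Throwaway identity and directory, so nothing real is touched.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
sandbox=$(mktemp -d)
cd "$sandbox"

# Set up origin, my repository, and a collaborator's clone.
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/master
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/master
git -C mine remote add origin "$sandbox/origin.git"
echo "hello" > mine/hello-world
git -C mine add hello-world
git -C mine commit -q -m "Initial commit"
git -C mine push -q origin master
git clone -q "$sandbox/origin.git" theirs

# Both of us change the same file, and the collaborator pushes first.
echo "hello from a collaborator" > theirs/hello-world
git -C theirs commit -q -a -m "Collaborator's greeting"
git -C theirs push -q origin master
echo "hello from me" > mine/hello-world
git -C mine commit -q -a -m "My greeting"

# Fetch, then look before leaping.
git -C mine fetch -q origin

# Which commits are on origin/master but not on master?
git -C mine log --oneline master..origin/master

# What did those commits change, relative to the point where we diverged?
git -C mine diff master...origin/master

# Which files did each side touch? Overlap hints at a conflict.
incoming=$(git -C mine diff --name-only master...origin/master)
outgoing=$(git -C mine diff --name-only origin/master...master)
echo "changed on origin: $incoming"
echo "changed locally:   $outgoing"
```

Here both lists contain hello-world, so you know a conflict is likely before you run git merge, and your working copy has no conflict markers in it yet.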

You could even imagine running git fetch periodically to keep your origin/* branches up to date with any changes on the remote. This would be complete madness if you tried to do the same thing with git pull!


  1. This is, of course, tough if you are teaching a GitHub-centric workflow. One way around this may be to get people to initialize their local repositories by cloning, and then forget about the remote entirely until the time is right.