j h woodyatt (jhw) wrote 2011-03-31 11:39 am

More On Mercurial vs. Git (with Graphs!)

Two days ago, I posted a very wordy essay in which I tried to explain why I think Git is badly broken, and why everybody should instead use Mercurial until the Git developers fix it. Okay, I wasn't quite that harsh, but I came close.

People on Reddit complained that my written technical language is too confusing, especially because it introduces new terminology to make its point. They demanded graphs, with nodes and edges and circles and arrows and everything. So, I slaved over a hot graph editor for a few hours and came up with the two graphs below, which I hope will illustrate the problem.

Below, I've drawn a simplified graph of a Git repository change history with three currently defined branches, master, release and topic. Before you Git enthusiasts complain that it's contrived to show an unrealistically bad case of change history complexity, let me assure you this is a simplified example. I have access to a real-world Git repository where there are six currently running release branches and about forty currently running topic branches with several hundred previous branches that have now been deleted from the authoritative server.

Here's the graph. Can you tell me on which branch ab3e2afd was committed? What is the earliest change in the release branch? Where did the topic branch start?



I know. It's not fair. I didn't let you see the commit logs. Believe me, you don't want to see them. They're no help. You'd think the domesticated primates would put helpful clues in them to tell you the answers to these questions, but they never do. Sometimes, even worse, they lie.

A more clever rebuttal to my question is to ask in return, "Why do you need to know?" Let me answer that preemptively: A) I need to know on which branch ab3e2afd was committed so I can decide whether to include it in the change control review for the upcoming release; B) I need to know which change is the first change in the release branch because I'd like to start a new topic branch from that point, so that I'll be as current as possible and still know that I can do a clean merge into master and release later; and C) I need to know where the topic branch started so that I can gather all its patches together and send them to a colleague for review.
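To make the difficulty concrete, here is a minimal Python sketch of the information a Git-style history actually holds. It is a toy model, not Git's implementation: the commit ids (`c1` through `c6`), the graph shape, and the helper names are all hypothetical and are not taken from the graph above. The only facts it relies on are that commits record their parents and that a branch is a name pointing at a head commit.

```python
# Toy model of a Git-style history: each commit records its parents and
# nothing about the branch it was made on. All ids here are hypothetical.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c2"],
    "c5": ["c3", "c4"],  # a merge commit with two parents
    "c6": ["c4"],
}

# A branch is only a movable name for a head commit.
heads = {"master": "c5", "release": "c6", "topic": "c3"}


def ancestors(commit):
    """Return the commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def branches_containing(commit):
    """Name every branch whose head can reach the given commit."""
    return sorted(
        name for name, head in heads.items() if commit in ancestors(head)
    )


print(branches_containing("c4"))  # ['master', 'release']
print(branches_containing("c2"))  # ['master', 'release', 'topic']
```

Reachability is roughly what `git branch --contains` reports, and it answers a different question. It says which branches currently include a commit, not which branch the commit was made on, and once merges are involved the answer is usually "several".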

Much of the craziness that drives Git users to get into heated arguments about "rebase" vs. "merge" is about trying very hard to make sure developers rewrite the history in their local clones sufficiently well that the change history graph on the authoritative shared repository doesn't look like that graph above.

Mercurial users, in most cases, don't have this bizarre urge. Here's why:



See the difference?

Every node in the graph is colored to indicate the name of its Mercurial branch. All the guesswork is gone. You know there was a branch named temp that got merged into release but not master. It's probably marked "closed" now that it isn't needed anymore.

Some of the early work on the topic branch went into temp before merging into the release branch. Later, the release branch merged back into the ongoing topic branch.

All this is possible because Mercurial stores the name of the branch in the changeset header. Git should do this too, and it doesn't. Instead, Git encourages its users to fake up a history that looks "clean" but isn't really accurate.
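As a rough illustration of why that one extra field matters, here is a minimal Python sketch of a history in which every changeset carries its branch name. It is a toy model under stated assumptions, not Mercurial's storage format: the `Changeset` class, the short node ids, the six-entry history, and the helper functions are all hypothetical, and the graph only loosely mirrors the one pictured above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int             # local revision number, in commit order
    node: str            # hypothetical short hash
    branch: str          # branch name recorded when the change was committed
    parents: tuple = ()  # revision numbers of the parent changesets


# Hypothetical history: topic work flows through temp into release,
# and release is later merged back into topic.
history = [
    Changeset(0, "a0", "master"),
    Changeset(1, "a1", "release", (0,)),
    Changeset(2, "a2", "topic", (1,)),
    Changeset(3, "a3", "temp", (2,)),
    Changeset(4, "a4", "release", (1, 3)),  # temp merged into release
    Changeset(5, "a5", "topic", (2, 4)),    # release merged back into topic
]


def branch_of(node):
    """On which branch was this changeset committed?"""
    return next(c.branch for c in history if c.node == node)


def changesets_on(branch):
    """All changesets committed on a branch, oldest first."""
    return [c for c in history if c.branch == branch]


def first_on_branch(branch):
    """The earliest changeset committed on a branch."""
    return min(changesets_on(branch), key=lambda c: c.rev)


print(branch_of("a4"))                           # release
print(first_on_branch("release").node)           # a1
print([c.node for c in changesets_on("topic")])  # ['a2', 'a5']
```

All three of my questions become plain lookups on a stored field, with no graph walking and no guessing from commit logs. In Mercurial itself, the `branch()` revset predicate exposes the same recorded name.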

The saddening thing is that so many Git users seem to have a conceptual block against even recognizing there is a problem here. They embrace the need for historical revisionism that their tools force on them, and they call it a virtue.

I dread to contemplate the possibility of moving a certain large and venerable proprietary operating system source code base out of an old legacy version control system and into the hot new DVCS that all the kids are talking about. I predict I'm going to have to show this weblog posting to a lot of people.