[personal profile] jhw
After working for over a year alternating between two projects, one that uses Git for its version control and another that uses Mercurial, I have finally achieved sufficient mastery of both toolchains that I now feel comfortable defending my judgment that Mercurial is the superior of the two systems. I think Git has one glaring deficiency that makes it the inferior tool, and I hope to describe the required remedy in this weblog posting.

The tools are very similar, and in my opinion many of the distinguishing differences come down to a matter of taste. Some may consider it a deal-breaker that Mercurial expects its extensions to be written in Python, whereas Git admits extensions written in just about any language you care to imagine (though the usual approach is to write them in a shell language). That's not a deal-breaker for me. Many other differences are either consequences of that fundamental distinction or cosmetic in nature. Nor does it bother me in the least that Git has an index and Mercurial does not. The difference between the Git stash and Mercurial patch queues is similarly trivial to me.

The big difference, the deal-maker for me, is in how each tool goes about meeting the fundamental requirement for any version control system: how it handles source code merging. Quite simply, Mercurial is better at merging than Git.

I need to introduce a bit of terminology here to make my point. Because the literature for Git and Mercurial uses the word branch to mean crucially different things, I'm going to avoid the word here entirely so as to prevent confusion. For the concept described in the Git literature with the word branch and in the Mercurial literature with the word head, I shall use the word lineage. I shall use the word family when referring to the concept the Mercurial literature uses branch to describe, which is a name that distinguishes a related set of lineages.

Mercurial is superior to Git because it records family history in the repository, while Git does not. In every other significant respect, a Git repository stores the same information as a Mercurial repository. This is why it is possible to convert a Git repository into a Mercurial repository, then back into a Git repository, without losing any information. It is not possible to perform this round-trip starting with a Mercurial repository (in the general case) because the family history must be discarded in the conversion to Git. (In the conversion to Mercurial, the entire Git repository can be regarded as one monolithic family, and indeed this is how the excellent Hg-Git tool presents its Mercurial view of Git repositories.)
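To make the asymmetry concrete, here is a toy model in Python (the language Mercurial is largely written in). It is a sketch under loud assumptions, not how either tool actually stores data: a changeset is just an id, a tuple of parent ids, and an optional family name, with `None` standing in for Git's lack of one. The helpers `to_git` and `to_hg` are hypothetical, and `to_hg` mimics the one-monolithic-family view by putting everything on Mercurial's default branch, named "default".

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy changeset: an id, its parent ids, and an optional family name."""
    node: str
    parents: Tuple[str, ...]
    family: Optional[str] = None  # None models Git, which records no family


def to_git(commits: List[Commit]) -> List[Commit]:
    """Hypothetical conversion: drop the family; only the parent graph survives."""
    return [replace(c, family=None) for c in commits]


def to_hg(commits: List[Commit], family: str = "default") -> List[Commit]:
    """Hypothetical conversion: treat the whole history as one family."""
    return [replace(c, family=family) for c in commits]


hg_history = [
    Commit("a", (), "default"),
    Commit("b", ("a",), "feature"),
    Commit("c", ("a",), "default"),
    Commit("d", ("c", "b"), "default"),  # feature merged into default
]

git_history = to_git(hg_history)
round_trip = to_hg(git_history)

# The parent graph (every lineage) survives the Mercurial -> Git -> Mercurial trip...
print([c.parents for c in round_trip] == [c.parents for c in hg_history])  # True
# ...but changeset "b" no longer remembers that it belonged to "feature".
print(round_trip[1].family)  # default

# Starting from Git instead, the round trip loses nothing.
print(to_git(to_hg(git_history)) == git_history)  # True
```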

It turns out that having the family history recorded in the repository (and thereby copied around with clones, pushes and pulls) is really important when reviewing the history of a project. A hint of this importance shows up in the cultural difference one observes between Git and Mercurial users.

Among Git users, it's common to see people arguing vociferously that proper workflows involve judicious use of the "rebase" command to reduce the incidence of merging in the repository history. This is because Git only records the lineage of every change, not its family. When all you have to review in the history of a change is its lineage, you don't want to be distracted by a lot of merges between different lineages in the same family. In a Mercurial repository, because the family history is recorded in the repository with every changeset, the urge to keep every lineage pure from ancestor to descendant isn't quite as strong.
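The difference shows up as soon as you ask which changes belonged to a given piece of work. In the toy model below (same caveats as any sketch: the `Commit` type, the family names and both helper functions are hypothetical), walking the ancestry of the tip returns every change indiscriminately, which is all a lineage-only history can offer. Filtering on the recorded family picks out the feature work exactly, in the spirit of `hg log -b feature`, even though "feature" merged with "default" twice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Commit:
    node: str
    parents: Tuple[str, ...]
    family: Optional[str]


HISTORY: Dict[str, Commit] = {c.node: c for c in [
    Commit("a", (), "default"),
    Commit("b", ("a",), "default"),
    Commit("f1", ("b",), "feature"),
    Commit("c", ("b",), "default"),
    Commit("f2", ("f1", "c"), "feature"),  # feature pulls in default
    Commit("d", ("c", "f2"), "default"),   # feature lands in default
]}


def lineage(history: Dict[str, Commit], tip: str) -> Set[str]:
    """Every ancestor of tip, tip included: all a lineage-only log has to go on."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(history[node].parents)
    return seen


def family_log(history: Dict[str, Commit], tip: str, family: str) -> List[str]:
    """Ancestors of tip that were committed on the given family."""
    return sorted(n for n in lineage(history, tip) if history[n].family == family)


print(sorted(lineage(HISTORY, "d")))           # ['a', 'b', 'c', 'd', 'f1', 'f2']
print(family_log(HISTORY, "d", "feature"))     # ['f1', 'f2']
```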

In any sane Git workflow, there are two different ways to join a pair of divergent lineages, "merge" and "rebase," and you'd better choose the right one at every opportunity, or your whole team will lose valuable momentum dealing with their frustration at your bad version control hygiene. Always use "rebase" when the lineage in your local clone is divergent from the lineage in the upstream (i.e. more authoritative) clone. You do this so that the upstream clone can do a "fast-forward" merge when it pulls your change. It's important in Git for the merge not just to proceed without conflict; it must be a fast-forward merge in order to keep the authoritative lineage "clean" of any evidence of your divergence.
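A fast-forward is possible only when the upstream tip is already an ancestor of the incoming tip, so that accepting the change means moving a pointer rather than recording a merge. The sketch below models that test on a bare parent graph and shows why rebasing satisfies it: the local commits are replayed, with new ids, on top of the upstream tip. It is a toy under stated assumptions: commit contents and conflicts are ignored, the node names are made up, and marking rewritten ids with a trailing `'` is purely illustrative.

```python
from typing import Dict, List, Tuple

# Toy commit graph: node id -> tuple of parent ids.
Parents = Dict[str, Tuple[str, ...]]


def is_ancestor(parents: Parents, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is `descendant` or is reachable from it via parents."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def can_fast_forward(parents: Parents, upstream_tip: str, local_tip: str) -> bool:
    """Upstream can adopt local_tip by moving its pointer only if nothing diverged."""
    return is_ancestor(parents, upstream_tip, local_tip)


def rebase(parents: Parents, series: List[str], onto: str) -> str:
    """Replay a linear series of commits (oldest first) on top of `onto`.

    The rewritten commits get new ids; the originals stay in the graph,
    unreferenced.  Returns the new tip.
    """
    tip = onto
    for node in series:
        rewritten = node + "'"
        parents[rewritten] = (tip,)
        tip = rewritten
    return tip


# Upstream moved on to "b" while we committed "x" and "y" on top of "a".
graph: Parents = {"a": (), "b": ("a",), "x": ("a",), "y": ("x",)}
before = can_fast_forward(graph, "b", "y")
new_tip = rebase(graph, ["x", "y"], onto="b")
after = can_fast_forward(graph, "b", new_tip)
print(before, after, new_tip)  # False True y'
```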

Basically, what's going on here is that Git encourages its users to adhere to a convention whereby lineage and family are equivalent concepts. This leads to an aesthetic concern for "clean history" where every merge of two or more lineages is a record of the merging of the families corresponding to the lineages. Any family with more than one lineage has a "dirty" or "unclean" history. Figuring out the family history of any change in a Git repository where developers have not strictly adhered to this policy means a lot of guesswork. Consequently, some Git repository administrators set flags that enforce this convention, which leads to further confusion among users. "Why can't I push? Oh, you mean I should have rebased instead of merging? Foo."

In Mercurial, if you have a fetish for clean pedigrees, or you are using the Hg-Git bidirectional bridge, there is the standard "rebase" extension. It allows you to adopt a workflow that minimizes the incidence of merging between lineages in the same family. There is, however, no compelling reason to do so: the repository retains the family history, so it's easy to review which changes belong to which family, whatever lineage they may have. Mercurial users therefore have no reason to be as diligent about maintaining the "purity" of lineage histories as Git users are.

I wrote at the outset of this article that I believe Git should be improved to remedy the deficiency I'm describing here. There are a couple of ways it could be done. One way would be to adopt Mercurial's style of annotating every node in the graph with a family name. Another way, perhaps a more straightforward and "git-like" one, would be to annotate every edge in the graph with the family name (derived from the branch name of the ancestor node in the repository where the commit occurred). You'd probably need a distinguished name for the case where the family history is lost to antiquity.
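Here is a minimal sketch of the edge-annotation idea, assuming a Git-like repository where branches are plain pointers. At commit time, each parent edge is labelled with the name of the local branch the parent was reached through. The `Repo` and `Edge` types, the `families_of` query, the `import_legacy` helper and the `"(unknown)"` placeholder for history lost to antiquity are all hypothetical. The point it illustrates is that the family of a commit survives even after the branch pointer that produced it is deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

UNKNOWN_FAMILY = "(unknown)"  # hypothetical label for history lost to antiquity


@dataclass(frozen=True)
class Edge:
    parent: str
    family: str  # branch the parent was on, in the repo where the commit was made


@dataclass
class Repo:
    refs: Dict[str, str] = field(default_factory=dict)               # branch -> tip
    edges: Dict[str, Tuple[Edge, ...]] = field(default_factory=dict)  # node -> parent edges

    def commit(self, node: str, branch: str, merge_from: Optional[str] = None) -> None:
        """Commit `node` on `branch`, labelling each parent edge with a branch name."""
        edges = []
        if branch in self.refs:  # a root commit has no parent edge
            edges.append(Edge(self.refs[branch], branch))
        if merge_from is not None:
            edges.append(Edge(self.refs[merge_from], merge_from))
        self.edges[node] = tuple(edges)
        self.refs[branch] = node

    def import_legacy(self, node: str, parents: Tuple[str, ...]) -> None:
        """Record a converted commit whose history carried no family names."""
        self.edges[node] = tuple(Edge(p, UNKNOWN_FAMILY) for p in parents)

    def families_of(self, node: str) -> Set[str]:
        """Family names recorded on the edges that point at `node`."""
        return {e.family for es in self.edges.values() for e in es if e.parent == node}


repo = Repo()
repo.commit("a", "master")
repo.commit("b", "master")
repo.refs["feature"] = repo.refs["master"]  # like creating a branch at "b"
repo.commit("c", "feature")
repo.commit("d", "master")
repo.commit("e", "master", merge_from="feature")
del repo.refs["feature"]  # the branch pointer is gone after merging...

print(repo.families_of("c"))          # {'feature'}  ...but the family is not
print(sorted(repo.families_of("b")))  # ['feature', 'master']  (the fork point)
```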

In any case, this is my argument for why Mercurial is superior to Git. You're welcome to your opinions, of course, but this one is mine. I'm open to persuasion that I'm DoingItWrong™, but it took me a long time to arrive at my judgment here, so please think through the arguments you want to make to me before you comment. Thanks.

[Note: this article has been revised for clarity since its initial publication. The original draft improperly assumed that the reader was familiar with Mercurial "branch" semantics. Some redundant assertions have been removed.]

family == branch labels

Date: 2011-04-20 07:47 pm (UTC)
From: [identity profile] jnareb.openid.pl
What you call family, and what Mercurial terminology calls named branches (see A Guide to Branching in Mercurial (http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial), a 2009 blog post by Steve Losh, in particular its Branching with Named Branches (http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/#branching-with-named-branches) section), I like to call (http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast/1599930#1599930) branch labels. I think that term describes the concept better.

Named branches / family / branch labels perhaps solve the issue with rebase / transplant versus merge... but they have one serious disadvantage: name clashes. Your "for-john" branch might not be the same as my "for-john" branch... and John would want to have them as "from-jhw" and "from-jn", or equivalent.

Note also that people usually don't rebase because of some notion of purity, but either because a straight linear history is easier to bisect, or because rebased commits will not conflict (when sending patches via email).
From: [identity profile] trsdomain.dk
Having never tried a distributed source control system, I decided to try out Mercurial after reading this article. I was fairly disappointed when I found out that it cannot store an empty directory... I know there are a lot of workarounds:

- Adding the creation of the dirs to build script:
Brittle - someone will rewrite the script and forget it or deploy the project without the build script.

- Creating dummy files in the directory:
Yikes!

- Ensuring the code creates the dirs at runtime:
Doable, but having to change code, and making coworkers do it as well, is not cool.
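For what it's worth, the third workaround can be small in a Python project. This is a minimal sketch, not a recommendation: the directory names in `REQUIRED_DIRS` and the `ensure_dirs` helper are hypothetical, and the only library call it relies on is the standard `os.makedirs(..., exist_ok=True)`.

```python
import os
import tempfile

# Hypothetical directories a project expects to exist but that
# Mercurial will not track while they are empty.
REQUIRED_DIRS = ("logs", "cache", os.path.join("data", "uploads"))


def ensure_dirs(base: str) -> None:
    """Create every required directory under `base`; harmless if they already exist."""
    for rel in REQUIRED_DIRS:
        os.makedirs(os.path.join(base, rel), exist_ok=True)


# Usage: call once at start-up (here against a throwaway directory).
with tempfile.TemporaryDirectory() as base:
    ensure_dirs(base)
    ensure_dirs(base)  # a second call is a no-op rather than an error
    created = sorted(
        os.path.relpath(os.path.join(root, d), base)
        for root, dirs, _ in os.walk(base)
        for d in dirs
    )
print(created)
```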

The bottom line is that I should be able to take an existing project and import it to Mercurial. No "buts", "ifs" or excuses. I'm sticking to subversion for the time being (although it is not perfect either).

Git & rebase

Date: 2011-06-21 02:36 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawllJaD1TSNJZdyl6vgYfQTMEg16W_l4gSo
I'm no expert in either system, but it's fair to point out that rebase is not a requirement in git. It's there if you want it, but under normal circumstances you're certainly not required to use it. If you don't rebase, or if you merge with --no-ff so that a merge commit is always created, you'll preserve a lot more history. The question is, at the end of the day, will that history be useful and do you want to see it? Your choice.
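The difference between a fast-forward and a `git merge --no-ff` can be sketched on a toy parent graph. Assumptions: the branch names, the `merge_node` id and the helper functions are hypothetical, and only the shape of the history is modelled, not file contents. Following first parents from the merged tip shows what a fast-forward hides: afterwards the topic commit is indistinguishable from mainline work, whereas with a merge commit it stays on a side line.

```python
from typing import Dict, List, Tuple

Parents = Dict[str, Tuple[str, ...]]  # node id -> parent ids


def is_ancestor(parents: Parents, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is `descendant` or reachable from it via parents."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def merge(refs: Dict[str, str], parents: Parents, into: str, source: str,
          merge_node: str, no_ff: bool = False) -> bool:
    """Merge branch `source` into `into`; return True if a merge commit was made."""
    target, incoming = refs[into], refs[source]
    if is_ancestor(parents, incoming, target):
        return False  # already up to date
    if not no_ff and is_ancestor(parents, target, incoming):
        refs[into] = incoming  # fast-forward: just move the pointer
        return False
    parents[merge_node] = (target, incoming)  # record an explicit merge
    refs[into] = merge_node
    return True


def first_parent_log(parents: Parents, tip: str) -> List[str]:
    """Walk first parents only, the usual view of 'the mainline'."""
    log = [tip]
    while parents[log[-1]]:
        log.append(parents[log[-1]][0])
    return log


def demo(no_ff: bool) -> List[str]:
    parents: Parents = {"a": (), "b": ("a",)}
    refs = {"master": "a", "topic": "b"}
    merge(refs, parents, "master", "topic", merge_node="m", no_ff=no_ff)
    return first_parent_log(parents, refs["master"])


print(demo(no_ff=False))  # ['b', 'a']: topic's commit now reads as mainline work
print(demo(no_ff=True))   # ['m', 'a']: topic's commit stays on a side line
```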

Bang on!

Date: 2011-08-03 08:00 am (UTC)
From: [identity profile] sidk.info
I think you've hit the nail on the head. I like to think of it this way: Git branches are just simple pointers while Mercurial branches are "lines". Every mercurial commit "belongs" to a branch while every git commit just has parent commit(s).

You really can't follow a line of development back in git, especially if there are merges. Hence the whole emphasis on rebasing, as you quite nicely point out.

It took me some time to understand your article. I first visited it a few months ago, and at that point my knowledge of git was not sufficient to fully understand what you had written. Today was the day of the ah-hah moment!

Git creates complexity and solves it. Mercurial avoids the complexity altogether.

Maybe it would be a good idea to show what you've written graphically, for other people who visit this page?

Date: 2011-08-15 09:14 pm (UTC)
From: [identity profile] max630.net
There are reflogs in git. They can play a role analogous to what you are calling "family". But they exist only locally and do not move to other repositories during fetch and push.

Git has its good side

Date: 2011-12-17 09:14 am (UTC)
From: [personal profile] friendly12345
I have worked with Mercurial for a while now.
I just started learning about Git.
In some respects I think Git's concepts are better than Mercurial's.
- First is the staging area. I like this feature a lot.
- Second is remotes. You can save all the remotes to pull from and track which commit each remote branch currently points to. That is something currently missing in Mercurial.
- I also like Git's workflow with branches.
But one thing that I don't like about Git is that
it needs a lot of knowledge to do a simple task.
I'm not good at bash scripting,
so it is a little hard for me.
Also, tags in Git are not very friendly.
They should be displayed in the log, like in hg log.
In summary, I think Git has its advantages, but it still needs to become more user friendly.

Date: 2012-05-02 07:03 pm (UTC)
From: [personal profile] rich_pixley_hp
My argument is simpler, but comparable, and boils down to the same thing.

I want to be able to push. Git can't push shared branches because in the case of a potential collision, it has no recourse. The UI, "git push", is simple and the semantic is straightforward and obvious: I want my changes in that repository. Rather than do that, git simply throws up its hands and refuses (in the case where the destination has that branch checked out, or that branch has other changes that are not mine).

So... there's an obvious semantic, an obvious interpretation, and git doesn't do it. That's a pretty big and scary culture shock coming from pretty much any source code control system developed in the last decade. With git, we're back to the geographic branches of ClearCase MultiSite, where each repository owns a branch and the other branches show up as read only. This doesn't scale very well, as we learned a couple of decades ago. It's a lot of extra work to manage all those potentially-automatic-but-failing merges.

The multiple hg heads are the natural conclusion to this problem of how to handle collisions in the repository. Hg can then propagate them anywhere and anyone in any repository can merge them. Not so in git. In git, you're reduced to sending random emails trying to find the person with whom you need to coordinate.
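In graph terms, a head is simply a changeset that no other changeset names as a parent (`hg heads` lists them). The toy sketch below, with hypothetical clone names and ids, shows two people committing on the same parent: once both changesets are in one repository there are two heads, and anyone holding both can merge them into one.

```python
from typing import Dict, List, Tuple


def heads(parents: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(n for n in parents if n not in has_child)


# Two people commit on top of the same changeset "a" in different clones.
shared = {"a": ()}
alice = {**shared, "b": ("a",)}
bob = {**shared, "c": ("a",)}

# Bringing both into one repository keeps both lines, as separate heads...
combined = {**alice, **bob}
print(heads(combined))  # ['b', 'c']

# ...until anyone who has both of them records a merge.
combined["m"] = ("b", "c")
print(heads(combined))  # ['m']
```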

This isn't unique to hg, btw. Other systems do this as well.

Mercurial vs. Bazaar?

Date: 2013-11-20 08:59 pm (UTC)
From: [identity profile] frgomes [launchpad.net]
Thank you for your excellent comparison between git and hg. It's definitely rare to find an article of such quality.

By any chance, do you have experience with Bazaar too?


Richard Gomes

Absolutely Correct

Date: 2015-01-07 06:19 pm (UTC)
From: [personal profile] fmccann
Bazaar out of the box and Mercurial with named branches make tracking work extremely easy, and most of the work that git users do to "clean" their histories is just a complete waste of time compared to these workflows. Git by design discards branch history. It seems unintuitive, but the way to compensate for Git tracking less information is to lose even MORE information via rebases and squashing to "clean" the history. Bzr and hg track enough information to easily show you branch histories, so you always get the level of detail you want without having to manually hack up the DAG.

See http://duckrowing.com/2013/12/26/bzr-init-a-bazaar-tutorial/
Page generated Aug. 20th, 2017 02:21 am
Powered by Dreamwidth Studios