jhw: baleful eye (Default)
[personal profile] jhw
After working for over a year alternating between two projects, one that uses Git for its version control and another that uses Mercurial, I have finally achieved sufficient mastery of both toolchains that I now feel comfortable defending my judgment that Mercurial is the superior of the two systems. I think Git has one glaring deficiency that makes it the inferior tool, and I hope to describe the required remedy in this weblog posting.

The tools are very similar, and many of the distinguishing differences come down to a matter of taste in my opinion. Some may consider it a deal-breaker that Mercurial expects its extensions to be written in Python, whereas Git admits extensions written in just about any language you care to imagine, though the usual approach is to write them in a shell language. That's not a deal-breaker for me. Many other differences are either consequences of that fundamental distinction, or they are cosmetic in nature. It also bothers me not at all that Mercurial has no index, nor that Git has one. The difference between the Git stash and Mercurial patch queues is similarly trivial to me.

The big difference, the deal-maker for me, is in how each tool goes about meeting the fundamental requirement for any version control system: how it handles source code merging. Quite simply, Mercurial is better at merging than Git.

I need to introduce a bit of terminology here to make my point. Because the Git literature and the Mercurial literature use the word branch to mean crucially different things, I'm going to avoid the word here entirely so as to prevent confusion. For the concept described in the Git literature with the word branch and in the Mercurial literature with the word head, I shall use the word lineage. I shall use the word family when referring to the concept the Mercurial literature uses branch to describe, which is a name that distinguishes a related set of lineages.

Mercurial is superior to Git because it records family history in the repository, while Git does not. In every other significant respect, a Git repository stores the same information as a Mercurial repository. This is why it is possible to convert a Git repository into a Mercurial repository then back into a Git repository without losing any information. It is not possible to perform this round-trip starting with a Mercurial repository (in the general case) because the family history must be discarded in the conversion to Git. (In the conversion to Mercurial, the entire Git repository can be regarded as one monolithic family, and indeed this is how the excellent Hg-Git tool presents its Mercurial view of Git repositories.)
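The round-trip asymmetry can be illustrated with a toy model. This is only a sketch under loud assumptions: the class names, the node labels, and the "default" family are all hypothetical, and real conversions (Hg-Git included) deal with hashes and metadata that are ignored here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HgChangeset:
    """Toy Mercurial-style node: the family name is stored with the change."""
    node: str
    parents: Tuple[str, ...]
    family: str


@dataclass(frozen=True)
class GitCommit:
    """Toy Git-style node: only the parents (the lineage) are stored."""
    node: str
    parents: Tuple[str, ...]


def hg_to_git(changeset: HgChangeset) -> GitCommit:
    # The family name has nowhere to go, so it is dropped.
    return GitCommit(changeset.node, changeset.parents)


def git_to_hg(commit: GitCommit, family: str = "default") -> HgChangeset:
    # Every converted commit lands in one monolithic family (assumed name).
    return HgChangeset(commit.node, commit.parents, family)


# Mercurial -> Git -> Mercurial: the family "stable" does not survive.
original = HgChangeset("c2", ("c1",), "stable")
round_trip = git_to_hg(hg_to_git(original))
lost = original != round_trip

# Git -> Mercurial -> Git: nothing is lost in this toy model.
git_original = GitCommit("g2", ("g1",))
preserved = hg_to_git(git_to_hg(git_original)) == git_original
```

In this model the only field that fails to survive is the family name, which is the whole of the argument above.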

It turns out that having the family history recorded in the repository, and thereby copied around with clones, pushes and pulls, is really important when reviewing the history of a project. A hint of this importance shows up in the cultural difference one observes between Git and Mercurial users.

Among Git users, it's common to see people arguing vociferously that proper workflows involve judicious use of the "rebase" command to reduce the incidence of merging in the repository history. This is because Git only records the lineage of every change, not its family. When all you have to review in the history of a change is its lineage, you don't want to be distracted by a lot of merges between different lineages in the same family. In a Mercurial repository, because the family history is recorded in the repository with every changeset, the urge to keep every lineage pure from ancestor to descendant isn't quite as strong.

In any sane Git workflow, there are two different ways to join a pair of divergent lineages, "merge" and "rebase," and you'd better choose the right one at every opportunity or your whole team will lose valuable momentum dealing with their frustration with your bad version control hygiene. Always use "rebase" when the lineage in your local clone is divergent from the lineage in the upstream, i.e. more authoritative, clone. You do this so that the upstream clone can do a "fast-forward" merge when it pulls your change. It's important in Git for the merge not just to proceed without conflict; it must be a fast-forward merge in order to keep the authoritative lineage "clean" of any evidence of your divergence.
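To make the "fast-forward" condition concrete: a fast-forward is possible exactly when the upstream tip is an ancestor of the incoming tip. The sketch below models history as a plain dictionary from commit to parents. The commit names and helper functions are hypothetical and this is not how Git itself is implemented.

```python
def ancestors(parents, tip):
    """Return tip plus every commit reachable from it via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def can_fast_forward(parents, upstream_tip, incoming_tip):
    """True when upstream_tip is an ancestor of incoming_tip."""
    return upstream_tip in ancestors(parents, incoming_tip)


# Hypothetical history: upstream is A <- B, local work C was made on top of A.
diverged = {"A": [], "B": ["A"], "C": ["A"]}
# After a rebase, the local work is replayed as C2 on top of B.
rebased = {"A": [], "B": ["A"], "C2": ["B"]}

before_rebase = can_fast_forward(diverged, "B", "C")
after_rebase = can_fast_forward(rebased, "B", "C2")
```

Before the rebase the upstream tip B is not among the ancestors of C, so a real merge would be needed; after the rebase it is, so upstream can simply move its tip forward.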

Basically, what's going on here is that Git encourages its users to adhere to a convention whereby lineage and family are equivalent concepts. This leads to an aesthetic concern for "clean history" where every merge of two or more lineages is a record of the merging of the families corresponding to the lineages. Any family with more than one lineage has a "dirty" or "unclean" history. Figuring out the family history of any change in a Git repository where developers have not strictly adhered to this policy means a lot of guesswork. Consequently, some Git repository administrators set flags that enforce this convention, which leads to further confusion among users. "Why can't I push? Oh, you mean I should have rebased instead of merging? Foo."

If you have a fetish for clean pedigrees, or you are using the Hg-Git bidirectional bridge, Mercurial has the standard "rebase" extension. It allows you to adopt a workflow that minimizes the incidence of merging between lineages in the same family. There is, however, not any compelling reason to do so: the repository retains the family history. It's easy to review which changes belong to which family whatever lineage they may have. Mercurial users therefore have no reason to be particularly diligent about maintaining "purity" of lineage histories, as Git users do.

I wrote at the outset of this article that I believe Git should be improved to remedy the deficiency I'm describing here. There are a couple of ways it could be done. One way would be to adopt Mercurial's style of annotating every node in the graph with a family name. Another way, perhaps a more straightforward and "git-like" way, of dealing with it would be to annotate every edge in the graph with the family name (derived from the branch name of the ancestor node in the repository where the commit occurred). You'd probably need a distinguished name for the case where the family history is lost to antiquity.
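Here is a rough sketch of what the edge-annotated variant might look like as a data structure. Everything in it is an assumption made for illustration: the commit names, the family names, the "(unknown)" placeholder for lost history, and the helper function. Nothing like this exists in Git today.

```python
from collections import defaultdict

UNKNOWN = "(unknown)"  # assumed distinguished name for lost family history

# Hypothetical graph: child -> list of (parent, family) edges.
edges = {
    "c1": [],
    "c2": [("c1", "default")],
    "c3": [("c2", "feature-x")],
    "c4": [("c2", "default")],
    "c5": [("c4", "default"), ("c3", "feature-x")],  # a merge of two families
    "c6": [("c5", UNKNOWN)],  # family history not recorded for this edge
}


def commits_by_family(edges):
    """Group commits by the family names on their incoming edges."""
    families = defaultdict(set)
    for child, parent_edges in edges.items():
        for _parent, family in parent_edges:
            families[family].add(child)
    return {name: sorted(members) for name, members in families.items()}


grouped = commits_by_family(edges)
```

With the names on the edges, a merge commit such as c5 shows up in both families it joins, and the question "which changes belong to which family" becomes a lookup rather than guesswork.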

In any case, this is my argument for why Mercurial is superior to Git. You're welcome to your opinions, of course, but this one is mine. I'm open to persuasion that I'm DoingItWrong™, but it took me a long time to arrive at my judgment here, so please think through the arguments you want to make to me before you comment. Thanks.

[Note: this article has been revised for clarity since its initial publication. The original draft improperly assumed the reader has a familiarity with Mercurial "branch" semantics. Some redundant assertions have been removed.]

family == branch labels

Date: 2011-04-20 07:47 pm (UTC)
ext_753161: Photo of me from 2005 reunion (Default)
From: [identity profile] jnareb.openid.pl
What you call family, and what I understand to be so-called named branches in Mercurial terminology (the blog post A Guide to Branching in Mercurial (http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial) by Steve Losh from 2009 has a Branching with Named Branches (http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/#branching-with-named-branches) section), I like to call (http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast/1599930#1599930) branch labels. I think it describes the concept better.

Named branches / family / branch labels perhaps solve the issue with rebase / transplant and merge... but they have one serious disadvantage: name clashes. Your "for-john" branch might not be the same as my "for-john" branch... and John would want to have them as "from-jhw" and "from-jn", or equivalent.

Note also that people usually don't rebase because of some notion of purity, but either because straight linear history is easier to bisect, or because rebased commits will not conflict (if sending patches via email).

Re: family == branch labels

Date: 2011-04-20 08:13 pm (UTC)
From: [identity profile] jnareb.openid.pl
I'm sorry, I was not clear enough. By "name clashes" I mean that one branch label / family name ('for-john' in john's repository from two different repositories) might contain disconnected and unrelated commits.
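A small sketch of the clash described here, assuming each repository is reduced to a mapping from branch label to the set of commits carrying it. The repository contents, commit names, and the function are hypothetical.

```python
def label_clashes(repo_a, repo_b):
    """Labels used in both repositories whose commit sets share no commit."""
    return sorted(
        label
        for label in repo_a.keys() & repo_b.keys()
        if not (repo_a[label] & repo_b[label])
    )


# Two contributors each used the label 'for-john' for unrelated work.
repo_jhw = {"default": {"r0", "r1"}, "for-john": {"a1", "a2"}}
repo_jn = {"default": {"r0", "r2"}, "for-john": {"b1"}}

clashes = label_clashes(repo_jhw, repo_jn)
```

Once both are pulled into John's repository, the single label 'for-john' covers two disconnected sets of commits, which is the problem with globally visible labels.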

BTW, an equivalent to MQ in Git is not rebase / interactive rebase, but tools such as StGit or Guilt.

Re: family == branch labels

Date: 2011-04-23 04:27 pm (UTC)
From: [identity profile] jnareb.openid.pl
Centralized policy for a distributed version control system? If I am to rely on branch naming policy, then Subversion is just as good, at least with respect to creating branches...

Besides, with "named branches" / family names / branch labels you have to come up with a good name for a branch upfront (or rewrite history). With Git I could be working e.g. on branch 'subsystem' in my private working repository, then push this branch (perhaps after rebase and cleanup) into branch 'subsystem-feature' in my public bare publishing repository. From there the maintainer can fetch it into e.g. the 'jn/feature' branch in his/her repository. Note that the bookmark extension (lightweight branches) has a similar problem to "named" branches: bookmarks have been transferable for some time, but to avoid the difficulty of mapping branch names (Git's "refspecs") 'bookmark' branch names are global.

BTW for proper merging you should need only 3 versions: ours (current branch you are merging into), theirs (branch being merged) and ancestor (merge base), for 3-way merge. All history, including family history, is irrelevant...
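The three-way rule can be sketched on a simplified model where each version is a dictionary from file name to content, instead of doing a line-by-line text merge. The function and the sample data are hypothetical, and real merge tools are considerably more involved.

```python
def three_way_merge(base, ours, theirs):
    """Merge per key using only base, ours and theirs.

    A missing key is treated as None, so deletions are handled too.
    Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (or both left it alone)
            value = o
        elif o == b:      # only theirs changed it
            value = t
        elif t == b:      # only ours changed it
            value = o
        else:             # both changed it differently
            conflicts.append(key)
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"a": 1, "b": 2, "c": 3}
ours = {"a": 1, "b": 20, "c": 3}   # we changed b
theirs = {"a": 10, "b": 2}         # they changed a and deleted c

clean, clean_conflicts = three_way_merge(base, ours, theirs)
_, clash = three_way_merge({"a": 1}, {"a": 5}, {"a": 6})
```

Note that the function never looks at how ours or theirs got from base to their current state; only the three snapshots matter.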
From: [identity profile] trsdomain.dk
Having never tried a distributed source control system I decided to try out Mercurial after reading this article. I was fairly disappointed when I found out it is not able to store an empty directory... I know there are a lot of workarounds:

- Adding the creation of the dirs to build script:
Brittle - someone will rewrite the script and forget it, or deploy the project without the build script.

- Creating dummy files in the directory:
Yikes !

- Ensuring the code creates the dirs at runtime:
Doable, but having to change code, and make coworkers do it as well, is not cool.

The bottom line is that I should be able to take an existing project and import it to Mercurial. No "buts", "ifs" or excuses. I'm sticking to Subversion for the time being (although it is not perfect either).

Git & rebase

Date: 2011-06-21 02:36 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawllJaD1TSNJZdyl6vgYfQTMEg16W_l4gSo
I'm no expert in either system, but it's fair to point out that rebase is not a requirement in Git. It's there if you want it, but under normal circumstances you're certainly not required to use it. If you don't rebase, or if you merge with no-ff commits, you'll preserve a lot more history. The question is, at the end of the day, will that history be useful and do you want to see it? Your choice.

Re: Git & rebase

Date: 2011-06-21 06:16 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawllJaD1TSNJZdyl6vgYfQTMEg16W_l4gSo
The assertion that git won't be improved is silly. Git has undergone tremendous improvement to get to where it is today. So has Mercurial.

I'm sure there are people who can argue why Mercurial's way of doing it is bad or the choices available in git are better. Unfortunately, that's not me.

The question that hopefully will come out of this discussion is, why do others not see this as such a glaring omission that it needs to be fixed immediately?

Re: Git & rebase

Date: 2011-06-29 05:02 am (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawni8VvVcImNktTYFWQtJyNBBRCkrrQdIdY
I feel like this post is from someone who was using git back in the early days, when you had to do a lot of plumbing commands yourself. Today's Git has a very similar command set for the average user's needs. The free book Pro Git gives you enough information in a single chapter to use Git very well day to day (chapter 2, if you're wondering). I feel like it's very straightforward and quite polished under most circumstances, but has power if you want to delve a little deeper. I submit that anyone who wants to compare Git to Mercurial, or point out any fault of Git, needs to have read that book from cover to cover.

Anyway, going back to what you call the single greatest failing of Git, which is not recording branch information on the commits themselves. I don't want branch information recorded. I may make a temporary branch named something ridiculous or uninformative (and I'm speaking of lightweight branches, or Mercurial's bookmark extension), implement a feature, and decide that it's worth keeping. I then merge this feature into a more "mainstream" branch (or rebase, if that's how I want to work -- both workflows are valid). I don't want my old temporary branch name to stick to those commits. It's unhelpful information. However, giving this temporary branch an important or mainstream name is not ideal, because I may not want to merge the contents of that branch into the mainstream workflow, and then those commits are floating around with seemingly important labels, when they're garbage.

I'm rambling, so here's the point -- I don't see the importance of knowing under which branch a certain commit was developed. That's useless information, in my opinion. Rather, I want to know what a single commit does (which should be documented in a well-written commit message). Assuming everyone is writing good commit messages, then it doesn't matter under which branch some code was written -- what matters is the code itself.

Now, I have to end with a confession: I don't know Mercurial very well. Every time I try to pick it up, I miss my integrated lightweight branching. (I don't want to use an extension -- branching is the most important part of my workflow, and the branching model is what makes Git so awesome in my opinion.) Every time I hear someone mention heavy branching (cloning into a new directory), I shudder, thinking of the SVN branching model. I admit, though, I need to learn more about Mercurial and really try it.

But I will say this: the author has stated that he has worked on projects using both Mercurial and Git; I submit that the author has not learned to properly use a Git workflow, based on what I've read here. Please go read Pro Git (http://progit.org/book/) from cover to cover. It's not very long, and really explains a Git workflow and the power of the Git branching model. After reading this book, I submit that no one could point at Git and call it arcane or a cognitive burden -- it's progressed a lot in the last few years. If, at that point, you still miss your Mercurial branches (er, families, I guess), then by all means use Mercurial. But I love Git, and just want people to realize that it's easy to use when you have the right resources to learn from, and just encourages a different workflow from Mercurial.


j h woodyatt
