jhw: baleful eye (Default)
[personal profile] jhw
After working for over a year alternating between two projects, one that uses Git for its version control and another that uses Mercurial, I have finally achieved sufficient mastery of both toolchains that I now feel comfortable defending my judgment that Mercurial is the superior of the two systems. I think Git has one glaring deficiency that makes it the inferior tool, and I hope to describe the required remedy in this weblog posting.

The tools are very similar, and in my opinion many of the distinguishing differences come down to a matter of taste. Some may consider it a deal-breaker that Mercurial expects its extensions to be written in Python, whereas Git admits extensions written in just about any language you care to imagine, although the usual approach is to write them in a shell language. That's not a deal-breaker for me. Many other differences are either consequences of that fundamental distinction, or they are cosmetic in nature. It bothers me not at all that Mercurial has no index, nor that Git has one. The difference between the Git stash and Mercurial patch queues is similarly trivial to me.

The big difference, the deal-maker for me, is in how each tool goes about meeting the fundamental requirement for any version control system: how it handles source code merging. Quite simply, Mercurial is better at merging than Git.

I need to introduce a bit of terminology here to make my point. Because the literature for Git and Mercurial uses the word branch to mean crucially different things, I'm going to avoid the word here entirely so as to prevent confusion. For the concept described in the Git literature with the word branch and in the Mercurial literature with the word head, I shall use the word lineage. I shall use the word family when referring to the concept the Mercurial literature uses branch to describe, which is a name that distinguishes a related set of lineages.

Mercurial is superior to Git because it records family history in the repository, while Git does not. In every other significant respect, a Git repository stores the same information as a Mercurial repository. This is why it is possible to convert a Git repository into a Mercurial repository then back into a Git repository without losing any information. It is not possible to perform this round-trip starting with a Mercurial repository (in the general case) because the family history must be discarded in the conversion to Git. (In the conversion to Mercurial, the entire Git repository can be regarded as one monolithic family, and indeed this is how the excellent Hg-Git tool presents its Mercurial view of Git repositories.)
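The round-trip argument can be illustrated with a toy model. This is only a sketch of the claim as stated above, not a description of how Hg-Git or either tool actually stores data; the class names, the conversion functions, and the "default" family name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HgChangeset:
    """Toy Mercurial changeset: the family name travels with the changeset."""
    node: str
    parents: Tuple[str, ...]
    branch: str


@dataclass(frozen=True)
class GitCommit:
    """Toy Git commit: lineage (parents) only, no family name."""
    node: str
    parents: Tuple[str, ...]


def hg_to_git(cs: HgChangeset) -> GitCommit:
    # In this model the family name has nowhere to go, so it is dropped.
    return GitCommit(cs.node, cs.parents)


def git_to_hg(c: GitCommit) -> HgChangeset:
    # Everything lands in one monolithic family (assumed to be "default").
    return HgChangeset(c.node, c.parents, branch="default")


original = HgChangeset("c2", ("c1",), branch="release-1.0")
round_tripped = git_to_hg(hg_to_git(original))

print(round_tripped.parents == original.parents)  # True: lineage survives
print(round_tripped.branch == original.branch)    # False: family is lost
```

Starting from the Git side, the same two functions compose without loss, because the toy Git commit never carried a family name to begin with.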

It turns out that having the family history recorded in the repository— and thereby copied around with clones, pushes and pulls— is really important when reviewing the history of a project. A hint of this importance shows up in the cultural difference one observes between Git and Mercurial users.

Among Git users, it's common to see people arguing vociferously that proper workflows involve judicious use of the "rebase" command to reduce the incidence of merging in the repository history. This is because Git only records the lineage of every change, not its family. When all you have to review in the history of a change is its lineage, you don't want to be distracted by a lot of merges between different lineages in the same family. In a Mercurial repository, because the family history is recorded in the repository with every changeset, the urge to keep every lineage pure from ancestor to descendant isn't quite as strong.

In any sane Git workflow, there are two different ways to join a pair of divergent lineages, "merge" and "rebase," and you'd better choose the right one at every opportunity or your whole team will lose valuable momentum dealing with their frustration with your bad version control hygiene. Always use "rebase" when the lineage in your local clone is divergent from the lineage in the upstream, i.e. more authoritative, clone. You do this so that the upstream clone can do a "fast-forward" merge when it pulls your change. It's important in Git for the merge not just to proceed without conflict; it must be a fast-forward merge in order to keep the authoritative lineage "clean" of any evidence of your divergence.
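To make the fast-forward condition concrete, here is a minimal sketch over a toy commit graph. It does not call Git; the `History` type, the function names, and the convention of marking a rebased commit with a trailing apostrophe are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Toy history: commit id -> tuple of parent ids.
History = Dict[str, Tuple[str, ...]]


def is_ancestor(history: History, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(history[node])
    return False


def can_fast_forward(history: History, upstream_tip: str, incoming_tip: str) -> bool:
    """Upstream can fast-forward only if its tip is already in the incoming lineage."""
    return is_ancestor(history, upstream_tip, incoming_tip)


def rebase(history: History, commits: List[str], onto: str) -> str:
    """Replay `commits` (oldest first) as new commits on top of `onto`; return the new tip."""
    tip = onto
    for commit in commits:
        new_id = commit + "'"  # a rebased commit is a new commit with a new parent
        history[new_id] = (tip,)
        tip = new_id
    return tip


# Upstream lineage is A - B; the local lineage is A - X - Y, diverging at A.
history: History = {"A": (), "B": ("A",), "X": ("A",), "Y": ("X",)}

print(can_fast_forward(history, "B", "Y"))       # False: the lineages have diverged
new_tip = rebase(history, ["X", "Y"], onto="B")  # local lineage becomes A - B - X' - Y'
print(can_fast_forward(history, "B", new_tip))   # True: upstream only moves forward
```

After the rebase, nothing in the graph records that X and Y were ever written against A, which is exactly the "clean" lineage the convention asks for.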

Basically, what's going on here is that Git encourages its users to adhere to a convention whereby lineage and family are equivalent concepts. This leads to an aesthetic concern for "clean history" where every merge of two or more lineages is a record of the merging of the families corresponding to the lineages. Any family with more than one lineage has a "dirty" or "unclean" history. Figuring out the family history of any change in a Git repository where developers have not strictly adhered to this policy means a lot of guesswork. Consequently, some Git repository administrators set flags that enforce this convention, which leads to further confusion among users. "Why can't I push? Oh, you mean I should have rebased instead of merging? Foo."

In Mercurial, if you have a fetish for clean pedigrees, or you are using the Hg-Git bidirectional bridge, there is the standard "rebase" extension. It allows you to adopt a workflow that minimizes the incidence of merging between lineages in the same family. There is, however, no compelling reason to do so: the repository retains the family history. It's easy to review which changes belong to which family, whatever lineage they may have. Mercurial users therefore have no reason to be particularly diligent about maintaining "purity" of lineage histories, as Git users do.

I wrote at the outset of this article that I believe Git should be improved to remedy the deficiency I'm describing here. There are a couple of ways it could be done. One way would be to adopt Mercurial's style of annotating every node in the graph with a family name. Another way, perhaps a more straightforward and "git-like" one, would be to annotate every edge in the graph with the family name (derived from the branch name of the ancestor node in the repository where the commit occurred). You'd probably need a distinguished name for the case where the family history is lost to antiquity.
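A rough sketch of the edge-annotation idea follows. Nothing like this exists in Git; the graph representation, the function names, and the "(unknown)" placeholder are all hypothetical and serve only to show what a labelled edge would make possible.

```python
from typing import Dict, List, Tuple

UNKNOWN = "(unknown)"  # hypothetical distinguished name for lost family history

# node id -> list of (parent id, family name recorded on that edge)
Graph = Dict[str, List[Tuple[str, str]]]


def add_commit(graph: Graph, node: str, parents: Dict[str, str]) -> None:
    """Record `node`; `parents` maps each parent id to the family of that edge."""
    graph[node] = list(parents.items())


def family_history(graph: Graph, tip: str, family: str) -> List[str]:
    """Walk back from `tip`, following only edges labelled `family`."""
    history, node = [tip], tip
    while True:
        same_family = [p for p, fam in graph[node] if fam == family]
        if not same_family:
            return history  # the family label runs out here
        node = same_family[0]
        history.append(node)


graph: Graph = {}
add_commit(graph, "A", {})                               # root commit
add_commit(graph, "B", {"A": UNKNOWN})                   # family lost to antiquity
add_commit(graph, "C", {"B": "stable"})
add_commit(graph, "F", {"B": "feature"})
add_commit(graph, "M", {"C": "stable", "F": "feature"})  # merge of two families

print(family_history(graph, "M", "stable"))   # ['M', 'C', 'B']
print(family_history(graph, "M", "feature"))  # ['M', 'F', 'B']
```

With labels on the edges, the merge commit M can be read from either family's point of view without guessing which parent was the "mainline" one.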

In any case, this is my argument for why Mercurial is superior to Git. You're welcome to your opinions, of course, but this one is mine. I'm open to persuasion that I'm DoingItWrong™, but it took me a long time to arrive at my judgment here, so please think through the arguments you want to make to me before you comment. Thanks.

[Note: this article has been revised for clarity since its initial publication. The original draft improperly assumed the reader has a familiarity with Mercurial "branch" semantics. Some redundant assertions have been removed.]

Re: Git & rebase

Date: 2011-06-29 05:02 am (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawni8VvVcImNktTYFWQtJyNBBRCkrrQdIdY
I feel like this post is from someone who was using git back in the early days, when you had to do a lot of plumbing commands yourself. Today's Git has a very similar command set for the average user's needs. The free book Pro Git gives you enough information in a single chapter to use Git very well day to day (chapter 2, if you're wondering). I feel like it's very straightforward and quite polished under most circumstances, but has power if you want to delve a little deeper. I submit that anyone who wants to compare Git to Mercurial, or point out any fault of Git, needs to have read that book from cover to cover.

Anyway, going back to what you call the single greatest failing of Git, which is not recording branch information on the commits themselves. I don't want branch information recorded. I may make a temporary branch named something ridiculous or uninformative (and I'm speaking of lightweight branches, or Mercurial's bookmark extension), implement a feature, and decide that it's worth keeping. I then merge this feature into a more "mainstream" branch (or rebase, if that's how I want to work -- both workflows are valid). I don't want my old temporary branch name to stick to those commits. It's unhelpful information. However, giving this temporary branch an important or mainstream name is not ideal, because I may not want to merge the contents of that branch into the mainstream workflow, and then those commits are floating around with seemingly important labels, when they're garbage.

I'm rambling, so here's the point -- I don't see the importance of knowing under which branch a certain commit was developed. That's useless information, in my opinion. Rather, I want to know what a single commit does, which should be documented in a well-written commit message. Assuming everyone is writing good commit messages, it doesn't matter under which branch some code was written -- what matters is the code itself.

Now, I have to end with a confession: I don't know mercurial very well. Every time I try to pick it up, I miss my integrated lightweight branching. (I don't want to use an extension -- it's the most important part of my workflow, and what makes Git so awesome in my opinion, its branching model) Every time I hear someone mention heavy branching (cloning into a new directory), I shudder, thinking of the SVN branching model. I admit, though, I need to learn more about Mercurial and really try it.

But I will say this: the author has stated that he has worked on projects using both Mercurial and Git; I submit that the author has not learned to properly use a Git workflow, based on what I've read here. Please go read Pro Git (http://progit.org/book/) from cover to cover. It's not very long, and really explains a Git workflow and the power of the Git branching model. After reading this book, I submit that no one could point at Git and call it arcane or a cognitive burden -- it's progressed a lot in the last few years. If, at that point, you still miss your Mercurial branches (er, families, I guess), then by all means use Mercurial. But I love Git, and just want people to realize that it's easy to use when you have the right resources to learn from, and just encourages a different workflow from Mercurial.

Re: Git & rebase

Date: 2013-02-28 10:37 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawmzwP5jmwisow1kYx9X2bSQvYUy-HqA3j0
First of all, I'm biased towards git. Second of all, this is a late reply, so you might already know. Just for consistency about this.
You wonder why you want to set receive.denyNonFastForwards on the server.
I wonder why this is not the default.
The point is, this flag is set on the server to ensure the server will not _lose_ history.
It is not telling the client that it should rebase its changes upon the upstream. It tells the client not to push a ref which does not have the upstream ref as an ancestor. Think about the last sentence for a while.
The client might, however, have this upstream ref in a random 'lineage', so it could just have merged it, or rebased upon it. It must, however, have pulled it in somehow. With this flag set, you cannot push before you have pulled.
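The rule described here can be sketched over a toy commit graph. This models the check only; it is not Git's implementation, and the `History` type and function names are assumptions for the example.

```python
from typing import Dict, Tuple

# Toy history: commit id -> tuple of parent ids.
History = Dict[str, Tuple[str, ...]]


def is_ancestor(history: History, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(history[node])
    return False


def accepts_push(history: History, server_tip: str, pushed_tip: str) -> bool:
    """Accept the push only if the pushed ref still contains the server's current ref."""
    return is_ancestor(history, server_tip, pushed_tip)


# The server is at B; the client committed X on top of A.
history: History = {
    "A": (),
    "B": ("A",),
    "X": ("A",),
    "M": ("X", "B"),  # the client merged upstream B into its work
    "X'": ("B",),     # or: the client rebased X onto B
}

print(accepts_push(history, "B", "X"))   # False: accepting X would drop B
print(accepts_push(history, "B", "M"))   # True: the merge contains B
print(accepts_push(history, "B", "X'"))  # True: the rebase contains B
```

Both the merged and the rebased history pass the check; only the push that would discard B is refused.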

Re: Git & rebase

Date: 2015-11-25 04:00 pm (UTC)
From: [personal profile] shelby3
> is why on Carlin's Green Earth there should be any good reason to use the receive.denyNonFastForwards setting on the server


"How do you prevent git push --force? (thanks to sdboyer!)

In the bare authoritative repository,

git config --system receive.denyNonFastForwards true"

Re: Git & rebase

Date: 2013-03-04 01:59 am (UTC)
From: [personal profile] mikeschinkel
One thing to consider about Mercurial vs. Git is that with Mercurial most people don't have to read a book cover to cover in order to be able to use it because, in general, Mercurial is logical and intuitive, and Git is, well, neither of those.

Re: Git & rebase

Date: 2012-05-26 11:18 am (UTC)
From: [identity profile] felipec.myopenid.com
> I assert that Git won't be improved to address the problem I'm writing about because the core developers do not perceive the issue to be a problem.

That's because there is no problem.

You are saying that the problem with git is that it doesn't have mercurial "branches". How is that a problem?

What is it _exactly_ that you cannot do in git?

Hint: you can do everything in git.


j h woodyatt
