Storj's integration of Gerrit and GitHub

Toyoo · April 5, 2023, 9:30pm

I’m looking for options to suggest Gerrit as an alternative to GitHub in my workplace. I know Storj uses both. May I ask about your experience (that is, Storj Inc.'s) in integrating the two, and about your overall opinions of those tools and your preferences? Using both at the same time seems pretty unusual. Thank you!

moby · April 5, 2023, 11:13pm

Hey @Toyoo

I don’t know when exactly Gerrit was introduced into the Storj developer workflow, but at one point we were definitely doing everything through Github PRs. I would estimate we are at a point now where most of the engineers at the company submit changes via Gerrit rather than Github PRs. My team of 6 people submit pretty much all of our changes over Gerrit. Some teams seem to prefer Github PRs, but that could be due to the particular repositories these teams primarily work with.

There is also a repository-based component. It is possible to submit Gerrit changesets for storj/storj, storj/common, and other public repositories. However, Gerrit is not set up for some of our private repositories, so changes to these require submitting Github PRs.

When Gerrit was introduced, I would say it wasn’t welcomed by the engineering team with completely open arms. In fact, I struggled with it for a while before I got used to it, and it took me even longer to fall in love with it (now I would personally prefer not to work somewhere that did not enable me to use Gerrit).

Point being, there were a small number of people who were familiar with Gerrit and pushed for supporting it, and everyone else was totally new to it. Eventually more and more people began using Gerrit to a point where it achieved “critical mass” and sort of became the default.

At this point, I needed to start having the same conversation with every new hire on my team (or adjacent to my team) - the conversation where I mention how you can use Github or Gerrit to submit changes, depending on your preference, but I highly recommend trying both before making a decision. I say that because basically everyone has experience with Github PRs, and the Gerrit workflow feels intimidating at first if you don’t have an understanding of maintaining a clean git history.

Anyway, I subsequently wrote up an overview intended to introduce someone familiar with the Github-PR-workflow to Gerrit. I tried to write it in a way where the benefits and drawbacks are outlined, and the basic/minimal necessary git commands you need are easily accessible - Gerrit · storj/storj Wiki · GitHub

At this point, although it is possible to submit Github PRs, almost all of our commits to the storj/storj repository come from Gerrit. Regardless, I think it is still important for us to support merging via Github PRs. Submitting a PR is the default way an external contributor would typically open a change. Supporting Github PRs significantly lowers the barrier to entry for people outside the company who want to contribute. Contrast that with something like Bugzilla, a sort-of-bespoke piece of software that you need to learn how to use if you want to make a contribution to the Firefox browser. Not throwing any shade on Mozilla or Firefox, I just think the barrier to entry if you’re like “hmm I’m interested in contributing to open source software” is a bit higher if you can’t make a Github PR. Likewise, sometimes a non-programmer will want to make a simple change (e.g. a product manager fixing a typo or a data science person adding a line to track some type of analytics activity), and is not interested in learning everything necessary to submit a Gerrit change, because it’s not like they’re regularly merging code; they’re just doing a one-off thing.

I want to finish off briefly explaining why I personally love Gerrit. I like it because the commit you submit for review is literally exactly the commit that gets merged. With a Github PR, you might have ten different commits that are squashed and merged into the commit that goes into the main branch, and even the commit message can be altered at the last minute right before merge. This can result in commit messages like

    DHT interfaces  (#28)

    * first pass at interface for dht
    * wip
    * comments addressed from @jtolds
    * adjust GetBuckets
    * missed comment
    * addressed PR comments from @jtolds

if you’re not paying attention, because when you squash and merge, Github will just make the commit body a bullet list of all the commits in the PR branch.
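To make the squash behaviour described above concrete, here is a minimal Go sketch that assembles a message of that shape from a PR title, a PR number, and the subjects of the commits on the branch. It only mimics the format shown in the example above; `squashMessage` and its parameters are hypothetical names for illustration, not anything GitHub exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// squashMessage (hypothetical helper) mimics the default squash-and-merge
// message described in the post: "<PR title> (#<number>)" as the subject,
// then one "* " bullet per commit on the PR branch.
func squashMessage(prTitle string, prNumber int, subjects []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s (#%d)\n", prTitle, prNumber)
	if len(subjects) > 0 {
		b.WriteString("\n") // blank line between subject and body
	}
	for _, s := range subjects {
		fmt.Fprintf(&b, "* %s\n", s)
	}
	return b.String()
}

func main() {
	fmt.Print(squashMessage("DHT interfaces", 28, []string{
		"first pass at interface for dht",
		"wip",
		"missed comment",
	}))
}
```

Nothing in that message explains why the change was made, which is the gap that reviewing the commit message itself is meant to close.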

Contrast that with Gerrit, where the commit message is part of the code review. I love this. There are plenty of cases where I don’t have any comments about the code being changed, but I have lots to say about the commit message, e.g. “please link to the Github issue associated with this change” or “please provide more context so that someone reviewing this commit message does not need to look at the code to understand what the change does”. We also have some internal git commit guidelines that are easy to overlook when merging a Github PR.

Finally, I think Gerrit forces you into a git workflow that is (in my opinion) straight up better than the branch-based workflow that most people learn. It forces you to logically group your code changes into commits. Each commit serves a specific purpose. There is no room for a “wip” commit or a “fix typo” or “fix linter” commit. Just amend those changes to the original “fix bug” or “add feature” commit. Leaving them as separate commits just clutters your git history (and your mind). Sure, it takes practice, but I think it’s worth it, and I would say that it seems like people at Storj tend to eventually prefer Gerrit even if they hate it at first (probably not true for everyone).

EDIT: Also, I know 0 things about how to set up/configure Gerrit (infrastructure-wise) for the first time. Someone else will have to respond about that process.

Toyoo · April 5, 2023, 11:51pm

I have to admit I didn’t expect such an extensive answer. Thanks a lot, it’s very useful!

I myself worked with Gerrit already in… I think it was 2012, maybe 2013. It was a small startup, and we were basically setting up our work environment from scratch. We evaluated several choices available at the time, and even then Gerrit was miles ahead of the competition. Then I left for a job where I didn’t really interact with git at all. Now I’m back and, uh, with the tools my team uses now, I miss Gerrit so much! Though, I’m in a position where I can do some evangelisation, which is why I’ve asked this question.

Frankly speaking, seeing that Storj Inc. uses Gerrit was to me one of the signs of leadership in the project caring about technical prowess.

jtolio · April 6, 2023, 12:46am

Great answer @moby!! I can speak to the only part @moby left out, which is the nitty gritty details of how our setup works, which might help you with your evangelism at your place @Toyoo.

As you probably know, both Gerrit and GitHub want to be the source of truth for your Git repo. Gerrit isn’t a system that typically wraps another Git repo elsewhere, and neither is GitHub. So, we do basically the only thing we can do, which is bidirectional mirroring. In theory, it could be the case that two people might merge a commit at the exact same time to the same repo through the different tools and that causes problems, but in practice I don’t think that has ever happened.

So, we use GitHub - jtolio/git-repo-syncer: a thing for bidirectional git repo syncing, which is a small Go tool I wrote that runs on a small cloud server somewhere. It has a collection of HTTP endpoints that are set up as webhooks. GitHub and Gerrit are both configured to make a GET request to this little service whenever a commit happens or a tag is pushed or whatever. This service then essentially runs git fetch from all the configured remotes and pushes up the latest references for all known branches and tags to all remotes.
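As a rough illustration of the webhook-triggered flow described above, here is a minimal Go sketch. It is not the code of git-repo-syncer. It assumes a bare local mirror repository with each side already configured as a git remote, and an account allowed to push branches and tags directly on both sides. The remote names, `syncCommands`, and `syncHandler` are made up for this example. The refspecs deliberately omit the leading `+`, so git rejects non-fast-forward updates rather than overwriting them, and a conflicting update surfaces as an error.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
	"sync"
)

// Mirror every branch and tag. No leading "+", so git refuses
// non-fast-forward updates instead of forcing them.
var refspecs = []string{"refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"}

// syncCommands returns the git argument lists for one sync pass:
// fetch from every remote first, then push the result to every remote.
func syncCommands(remotes []string) [][]string {
	var cmds [][]string
	for _, verb := range []string{"fetch", "push"} {
		for _, r := range remotes {
			cmds = append(cmds, append([]string{verb, r}, refspecs...))
		}
	}
	return cmds
}

// syncHandler runs one sync pass per incoming webhook request.
// repoDir is assumed to be a bare repository with the remotes configured.
func syncHandler(repoDir string, remotes []string) http.HandlerFunc {
	var mu sync.Mutex // only one sync pass at a time
	return func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		for _, args := range syncCommands(remotes) {
			cmd := exec.Command("git", args...)
			cmd.Dir = repoDir
			if out, err := cmd.CombinedOutput(); err != nil {
				log.Printf("git %v failed: %v\n%s", args, err, out)
				http.Error(w, "sync failed", http.StatusInternalServerError)
				return
			}
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

func main() {
	// Print the plan for two hypothetical remotes instead of starting a server.
	for _, args := range syncCommands([]string{"github", "gerrit"}) {
		fmt.Println("git", args)
	}
}
```

To serve it, one would register the handler with something like `http.Handle("/sync", syncHandler(dir, remotes))` and point both webhooks at that URL. The path and directory here are placeholders.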

Occasionally it breaks, and so while it’s not well documented, you can also configure it to post errors to a Slack webhook. So, we have a channel in Slack called “#git-sync-notices” that is mostly quiet but every so often has a stack trace for when someone has added a new repo incorrectly or whenever GitHub has a temporary outage. In theory, this is where we’d find out we need to manually intervene if two refs are simultaneously and conflictingly updated. I don’t think most of our team even knows that this is happening; it is that seamless in practice.
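For readers wiring up something similar, here is a minimal Go sketch of posting a sync error to a Slack incoming webhook. Slack incoming webhooks accept a JSON body with a `text` field. The helper names, the message wording, and the repository name are assumptions for illustration and are not taken from git-repo-syncer.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// slackPayload builds the JSON body for a Slack incoming webhook.
// The message wording is made up for this example.
func slackPayload(repo, detail string) ([]byte, error) {
	msg := fmt.Sprintf("git sync failed for %s: %s", repo, detail)
	return json.Marshal(map[string]string{"text": msg})
}

// notifySlack posts the error to webhookURL, which is assumed to be an
// incoming-webhook URL created in the Slack workspace.
func notifySlack(webhookURL, repo, detail string) error {
	body, err := slackPayload(repo, detail)
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Only build and print the payload; no request is sent here.
	body, err := slackPayload("example/repo", "non-fast-forward update on refs/heads/main")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Calling `notifySlack` from the failure path of the sync job is enough to get the "mostly quiet channel" behaviour described above.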

So anyway, if you’re considering setting up an optional Gerrit like we have, what I would recommend is bidirectional git syncing of some kind (with webhooks for event triggering), and perhaps the above tool may be of use to you.

Toyoo · April 6, 2023, 10:14am

Thank you, this tool looks very useful, and simpler than I expected. It will make my next discussion with our ops team much easier!