In the last few months working on the source code for Our Machinery, I’ve found myself drifting further and further away from the GitFlow-based workflow we used at Stingray towards a more trunk-based model.
For me, this is often how things go. The problems of one approach push me in the other direction to look for a solution. A long time ago, the spaghetti of unstructured code led me to OOP. Then the performance problems and inheritance mess of OOP made me go for a simpler, more C-like, data-oriented approach. Sometimes perhaps the pendulum swings a bit too far in the other direction, but I think that is necessary. It is hard to get a good understanding of the advantages and problems of different approaches if you always stay in the middle of the road.
When it comes to source control, I’ve used a lot of different tools and methodologies over the years: CVS, Subversion, Darcs, Perforce, to mention a few. At Bitsquid we were using Mercurial, which is very similar to Git, except the low level plumbing is a bit more hidden and the command line is saner (I still have to google the Git documentation once a week). We mostly pushed our changes directly into the trunk and then ran our CI tests after the push to alert us of any potential problems.
Our philosophy was to get the latest build out to as many people as possible as quickly as possible. Getting the build out quickly means any problems are discovered quickly, which means they get fixed quickly, which should give us a better quality product.
Most of our customers pulled the source code directly from our trunk and merged it with their own changes. Sometimes they would do this often (daily), then towards the end of a project, more infrequently as things stabilized. We didn’t try to create “blessed” versions or “releases” and didn’t really have any control over what version our customers pulled — they just pulled whenever it was convenient for them.
This approach had some great advantages and some great disadvantages. Advantage: Our turnaround time for getting code modifications out to our customers was very short — less than an hour in many cases. So if they ran into a critical bug or performance problem, we could sort it out quickly. When I worked on the games side and not the engine side of this industry, I worked with middleware where getting a fix for a fatal and obvious bug with a clear repro case could take six months or more. Obviously that doesn’t work — in six months the game might already be released — so I just had to find a hack or workaround that we could live with. Having a short turnaround time is really important.
Another advantage: We could use our customers as QA. If we ever pushed a change that created a problem for our customers they would be quick to tell us about it. Granted, this was more of an advantage for us than for our customers, but the nice thing about it is that the customers will QA exactly the workflows that are important to them, something that is really hard to do with internal QA.
The big disadvantage of course is that every so often we would push a bug and break our customers’ tools, games, etc. In bad cases, a big part of their team might not be able to work because of a showstopper bug introduced by us.
This is of course pretty terrible. Our main argument for why it wasn’t terrible terrible is that in case of a catastrophic problem after a pull, the customers could always revert to the last version they used. In theory, after a simple revert and recompile, they should be back up and running as before while we worked on fixing whatever problem they ran into.
In practice though, I think this revert protocol was rarely executed — instead the customers mostly waited for a fix from us (which was typically less than an hour away). I’m not exactly sure why. I think part of it might be psychology — it feels better to move forward than to go back. Part of it might be that it may take some time before a bug is discovered, and by then everyone is already using the new version, with features they have been waiting for, and nobody wants to go back.
Altogether, it ended up being kind of a trade-off deal with the customers — they had to do some QA work for us and deal with the occasional crashes. In return, they got really fast turnaround times for bug fixes and critical features. It seemed to be a deal that most of the customers were pretty happy with. It worked well because we had a small, closely knit community — a small customer base and dedicated developers. But of course, it was never fun for anybody when they ran into showstopper bugs introduced by us.
Fast forward to 2014, when Bitsquid was acquired by Autodesk and became Stingray. We quickly ran into problems with this development methodology:
As part of the acquisition, a lot of new developers joined the team. These developers were not familiar with the “style” and “spirit” of the codebase, but still wanted to (and were pushed by their managers to) make contributions. They were also not co-located with the original Bitsquid team in Stockholm, but in several cases many time zones away, which meant it took longer for them to absorb the culture of the codebase. The result was a lot more commits that introduced bugs or broke with the core ideas/philosophies of the engine.
Being part of a larger organization, such as Autodesk, meant there were suddenly multiple agendas. People had their own pet projects: libraries they wanted integrated, peripherals they wanted supported, etc. With unbridled development in multiple teams we saw a big risk of featuritis.
Stingray had a much larger target audience than Bitsquid originally had. Bitsquid was a game engine targeted at mid to high-end studios with their own engineers, comfortable with debugging and building from source. Stingray wanted to capture the casual game development market too, as well as the broader field of real-time visualization in all of Autodesk’s product domains. That meant that the old tradeoff — you do QA for us, it may be broken sometimes, but we’ll fix it quickly — was no longer valid. With a larger base of more inexperienced customers we would need something closer to traditional releases, with QA and bug fixing.
It was to address these issues that we introduced the GitFlow model. (We did other things too, such as add a plugin system, so that pet projects could be developed without modifying the main engine.) The idea was that all development would happen in feature branches and then be merged into the trunk through pull requests. Before merge, the PR would have to pass our CI tests and be approved in a code review.
I think the mindset we had at the time when these measures were introduced was to “defend” the code from unwanted changes — bugs, performance degradation, unneeded features, concepts that didn’t match our core philosophies, etc. Needless to say, this created a lot of strife between the development teams. One side seeing an uncontrolled mob out to destroy their beautiful code base. The other side seeing a bunch of ivory tower elitists, preventing them from getting any real work done.
In hindsight, it is easy to see that the idea that you should have to defend your code from your own developers is fundamentally fucked up, and should have been solved in other ways. Instead of pull request policing we would have needed much more communication, discussions, mentoring and in-person meetings to make sure we were all on the same page with what should be in the engine and what the code should look like. Note that this alignment did eventually happen — but it took a year or so to establish that mutual trust.
Also note that I’m not saying that “communication” can magically solve all issues. Sometimes there are fundamental differences that can’t be resolved and only lead to endless unproductive discussions. In those cases, someone should probably leave the team. (On the contrary, fundamental differences that lead to productive discussions are one of the most valuable things you can have on a team.)
Ok, back to git. Over the years of working with the GitFlow model I felt an increasing sense of frustration with it. The problems as I see them are:
Pull requests take a significant amount of time to get merged. The review of a pull request requires some back-and-forth messaging between people on different continents. This will take at least a day, and if a lot of modifications are needed, it might be several. Getting stable CI tests over a wide range of mobile and console hardware can be tricky too.
As a result, people tend to make bigger and bigger pull requests. If you have a problem that needs to be solved in multiple steps, you can’t sit idle for days while you’re waiting for the first step to be reviewed. But continuing on the second step sucks too, because if the first step requires any modifications, you will have to redo your work on the second step. So better to do all the steps in one pull request. Then at least you can get some uninterrupted work done before you get stuck waiting for pull requests.
This creates a bad spiral. As pull requests get bigger they also get harder to review. A 7000 line pull request takes a lot of time to review properly. As a reviewer you tend to put it off as long as possible. Such a big PR also tends to have more potential problems, require more rounds of back-and-forth and need more reviewers (because it is touching more systems). PRs now take weeks to merge, so people make them even bigger to compensate.
Merge problems become hairier and hairier. With everybody working on a long-lived branch, there will be multiple conflicts once they eventually merge. Large scale refactors are discouraged because they will “break everybody’s branches”.
The best time for a review is early. By now we have a situation where people will work on a PR for weeks or months before submitting it for review. Sometimes, as that PR is reviewed, fundamental flaws are discovered that may require a complete rewrite. If the flaw had been discovered early, much work could have been saved. Also, rewriting something you have worked on for months is no fun. People understandably become defensive after putting that much in, and reluctant to fix the problem (a.k.a. the sunk-cost fallacy).
No shared view of the product. Since everyone is working on a long feature branch of their own, there is no consensus view of the state of the product. Managers can’t look at it and get a feeling for what is going on. Features get no “free” QA testing from other developers, because only the developer working on that branch will see them. It is only when everything is integrated (in a scramble, just before the release) that the true status of the product is known.
Product history is a mess. Trying to follow the code history through multiple merged branches is close to impossible.
Code review happens through a small window. When reviewing a PR you only look at the fraction of the code that just changed. Thus you tend to mostly notice small issues — formatting, variable naming, the occasional off-by-one error. It’s more important to review how systems fit together, whether the granularity of the API is right, whether the fundamental data structures are sound, etc. So system reviews are more valuable than reviews of individual PRs.
PR commentary is lost. The discussion that happens in a PR about the choice of a particular solution, potential performance problems, etc., tends to be lost after the PR is gone. At least, nobody I know goes through old PRs digging for information. Some of this discussion can be very valuable and it would be good to capture it for future reference.
The replacement model we use at Our Machinery instead works as follows:
- Everybody works against the trunk. Everybody sees the same code, everybody knows the current status of the project. Problems will be found quickly.
- Work is done in small units and pushed to the trunk frequently. At least once a day.
- Big work is divided up into smaller, individual chunks that can be pushed separately.
- Big changes (e.g., switching to a new sound system) are made through feature flags. That is, the new system is developed in parallel with the old one, with a flag saying which system should be used. This allows the new system to be tested before it is feature complete (see the first sketch after this list).
- Branches can be used locally, as a convenient undo/redo mechanism for the entire project, but they only live until the work is pushed.
- When we push, each push is squashed into a single commit that is rebased on the trunk head. Thus, the project history is linear and easy to search (the second sketch after this list shows what this can look like on the command line).
- The commit message is very descriptive. Typically, everything that you would normally write to describe your changes if you were making a PR instead goes into the commit message, so that it is properly saved in the project history.
- No code review needs to be passed before pushing to trunk. It is the responsibility of the developer to write good code.
- Pushed commits are put on a stream so they can be reviewed (after push) by anyone who wants to. We actually use GitHub’s pull request system for this, since it is a convenient code commenting system, but any messaging/forum system could be used.
- Problems discovered during such reviews are fixed with a later commit. In fact, the reviewer can fix it immediately if the fix is trivial. Serious problems might require a revert.
- When a system is done, a complete code review of it is done to verify the implementation and spread knowledge. It is probably a good idea to schedule additional system reviews as a regular part of maintenance too.
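To make the feature-flag point above concrete, here is a minimal sketch in C. All the names (`use_new_sound_system`, `sound_update_old`, `sound_update_new`) are hypothetical and invented for illustration; they are not the actual engine API, and a real engine would read the flag from a config file or command line rather than hard-code it:

```c
#include <stdbool.h>

// Hypothetical flag -- in practice this would likely come from a config
// file or command-line option rather than a compile-time constant.
static bool use_new_sound_system = false;

// The old, battle-tested implementation.
static void sound_update_old(float dt)
{
    (void)dt; // ... existing mixing code ...
}

// The new implementation, developed in parallel on the same trunk.
static void sound_update_new(float dt)
{
    (void)dt; // ... new mixing code, not yet feature complete ...
}

// Call sites branch on the flag, so the new system can live on trunk
// and be tested long before it is feature complete.
void sound_update(float dt)
{
    if (use_new_sound_system)
        sound_update_new(dt);
    else
        sound_update_old(dt);
}
```

Once the new system works for everybody, the flag and the old code path can be deleted in a single commit.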
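And here is roughly what the local-branch, single-commit push flow can look like with plain git commands (the branch and remote names are placeholders; any equivalent tooling works just as well):

```
# Work locally on a short-lived branch (cheap undo/redo for the project).
git checkout -b sound-refactor
# ... make several small commits ...

# Replay the work on top of the current trunk head to keep history linear.
git fetch origin
git rebase origin/master

# Collapse the branch into a single commit on trunk.
git checkout master
git pull --ff-only
git merge --squash sound-refactor
git commit        # this is where the long, descriptive message goes
git push origin master
```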
The whole idea is to get back to a more agile workflow, where problems are discovered and fixed quickly; where developers are trusted and responsible, and all working towards the same goal; and where the product quality is consistent and reliable and not ebbing and flowing in release/fix cycles.
So far, it seems to be going well and I like it a lot, but we are still a small shop. It will be interesting to see how it works with more developers with a varying range of experience.