The following content was taken from Atlassian.com
In this the first of a three part blog series which focuses on migrating the Jira code base from Subversion to Git. We wanted to share Atlassian’s migrating experience to those of you who are contemplating moving a large project to Git – without sacrificing active development. In our first post we discuss why we decided to make the switch to Git. In our second post we dive in the technical details of switching from Subversion to Git. In our third, and final post we will discuss how we managed the “human” angle to migrating.
Atlassian has been extremely excited about DVCS for a number of years and has invested heavily in DVCS. Atlassian has acquired Bitbucket – a cloud DVCS repository host, developed Stash – a behind the firewall Git repository manager and added DVCS support to FishEye, Atlassian’s code browsing and search tool. They have also added a myriad of DVCS connectors to Jira.
Along with Atlassian we believe DVCS is a great leap forward in software development. As part of this, Atlassian migrated the codebases for their own products and libraries from centralized version control systems (generally SVN) to DVCS. Some of these have been big migrations!
In this three part blog series we will focus on the biggest migration Atlassian has done – migrating the 11-year-old Jira codebase from SVN to Git. What obstacles did they encounter? What lessons did they learn? And most importantly, how did they do it without sacrificing active development on Jira? We hope that sharing this experience helps anyone approaching a similar migration.
We’ll focus on Git, because Jira moved to Git, but everything in this series applies equally to Mercurial. At Atlassian, they use both.
Migrating a big code base is not without cost. The first thing you will need to answer – both for yourself, your bosses, and the people who work for you – is what will DVCS bring us, and why is it worth the cost of migrating?
We have used SVN successfully on many projects. So has Atlassian. And I am sure many people reading this article have also used SVN successfully. Since there is always a cost to migration, you may be inclined to ask, “If Subversion has met my version control needs for many years, why should I change?” To me, that is the wrong question. The real question is, “How can DVCS make what we do today even better?”
Git is known for several things. For a developer working with code, it’s faster. It allows for advanced workflows like feature branching, forks and pull requests – in theory, these workflows are all possible with SVN, however the difficulty of merging in SVN compared to Git makes them untenable. But for anyone moving from SVN, the main benefit of Git is that because of its lightweight branching and easy merging, Git allows you to do your default SVN workflow better than SVN.
What do we mean by this? Let’s talk about how we actually develop and release software. Most of us work in a world where we have at least one released version of our software in the wild, which we call a “stable” branch. We maintain and contribute bug fixes to a stable branch while developing new features on a “development” branch (which is called trunk/master/default depending on which VCS you use).
When we commit bug fixes to stable, we need to get them into master too. SVN merge is known to be a pain and works solely on revision history – not actual content. As a result, a lot of people avoid it, or they do it infrequently and not as part of their day-to-day workflow. How many projects have you worked on where stable and development branches have started to diverge, or diverged so significantly that the effort to bring them back together is a real project cost? Many have certainly been in projects where this has happened, and when we speak to other developers it’s a frequent occurrence with SVN. There are some strategies to deal with it. For example, with Atlassian’s issues and tracking software, Jira, they ignored merging and required developers to make each commit individually to each stable and development branch, relying on QA to make sure that it happened correctly.
Git allows you to remove this pain. Git makes merging so easy that merging the entire stable branch into the development branch on each commit is a reality; it’s now Atlassian’s default workflow. So even if you don’t want to use feature branches or forks or pull requests immediately, Git provides advantages from day one.
And when Atlassian was ready, they were in a position to take advantage of the advanced workflows that Git allows. Before the switch to DVCS, Atlassian’s major products targeted 90-day release cycles. These 90-day releases went to two platforms: downloadable products for clients to install on their own servers; and a release to Atlassian’s hosted cloud platform (Atlassian OnDemand) for which clients pay a monthly fee. Using branches as a core part of development workflow has allowed Atlassian to shorten this to the point where they now release major products to the cloud every 2 weeks.
Jira is a decent size code base to move – 11 year’s worth of history, 47,228 commits across approximately 21,000 files. Atlassian averages about 30 different committers over a two-week period. More than that, the VCS is a real work-horse for a project like Jira. Builds, code reviews, scripts for releasing both product distributions and source… all these things have a rich tapestry of dependencies on the source code management system.
Their main goal in the migration was to minimize interruption to developers. This is about more than just the ability to commit code; it is about the infrastructure surrounding software development.
Atlassian has 3.5 years of history in Jira’s code review system.
Jira has a lot of CI. Atlassian runs approximately 60 build plans over different configurations and branches.
They have some other dependencies too – Jira has a somewhat complex release process that involves pulling together code from multiple sources. Atlassian also releases their source code to customers, which involves a different set of build scripts.
There is a tradeoff here between how fast you can migrate and how stably you can do it – Atlassian’s guiding principle was to optimize for stability over speed. If you set a deadline for your migration and it slips, what’s the worst that happens? Developers have to commit code to SVN for another week or so. Not the end of the world. It’s far worse if the migration interrupts developers’ ability to work and meet their own deadlines.
In the end, the migration took 14 days in total, with only a total of two hours where developers were unable to commit code. Atlassian were nearing the end of the development cycle for their latest release, Jira 5, and at no point were they unable to cut a release candidate.
When preparing the migration, there are a couple of things to be aware of.
First, it will take time. The actual git-svn clone, which takes all of the commits in the SVN repository and replicates them in Git, took three days for Atlassian.
Second, you should prepare and think of all the dependencies your infrastructure has on your VCS. And know that if your infrastructure is sufficiently complex (like Atlassian’s), there will be things you never dreamed of and only discover when they break. So don’t beat yourself up when you encounter a dragon. Just slay it, and continue on your quest.
A migration like this is not something you can do overnight, or even over a weekend. It needs to be managed for a sustained period of time.
Stably migrating is daunting but it is not brain surgery; there is a process Atlassian has employed to make it manageable. In part 2 of Atlassian’s Switch to Git series we walk through, step-by-step, the technical details on migrating from Subversion to Git.