Sunday, July 15, 2007

From CVS to Subversion

My friend George, whose day job requires the artistic and scientific skills of a system administrator (which pretty much boils down to: keep the users happy and yourself invisible), was envious of a guy who quit his job to write a web application framework in Lisp.

Common Lisp.

Of all languages.

Anyway, George is a great guy, at least when he is not having these wacky ideas, and an excellent system administrator. So the fact that he expressed envy for a programming gig struck a chord with me. George loves challenges, too.

As a teenager I was a huge fan of MacGyver, a TV series where the main character constantly faces intractable problems and always finds a solution through improvisation. In the same spirit, I have always liked doing challenging things, even when they appeared useless to some, rather than taking the easy tasks with the most fanfare. As a software developer I seek challenges mainly through diversity, constantly trying to conquer another problem space, another language, another framework in the vast field of computer science. But I've learned by now that nothing is more challenging and satisfying at the same time for me than a system administration task.

So I got myself one. I signed up for the conversion of our source code repository from CVS to Subversion, as a break from my coding assignments. After eight productive years of using CVS as our main version control system, we became Subversion converts a couple of weeks ago. It hasn't been a wholesale migration yet, mainly for risk management reasons, but I've been committing exclusively to the new repository all last week. We began hosting new codebases (like my VCP project) in our shiny new repository, but for the time being we are still keeping the old ones in CVS. The process was rather painless. Not that it was a surprise for me. I had been running migration simulations and tests for about a year and they always worked fine.

We had three main reasons for migrating to Subversion:

  1. Versioned renames and copies of files and directories. Since we work mostly in Java, where code refactoring often leads to renaming files (for a class name change) and moving them around (for a package name change), it is imperative that we keep the change history of a file after renaming it. Until now the history was simply lost after a rename, which tended to make refactoring happen only when absolutely necessary.
  2. Atomic commit sets. Being able to commit every changed file in a single batch with the same revision number makes rolling back changes easier. It also removes the need to keep a separate log of our CVS commits in order to find out which files were modified along with a specific one in the same commit.
  3. Cheap branching and tagging. Experimental branches for fleshing out risky ideas should become more common now, since Subversion has better support for them. Also, no more tag-sliding to cope with unexpected last-minute fixes before a release, although, to be fair, Eclipse already helped with that.
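
To make the second point concrete, here is a toy model in Python, not anything taken from CVS or Subversion themselves. The class names PerFileRepo and AtomicRepo and the changed_with helper are made up for illustration, and plain counters stand in for real CVS revision strings. It only sketches why one repository-wide revision per commit makes the separate commit log unnecessary:

```python
from dataclasses import dataclass, field


@dataclass
class PerFileRepo:
    """CVS-style toy: every file carries its own revision counter."""
    revisions: dict = field(default_factory=dict)  # path -> per-file counter

    def commit(self, paths, message):
        # Each file is bumped independently; nothing records that these
        # files were committed together (hence the separate commit log).
        for path in paths:
            self.revisions[path] = self.revisions.get(path, 0) + 1


@dataclass
class AtomicRepo:
    """Subversion-style toy: one repository-wide revision per commit."""
    log: list = field(default_factory=list)  # index i holds revision i + 1

    def commit(self, paths, message):
        # All paths land in a single changeset with one revision number.
        self.log.append((message, frozenset(paths)))
        return len(self.log)

    def changed_with(self, path):
        """Map each revision touching `path` to the other files in it."""
        return {
            rev: sorted(files - {path})
            for rev, (_, files) in enumerate(self.log, start=1)
            if path in files
        }


if __name__ == "__main__":
    cvs_like = PerFileRepo()
    cvs_like.commit(["Foo.java", "Bar.java"], "rename method")
    cvs_like.commit(["Foo.java"], "fix typo")
    print(cvs_like.revisions)  # {'Foo.java': 2, 'Bar.java': 1}

    svn_like = AtomicRepo()
    svn_like.commit(["Foo.java", "Bar.java"], "rename method")
    svn_like.commit(["Foo.java"], "fix typo")
    print(svn_like.changed_with("Bar.java"))  # {1: ['Foo.java']}
```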

However, we are grateful for other benefits as well, perhaps less critical, but ones that lend an air of modernity to our deployment:
  1. The repository is now served via HTTP, allowing far more versatile deployments than CVS over SSH. SSL tunnels allow remote access to the repository without a VPN connection, something that was out of the question in the past, since it would have meant opening shell access to our development server to the world. Also, the ability to use LDAP for authentication and authorization removed the need for a shell account for every committer. That had been partly achieved in the past through PAM and the pam_ldap module, but setting it up involved screaming, crying, weeping, swearing and hair pulling. Not looking forward to it, ever again.
  2. Binary diffs make storing binary files in the repository cheaper. If you store large binary files in the repo, like third-party libraries, CVS stores each new version in full, not just the changes. Not that we worry too much about repository space, but it definitely feels like we are in the 21st century.
  3. Versioning symbolic links. Not that I expect to use them a lot, but still a nice touch.
  4. We got hook scripts from the vendor, for tasks like e-mail messages on commit, log message cleanup, etc. We had all that with CVS, but I had blatantly copied most of them from the FreeBSD project's excellent repository, and while they were of the highest quality, they came without much documentation. Since I already had a working setup it did not seem like a big deal, but if I had to do it all over again by copying scripts from, say, the Apache repositories, I think I would have posted this blog entry in September.
  5. We also got repository mirroring and backup without file system specific tricks. Backing up the repository was a straight file copy for CVS, but since Subversion is more sophisticated (even with the fsfs backend that we use), having to resort to file system snapshots for a proper backup would have been a pain. Our development server is a FreeBSD system and UFS2 snapshots are relatively cheap, but not as simple as hot-backup.py. Also, the need for repository mirroring only came up once, and it would have required a nullfs mount from another jail on the server, but having the option of svnsync makes things simpler.
  6. Fine-grained authorization for repository access with CVS over SSH required enabling ACL support in UFS2 and manually handling the permissions on each folder or file. Subversion over HTTP, however, moves the authorization configuration to a separate place, as it should be.

Subversion definitely seems like CVS done right. There is a clear scent of modernity in every corner I look, compared to CVS. I was surprised to find various hook scripts and assorted infrastructure programs written in Python, Perl and Ruby. These are pretty much all the mainstream languages in a UNIX system these days, which says something about something.

The only thing I'm not terribly excited about is the Subversion support in Eclipse. I've been using Subclipse so far, and although I haven't had any real problems, the integration is not as good as with CVS. Which brings me to the main reason I picked Subversion for our next-gen version control system: Eclipse support. Had it not been for mature plugins like Subclipse and Subversive, all the advantages over CVS would have meant squat. Having to resort to command-line tools in order to perform diff, update and commit would have sent our team's productivity down the drain before you can say Alt-Tab.

If I valued technical excellence over productivity, I would have moved us to Bazaar, Mercurial or git instead of Subversion. Every cool project these days seems to pick one of those for a VCS. They even have alpha-quality Eclipse plugins. Then I would need to schedule a weekend for the whole team in a Mediterranean resort and clue them in on the joys of distributed version control systems.

Hmm, not a bad idea now that I think of it.

Then again, had I chosen technical excellence over getting work done, I would probably have quit my job to build a web application framework in Lisp. Common Lisp.

Damn. George was right, again.

I hate it when he is right.

Unless otherwise expressly stated, all original material in this weblog is licensed under a Creative Commons Attribution 3.0 License.