Project Greenfield: Implementing Source Version Control
As promised, I have begun down the path outlined recently in Project Greenfield. As I discussed in that post, one thing my company has sorely lacked has been Version Control. Yes, there are backups. Yes, there are development copies. Yes, we have source escrowed with a third party. No, I don’t think any of those things count as source or version control.
I’ve discussed this topic many times with fellow geeks, and the conclusion is always the same: even as a one-person team, I should absolutely be operating under source version control. I’ll admit that for a while I thought it seemed like overkill for what I do, but over the years I have come to understand that it really is a fundamental part of the development environment. So, with Project Greenfield, I have finally implemented a Version Control System (VCS).
Choosing a Solution
Starting with a blank slate is nice: I am free to select whatever system I wish to use. And in the beginning, I will be the only one using it, so I have the opportunity to set the standard and get my feet wet before I need to bring anyone else into the fold. The problem was I had no idea what I was looking for or what I needed.
Naturally, I spent a bunch of time researching, and my friends will tell you I spent a lot of time asking pretty basic questions. I realize now that the choice of solution isn’t really all that important. The most important thing is to use VCS: any VCS is better than no VCS! You can always switch later by starting fresh in a new system, at least that’s how I see it. In fact, it appears that some people use multiple systems. I know one person who uses one system locally for his development work, but his company uses an entirely different system, so he commits his changes to that when he is done locally.
If you are new to VCS
If you are an old hat at VCS, you can safely skip this section. Or you can keep reading it if you want to laugh at me, I really don’t mind.
For the rest of you (us), I want to share a little of my research. My one cursory experience with VCS was in the late 90’s at an AS/400 shop. The system was built around a check-out, check-in, approval model. It made sense to me because it was very linear. There were several layers of approval required to get code back into the code base: supervisor, testing and quality assurance, documentation, and final review (or something like that – it’s been a while.) At each step of the way you had to deal with conflicts, rejections, etc. It was a lengthy and tedious process.
Expecting the same sort of experience, I was surprised to find that the world of VCS is not so straightforward. I learned that the choices partitioned themselves into two camps: Traditional Version Control Systems and Distributed Version Control Systems (DVCS). Frankly, I don’t feel qualified to discuss the differences between the two approaches, but I’ll try to hit the highlights.
VCS uses a central repository that contains all the code. Developers check out the code they need to work on and then check it back in when they are done. Because this is all done over the wire, it can be a little slow and cumbersome.
DVCS, on the other hand, distributes complete copies of the repository, so every developer machine becomes a full-fledged version control system in its own right. All developer changes are then made to the local repository. This leaves the developer free to create new branches, experiment, refactor code, or what have you, without even pulling in code from the central repository. The repository can easily be reset to any point in its history at any time in the future, so you can abandon changes if they don’t work out. This is very powerful and is frequently called “time travel.”
When the developer is ready to post changes, he first pulls down the current version of the repository and merges his changes with it locally. This means all the conflict resolution is also handled locally by the developer who caused the conflict. Once all is right with the code again, it gets pushed back to the central repository where other developers can now go through the same process.
One nice thing about this approach is that there are no locks on the repository and no expectation that code must be “checked back in.” Another nice thing about DVCS is that if something happens to the central repository, it can be rebuilt from the developer copies. Finally, DVCS *really* only moves around the changes to the repository, not the entire repository. This means that updates are much smaller. Combine that with the fact that almost all the work is done locally, and you have a lightning fast system.
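To make those three points concrete, here is a toy sketch in Python. It is only an illustration of the idea, not how Mercurial or Git actually store anything: real systems track history as a graph of changesets, while this model just keeps an ordered dictionary. The Repo class, its method names, and the changeset ids are all made up for the example.

```python
"""Toy model of a distributed repository (illustration only)."""
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Changesets keyed by a made-up id; insertion order stands in for history.
    changesets: dict = field(default_factory=dict)

    def commit(self, cs_id, description):
        """Record a change locally; no server or network is involved."""
        self.changesets[cs_id] = description

    def clone(self):
        """A clone is a complete copy of the history, not a working snapshot."""
        return Repo(dict(self.changesets))

    def pull(self, other):
        """Copy only the changesets we lack; return how many were transferred."""
        missing = [cs for cs in other.changesets if cs not in self.changesets]
        for cs in missing:
            self.changesets[cs] = other.changesets[cs]
        return len(missing)


central = Repo()
central.commit("a1", "initial import")

laptop = central.clone()             # the full history now lives on the laptop
laptop.commit("b2", "experiment")    # local only, nothing is locked
laptop.commit("c3", "fix typo")

transferred = central.pull(laptop)   # a "push", seen from the central side
rebuilt = laptop.clone()             # a lost central copy can be rebuilt from any clone

print(transferred, list(central.changesets))
```

Only the two new changesets cross the wire, and the clone ends up holding everything the central copy holds, which is the whole argument of the paragraph above in a few lines.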
From the reading I did and the polls I took, DVCS was the hands down winner, although one traditional VCS had a good showing.
This is totally a guess on my part, but it seems to me that the most prevalent system out there is Subversion, more commonly known as SVN. SVN is a very popular open source VCS. It’s free and supposedly easy to set up and use. It is a traditional VCS, so it has a Server component and a Client component.
There are some downsides to SVN, and to traditional systems in general. Committing changes to the server is slower because complete files are being transferred and analyzed. Also, merging is more of a hassle because you have to download the complete files from the central server. The comparison and merging methods are different from DVCS, so conflicts are far more common. Additionally, SVN treats file and folder renames as deletes and adds, meaning you can lose revision history data.
I spoke with a lot of developers who use SVN, either as a personal choice or more often because their company uses it. All of the problems notwithstanding, the overall opinion of SVN was very positive. It appears to work well, supports large numbers of developers, has lots of tooling available, and is generally regarded as very stable. The same could not be said of the alternative VCSs out there.
VSS and TFS
Microsoft’s classic entry in this space is Visual SourceSafe (VSS). VSS is famous as the source control developers love to hate. When I was at PDC09 I picked up a pretty cool shirt from a vendor: it has a picture of a woman screaming in surrealistic agony, and at the bottom are the words “VSS Must Die.” Naturally, the shirt is from a source control vendor, but it seems to sum up the community opinion of VSS. I can’t say I’ve ever heard a single positive remark about VSS, except that people positively hate it.
Fortunately, it seems that Microsoft agrees, and is attempting to replace VSS with Team Foundation Server. To be fair, TFS is much more than just version control, it is a complete code management system, with bug tracking, administrative control, rule enforcement, Visual Studio integration, and so on. I’ve heard questionable things about the source control but great things about the rest of the system. One suggestion I’ve heard was that Microsoft should allow any source control system to integrate with TFS, and that would make TFS ideal.
Git is a DVCS. I see a lot of talks about Git at code camps and conferences and it seems to be getting a lot of attention in the .NET community. By virtue of the fact that it was written (at least partly) by Linus Torvalds, it has already become the de rigueur choice of Linux and open source geeks. Many Git users are almost fanatical about their devotion to this tool, which I think says a lot (some good and some bad.)
The good thing about Git is that it just seems to work, and work well. It is built for speed and from all accounts it delivers. As a distributed system it has all the benefits I mentioned above and then some. Finally, I learned about a most compelling feature for me: GitHub. GitHub is a web based hosting service for Git repositories, which made immediate sense to me in a distributed environment. I almost chose Git then and there because everything I heard about GitHub was fantastic: I think people are more fanatical about GitHub than Git itself. Of course, once I calmed down a bit I learned that other systems have similar hosting services available, so I did not allow that alone to be the deciding factor.
The bad thing about Git is that it really seems oriented towards gear heads. I don’t mean that as a derogatory term at all. To me, a gear head is someone who is comfortable operating closer to the metal, using things like shell scripts, command lines, configuration files, etc. I have nothing but respect for that, because while I can function at that level I really prefer not to. Instead, I want to see those complexities wrapped up in a nice, user-friendly interface that I can rely on to flawlessly enter the twelve switches of some cryptic command (but that’s just me).
The product I finally selected is Mercurial, commonly abbreviated to Hg after mercury’s symbol on the Periodic Table of Elements (#80): the terms Mercurial and Hg are used interchangeably. Hg is similar to Git: it is a distributed system with all that entails. In fact, the two projects have some interesting parallels. They were inspired by the same event (the withdrawal of the free version of BitKeeper), they were begun at virtually the same time, and they share many of the same goals.
They were also both originally designed to run on Linux, but it seems that Hg adapted to Windows faster and Git has been playing catch up in the cross platform arena. I don’t see that as much of a concern today since both systems function perfectly well in a Windows environment. That being said, I consider the fact that CodePlex uses Hg to be a pretty solid endorsement.
For me and my purposes, the best thing about Hg is that it feels less complex and seems more Windows friendly. This is really because the supporting Windows software, which I’ll cover shortly, is more advanced. The overall impression I got was that if I “just want to do source control”, then I can get up and running faster and easier with Mercurial, without the need to learn a ton of command line stuff. Since I have not implemented Git I cannot compare, but I was able to get Hg up and running pretty easily.
Hg is built on Python, so you will need to at least install the Python Windows Binary before you can install Hg. Python is free and open source, so just download it from the Python homepage. I chose the 2.6.5 Windows Installer (binary only) because I don’t want or need the source, but feel free to dig as deeply as you like. Also, as of this writing there is a newer version of Python, but the download page states “If you don’t know which version to use, start with Python 2.6.5; more existing third party software is compatible with Python 2 than Python 3 right now.”
Remembering that DVCS means each install is a full repository, there is no Hg Client vs. Hg Server installation. Instead, you simply install Hg. If you plan on just using the Command Line interface, you can simply download and install the latest version.
If you plan on using the Windows Integration features, which I would recommend, then skip this step and proceed to the next section on TortoiseHg.
TortoiseHg is a Windows Shell Extension that makes working with Hg in Windows a breeze. Once installed, you can access the source control tools directly from Windows Explorer by right-clicking on folders: the tools will be integrated into the context menus.
The reason we skipped the step above is that installing TortoiseHg will also install the latest version of Mercurial, so for a Windows developer this is where I would start.
If you do not use Visual Studio, you now have all you need to easily and quickly get started with Hg. If you do use Visual Studio, there is one other tool you will want to install: VisualHg.
VisualHg integrates most of the TortoiseHg features into Visual Studio, so you can manage your repository from directly within the IDE. Additionally, it adds icons to your Solution Explorer letting you know when files and projects in your Solution need to be committed to your local repository. It’s built on and tightly integrated with TortoiseHg, so that is a prerequisite.
Hg’s answer to GitHub is Bitbucket, which doesn’t have the reputation that GitHub has but seems to have the same basic toolset and abilities at the same price. For several reasons, I was very keen to host my source elsewhere, so I went ahead and created a free account to experiment. Using the service has been really easy, and linking my local repository to the private repository I created on Bitbucket is very simple: since it just uses HTTP, all I have to do is provide Hg with the link to the repository on Bitbucket.
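For the curious, “providing Hg with the link” comes down to a [paths] entry named default in the repository’s .hg/hgrc file, which TortoiseHg can also edit for you. Below is a minimal Python sketch of writing that entry by hand. The function name and the URL in the usage comment are hypothetical placeholders, and the sketch assumes a fresh repository: rewriting the file with configparser drops any comments and may not cope with every construct Mercurial allows in an hgrc.

```python
import configparser
from pathlib import Path


def set_default_remote(repo_dir, url):
    """Write a [paths] default entry into <repo_dir>/.hg/hgrc and return the file path."""
    hgrc = Path(repo_dir) / ".hg" / "hgrc"
    if not hgrc.parent.is_dir():
        raise FileNotFoundError(f"{repo_dir} does not look like a Mercurial repository")

    # interpolation=None so a '%' in a URL is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(hgrc)  # a missing file is silently skipped; existing settings are kept
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", url)

    with open(hgrc, "w") as handle:
        config.write(handle)
    return hgrc


# Hypothetical usage; substitute your own repository folder and Bitbucket URL:
# set_default_remote(r"C:\Projects\Greenfield", "https://bitbucket.org/example-user/example-repo")
```

With that entry in place, a plain push or pull with no arguments talks to the Bitbucket repository.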
Now how the heck do I use this thing?
As a stone cold newbie, I needed some guidance. A site I found very helpful, both in deciding to use Hg and in learning how it works, is Joel Spolsky’s excellent HgInit.com. This is probably the best non-video training I’ve seen on DVCS. It is command line oriented, but I suggest you go through it (probably more than once) to help understand what the GUI tools are doing for you. I know I will be returning to this site again in the future.
Also, I don’t often plug services you have to pay for, but TekPub.com is worth every penny. The videos are fantastic and widely varied. In this case, they have a series of videos called “Mastering Mercurial” by Rob Conery, a very well known figure in .NET land. This series uses TortoiseHg and VisualHg and is a superb walk-through of Mercurial in a real world environment. If you have TekPub, go watch this series. If you don’t have TekPub, buy it, then go watch this series!
After that, the best thing I can recommend is to simply try it out. A buddy of mine and I have used Bitbucket and played around with making simultaneous changes to files, merging, multiple heads, etc. I think like a lot of things it will just take practice. In my case, as a lone developer, it is very simple: I make changes, I commit those changes to my local repository, and I update (or Push) those changes to the central repository on Bitbucket.
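For anyone who wants to see what the GUI tools are doing underneath, here is a small Python sketch of that cycle expressed as the Hg commands involved. The commands themselves (status, commit -m, pull, merge, push) are standard Mercurial; the function names, the commit messages, and the dry-run wrapper are my own illustration, and the team variant is only a rough outline of the pull, merge, push flow described earlier. Note that hg merge stops with an error when there is nothing to merge, so the team list is not something to run blindly.

```python
import subprocess


def solo_cycle(message):
    """Commands for a lone developer: review, commit locally, push."""
    return [
        ["hg", "status"],                 # see what has changed
        ["hg", "commit", "-m", message],  # record it in the local repository
        ["hg", "push"],                   # send new changesets to the default remote
    ]


def team_cycle(message):
    """Commands when others have pushed in the meantime (rough outline)."""
    return [
        ["hg", "commit", "-m", message],
        ["hg", "pull"],                   # fetch everyone else's changesets
        ["hg", "merge"],                  # resolve conflicts locally
        ["hg", "commit", "-m", "Merge"],  # record the merge result
        ["hg", "push"],
    ]


def run(commands, repo_dir=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=repo_dir, check=True)


# Dry run only: prints the three commands without touching any repository.
run(solo_cycle("Fix rounding in invoice totals"))
```

Passing dry_run=False from inside a real repository would execute the commands, assuming hg is on the PATH and a default remote has been configured.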
Some Closing Thoughts
I have a tendency to suffer from “paralysis by analysis”, so this process took me far longer than it probably should have. Once I finally decided to do something about it, though, actually getting up and running was a pretty short exercise. I’d say it took me roughly half a day to get everything installed, figure out how to use Bitbucket, watch some videos, and learn how to use TortoiseHg and VisualHg.
I want to make it clear that I am not advocating any particular solution. It does seem obvious to me that DVCS is the way of the future, which at this point means choosing between Git and Mercurial. Right after I selected Mercurial and got it up and running, I came across this article that has me wary of my choice. I’m not going to switch or anything like that, but I will proceed with a watchful eye. And I will continue to study Git and DVCS in general.
I have plenty left to learn: branching, multiple heads, sub-repositories, merging, and more. For now, I am just happy to be using source control: progress has been made!