
Archive for the ‘Project Greenfield’ Category

Project Greenfield: Learning SSIS

September 17, 2010

While Expression Blend remains my favorite tool and Visual Studio my most productive tool (with a little R# love), SQL Server Integration Services (SSIS) has moved solidly into the #3 slot.

I frequently have tasks that require me to move data from one location to another, perform conversions and transformations, create new tables with the resulting columns, etc.  In the past, this meant a lot of ADO.NET and a lot of coding.  These processes can be exceedingly slow and time consuming to produce and the performance is frequently less than desirable.  With what I’ve learned about SSIS, most of that work will be a thing of the past.  I can now do conversion work in a fraction of the time it took before and the resulting product, a Package in SSIS lingo, can execute in far less time.

It seems that SSIS is thought of as a DBA’s tool, but I believe that just as Blend is not only for designers, SSIS is not only for DBAs.  This post is not going to be a how-to or any kind of definitive work: what I want to accomplish is to introduce my fellow developers to the glory of SSIS and highlight some of the reasons I think you should be learning this technology.

Project Greenfield and SSIS

For Project Greenfield, one of the primary tasks is to convert data from the legacy IBM *insert nom du jour here* midrange server database to the new SQL Server database.

This is far more than pushing data from once place to another: the structure is completely different.  Relationships are defined now that previously were unenforced and large tables are broken into dozens of smaller, more normalized tables, often in different schemas.  Fields that were previously fixed length and Numeric types are now varchars and ints.  In some cases single fields have been broken into multiple fields, and in some cases multiple fields have been combined.  In all cases, data coming out is Unicode but is being stored as ANSI.

Obviously, this conversion represents a significant body of work in its own right.  One of my recent tasks was to provide enough of a conversion that I could start prototyping (fake data just wasn’t what we wanted).  The amount of work I was able to do in a week would have easily taken over a month to write using ADO.NET.  And best of all, now that I have a solid framework in place, making changes is very easy.

Getting Started with SSIS

In order to start with SSIS, you will have to have it installed.  More accurately, you will need the SQL Server Business Intelligence Development Studio installed, also known as BIDS.  This is found as an option when installing SQL Server, and I’m pretty sure it is not available below SQL Server Standard Edition.

The current version of BIDS runs in Visual Studio 2008.  If you already have VS2008 installed, you will find a new Project Type category called Business Intelligence Projects added to your existing install.  If you do not have VS2008, BIDS will install a Visual Studio 2008 Shell, even if you have VS2010 installed.

To start a new project, select the Business Intelligence Projects category and the Integration Services Project template in the New Project dialog.  Once it is created, opening it and working with it is basically the same as any other solution.

Work Flow in SSIS

BIDS itself is the first application I’ve seen that serves as a compelling example of a workflow-driven application. The Package Designer workspace is organized in tabs, only two of which I’ve needed so far: Control Flow and Data Flow.

All tasks are defined as compartmentalized units of work.  The visual blocks for those are all shown in the Control Flow tab.  These tasks may or may not be grouped into containers such as Sequence Container or Foreach Loop Container.  You may define as many containers as necessary to organize the Package.  So far I have preferred Sequence Containers as they allow me to organize tasks procedurally.  Except for the simplest Package, I would not define tasks outside of containers.

There are many different task types available, but I have only needed three so far: Data Flow Task, Execute SQL Task, and Script Task.  And now that I have better knowledge of what I am doing, I could get by without the Execute SQL Task.

Data Flow Task

At the heart of SSIS is the Data Flow Task.  The basic formula is this: read data from a data source, manipulate/transform that data, then write the transformed data to the target destination.  Data sources can be ADO.NET or OLE DB database connections, but they can also be Excel, Flat File, or XML sources.  There are even more options for target destinations.

In between the source and the target are the Data Flow Transformations which really represent the power of SSIS.  Here is a brief list of the transformations I have so far found most useful.

Conditional Split – Evaluates the data in the current columns and creates logical subsets which can then be handled differently.  Each subset effectively becomes its own data source at that point.

Derived Column – In my mind, the most important transformation of the bunch: derived columns are the new (or replacement) columns built by converting or transforming the source data.  SSIS includes a highly evolved “Expression Language” that is used to convert the data at runtime.  String manipulation, type conversion, mathematical operations, and much more are all supported. 

Lookup – Second in importance only to Derived Column, this transformation allows you to perform lookup actions against other tables.  This is essential for preventing invalid foreign key insertion.  It is also great for performing Incremental Loads: basically, this means only inserting records into the database if they don’t already exist.  This becomes important for several reasons, not the least of which is that I want to be able to execute the Package as often as possible, especially during development.  (A rough hand-coded equivalent of this pattern appears after this list.)

Multicast – Multicast creates multiple copies of the current data set, so you can perform multiple writes to multiple destinations.
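To put those transformations in perspective, below is a minimal, hypothetical sketch of what the Derived Column plus Lookup pattern looks like when hand-coded with ADO.NET; the table, column, and class names are invented for illustration.  In a Package, this entire loop collapses into designer configuration: a Derived Column transformation handles the conversions and a Lookup routes unmatched rows on to the destination.

```csharp
// Hypothetical hand-coded equivalent of Derived Column + Lookup (incremental load).
// Table and column names are invented; SSIS expresses this as designer configuration.
using System;
using System.Data;
using System.Data.SqlClient;

public static class LegacyCustomerLoader
{
    public static void Load(IDataReader legacyReader, SqlConnection target)
    {
        using (var exists = new SqlCommand(
            "SELECT COUNT(*) FROM Sales.Customer WHERE CustomerNumber = @num", target))
        using (var insert = new SqlCommand(
            "INSERT INTO Sales.Customer (CustomerNumber, Name) VALUES (@num, @name)", target))
        {
            exists.Parameters.Add("@num", SqlDbType.Int);
            insert.Parameters.Add("@num", SqlDbType.Int);
            insert.Parameters.Add("@name", SqlDbType.VarChar, 50);

            while (legacyReader.Read())
            {
                // "Derived Column": convert the fixed-length numeric and trim the padded text.
                int number = Convert.ToInt32(legacyReader["CUSTNO"]);
                string name = legacyReader["CUSTNAME"].ToString().Trim();

                // "Lookup": skip rows that already exist so the load can be re-run safely.
                exists.Parameters["@num"].Value = number;
                if ((int)exists.ExecuteScalar() > 0) continue;

                insert.Parameters["@num"].Value = number;
                insert.Parameters["@name"].Value = name;
                insert.ExecuteNonQuery();
            }
        }
    }
}
```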

The Script Task

The Script Task allows you to write .NET code inside a Package.  Primarily I have used this to work with package variables, a whole topic in its own right, and for making OLE DB connections dynamic.  I see substantial potential in the Script Task, though, as it really opens up the entire .NET Framework to the process.
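As a rough illustration of both uses, here is what the Main method of a C# Script Task might look like; the variable names, connection manager name, and connection string are all hypothetical, and any variables touched this way have to be listed in the task’s ReadOnlyVariables/ReadWriteVariables.

```csharp
// Inside the ScriptMain class that BIDS generates for a C# Script Task.
public void Main()
{
    // Read a package variable (hypothetical name) to build a connection string dynamically.
    string library = Dts.Variables["User::SourceLibrary"].Value.ToString();

    // "LegacyOleDb" is a hypothetical connection manager; the provider string is illustrative.
    Dts.Connections["LegacyOleDb"].ConnectionString =
        "Provider=IBMDA400;Data Source=MYSYSTEM;Default Collection=" + library + ";";

    // Write a value back into another package variable for downstream tasks to use.
    Dts.Variables["User::RunTimestamp"].Value = DateTime.Now.ToString("s");

    Dts.TaskResult = (int)ScriptResults.Success;
}
```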

Final Thoughts

Obviously, this barely scratches the surface of SSIS.  BIDS is primarily a graphical tool, but there are distinct features, such as the expression language and the Script Task, that developers could really leverage.  The other place where developers could shine in SSIS is in process flow: once I understood things like the Sequence Container and the Conditional Split, I really felt like I could make SSIS sing.  This kind of flow is exactly what we code day in and day out, so I think developers can pick up SSIS quickly.

I may write some more about SSIS going forward, but if you are interested and looking for info now I recommend you check out Andy Leonard’s blog.

Project Greenfield: Testing the TDD Waters

May 24, 2010

NOTE: if you are just here for the video, the link is here: http://www.developingfor.net/videos/TDD1Video/


I’ve mentioned recently in Developer Growth Spurts and Project Greenfield that I am trying my hand at Test Driven Development (TDD).  I’ve been reading a lot about it and have given it a go on a couple of occasions. I’ve been sidelined the last week or so by a billable project with a deadline (I suppose paying the bills is kind of important), so I’m not focused on Project Greenfield right now, but I don’t see that as an excuse to completely halt my progress.

Taking Advantage of the Unexpected

The good news is that the side project fits very well into the overall goals of Project Greenfield. The project is a pretty straightforward conversion project, reading data from our legacy database and writing it to a SQL Server database.  I even get to design the schema of the target database. 

I suppose a SQL Server guru would use SSIS or something like that to accomplish this task with no code, but that is well beyond my SQL Server skills at the moment.  The project does, however, give me the chance to experiment with a few other technologies that I will be using in Project Greenfield, so I am trying some new things out and only billing half-time to make up for it with my client.

SQL Server

This is my first real project using SQL Server, small though it may be. I’ve messed around with it in the past, creating some tables and relationships for a co-worker, but this is something that will actually be going into the field, so it is different.  The first thing I did was build the schema based on the client’s specifications.  As I was doing so, I realized it was wrong, but I finished it anyway because I didn’t want to stop progress to wait on a response.  Once I was able to communicate with them, they agreed with my concerns and now I am fixing the problems, which are largely normalization issues.

I will share, though, that I think I screwed up.  My first instinct was to use a SQL Server project in Visual Studio, largely so it would be under version control.  Unfortunately, when that project type turned out to be less than intuitive, I quickly gave up and went with what I know.  In Visual Studio I connected to my local SQL Server Express, created a Database Diagram, and used it to create my schema.

This works just fine, except I now have no way to get to that database to extract the schema for my client.  I know the answer is supposed to be to use SQL Server Management Studio, which I have installed for SQL Server 2005, but I need one that works with SQL Server 2008 Express.  I found it online and downloaded it, but it won’t install.  I’ll have to spend some time soon fixing this or come up with another solution.  I do have a couple of ideas that would involve using  …

Entity Framework 4

The next thing I am doing differently is using Entity Framework 4 for all of the SQL Server database access.  Don’t get me wrong, I’m not doing anything really complex: all I need to do is connect to the database and write new records in a handful of files.  But it has given me the opportunity to understand how to work with the Entity Context, how to manage object creation, experiment with how often to write records, learn about object relationships, and more.  I feel much more confident with EF now.

This was helped by spending some time this weekend at Richmond Code Camp with Dane Morgridge.  I was able to sit in his EF4 presentation and we spent some time coding together later, so I learned a bunch more about EF in the process.  But he gave me some great guidance, and as it gels more I’m sure I will write about it.  We also talked about Dependency Injection and some other stuff: folks, THIS is why I love community events so much!
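To make that concrete, here is a minimal, hypothetical sketch of the kind of EF4 code this project involves; ConversionEntities, Customer, and the legacy DAL types are invented stand-ins for the generated ObjectContext, the generated entity, and my existing DAL.

```csharp
public static class CustomerConversion
{
    // "ConversionEntities", "Customer", and "LegacyDal" are hypothetical names;
    // the real context and entities come from the .edmx model.
    public static void Run(LegacyDal legacyDal)
    {
        using (var context = new ConversionEntities())
        {
            int pending = 0;

            foreach (var legacy in legacyDal.ReadCustomers())
            {
                var customer = new Customer
                {
                    CustomerNumber = legacy.Number,
                    Name = legacy.Name.Trim()
                };

                // In EF4 the generated context exposes ObjectSet<T> properties;
                // AddObject queues the entity for insertion.
                context.Customers.AddObject(customer);

                // Experimenting with how often to write records: flush in batches
                // rather than once per record or once at the very end.
                if (++pending % 500 == 0)
                {
                    context.SaveChanges();
                }
            }

            context.SaveChanges();
        }
    }
}
```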

Test Driven Development

If you’ve managed to read this far you are surely asking yourself “I thought this was supposed to be about TDD?”  Fair enough, I just wanted to lay some of the groundwork for the project.

I started this project with the intent of implementing TDD.  I felt that a small project like this would be ideal to get my feet wet, and I will say so far so good.  I’m sure I’m not doing it “just right”, but I am doing it, which is a huge step forward.  A buddy of mine said this weekend that just trying TDD puts me far ahead of most .NET developers when it comes to testing.  I’ll take that with a grain of salt, but in a way I’m sure he’s correct.

As usual, I really started with the best of intentions.  I began with an empty solution and created two projects: the working project and the testing project.  I began writing code in my Test class first, then allowed the magic of ReSharper to help me create the classes and methods I was testing.  I also used the NUnit Code Snippets I wrote to speed production.
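As a hedged example of what that first pass looks like (the class and method names here are invented, not the project’s real ones), the test below is written before the class exists; ReSharper then offers to generate the class and the method, and just enough code is written to make it pass.

```csharp
using NUnit.Framework;

[TestFixture]
public class FieldConverterTests
{
    [Test]
    public void FormatPhoneNumber_StripsNonDigitCharacters()
    {
        // FieldConverter does not exist yet when this test is first written;
        // ReSharper generates the class and the method from this usage.
        var converter = new FieldConverter();

        string result = converter.FormatPhoneNumber("(555) 123-4567");

        Assert.AreEqual("5551234567", result);
    }
}
```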

Mocking

I quickly ran into my first need for a mock object.  I have a huge pre-existing DAL project that handles all of the legacy database work.  The main class I would be using is about 3500 lines of code, so naturally I wasn’t about to reinvent the wheel. I also thought at first that mocking this class up would be inordinately difficult, but I was willing to go down the rabbit hole for a little while to see where it led.

Where I ended up, at least at first, was actually not that bad.  I used ReSharper once again to extract an interface from this huge class.  At first, I thought I found a ReSharper bug: my entire computer froze for about 4 minutes.  The mouse disappeared, the keyboard would not respond, windows would not focus, etc. I was basically locked out of my machine.  I let it sit for a while and sure enough it came back and the new Interface file was created.

Now for my mock object: I created a test class in my test project that implemented the same interface.  I did make one mistake: I allowed the implementation to throw NotImplementedException.  This caused some issues later, so I changed it to just create default auto properties.

Now for one of the beauties of TDD: because I was committed to writing the tests first and then writing just enough code to make it pass, I did NOT try to implement all the properties and methods of my test class.  Instead, I implemented each one as I was testing it!  This helped with the mocking a lot since there were plenty of properties and methods I was not using in this project.
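For anyone curious what that stub looks like, here is a rough sketch; the interface and member names are hypothetical stand-ins for the one ReSharper extracted from the real DAL class.

```csharp
// Hypothetical slice of the extracted interface (the real one came from a ~3500-line DAL class).
public interface ILegacyDal
{
    string CompanyName { get; set; }
    string GetCustomerName(int customerNumber);
}

// Stub in the test project: default auto properties instead of NotImplementedException,
// and each method is only filled in when a test actually needs it.
public class LegacyDalStub : ILegacyDal
{
    public string CompanyName { get; set; }

    public string GetCustomerName(int customerNumber)
    {
        return "TEST CUSTOMER";
    }
}
```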

And Not Mocking

This worked great for a while, but I admit that it eventually began breaking down.  Or rather, I began breaking down. 

I ran into what I considered a practicality issue.  It may fly in the face of TDD, Unit Testing, and Code Coverage, but I got the feeling that there are just some things I don’t need to test.  The legacy database DAL has been used extensively: I know it reads data from the database correctly, so I’m not going to try to fit in a bunch of tests after the fact.  If I was starting from scratch perhaps I would, but at this point in the game there just isn’t enough ROI.

I came to the same conclusion with Entity Framework: I’m pretty sure that I don’t need to test that putting a string into a string variable in an EF class actually works.  And for about 90% of this project, that’s all I’m doing: moving strings from the legacy database DAL to my new Entity Framework classes.  So I decided that when that’s all I’m doing, moving one piece of data from old to new, with no reformatting, type conversions, or anything like that, then I was not going to write tests for those operations.

So the tests I did write for that first class were only for times when I had to convert or reformat the data.  This was good because it severely limited the number of test scenarios I needed to cover.  I expect this is an issue I will have to figure out at some point: I know the goal is to test everything, but surely there must be a line drawn somewhere.

And then I ran into an issue where Mocking didn’t seem feasible.  And before I go any further, I recognize that I am not talking about mocking frameworks or auto mocking or anything like that: I guess what I am really doing is called stubbing. 

As I got further into the conversion, I began to rely on data from the legacy database.  I could have faked all these classes, but it would have taken a lot of time and effort for very little reward.  Fortunately, much of the data I needed at this point is static system data, which is also one of the reasons it became difficult to mock it all out.  Faking this stuff out would have just been a nightmare, so instead I chose to integrate a single database connection into my unit tests.

I realize this breaks a few rules.  It means I have to be connected to my network at my office to run these particular tests.  It means that my tests, and ultimately my code, are brittle because of this dependency.  Which means that I should probably be using a mocking framework and Dependency Injection to solve some of these problems.  Not to worry, I’ll get there!

I’m sure the TDD and testing purists would have a field day with my decision.  And I’m cool with all of that, I welcome the comments.

Houston, we have Video!

During these adventures I thought it would be interesting if I shared some of the Project Greenfield content as videos.  As a result, I am happy to announce the first ever Developing For .NET Video, available for viewing at http://www.developingfor.net/videos/TDD1Video/

Rather than walk through some Hello World/Calculator TDD example, this video contains, among other things, a walk through of a real world TDD sample.  I have a method I need to create, so I write a Unit Test first, use it to create the Method, write enough code to compile but fail, then write enough code to pass, all in a real production project!

I would love to hear your comments about the video, so please add them to this post.


Project Greenfield: Implementing Source Version Control

May 20, 2010

As promised, I have begun down the path outlined recently in Project Greenfield.  As I discussed in that post, one thing my company has sorely lacked has been Version Control.  Yes, there are backups.  Yes, there are development copies.  Yes, we have source escrowed with a third party.  No, I don’t think any of those things count as source or version control.

I’ve discussed this topic many times with fellow geeks, and the conclusion is always the same: even as a one-person team, I should absolutely be operating under source version control.  I’ll admit for a while I thought it seemed like overkill for what I do, but over the years I have come to understand that it really is a fundamental part of the development environment.  So, with Project Greenfield, I have finally implemented a Version Control System (VCS).

Choosing a Solution

Starting with a blank slate is nice: I am free to select whatever system I wish to use. And in the beginning, I will be the only one using it, so I have the opportunity to set the standard and get my feet wet before I need to bring anyone else into the fold.  The problem was I had no idea what I was looking for or what I needed.

Naturally, I spent a bunch of time researching, and my friends will tell you I spent a lot of time asking pretty basic questions.  I realize now that the solution isn’t really all that important.  The most important thing is to use VCS: any VCS is better than no VCS!  You can always change systems later by starting fresh in a new one; at least that’s how I see it.  In fact, it appears that some people use multiple systems.  I know one person who uses one system locally for his development work, but his company uses an entirely different system, so he pushes his changes to that system when he is done locally.

If you are new to VCS

If you are an old hat at VCS, you can safely skip this section. Or you can keep reading it if you want to laugh at me, I really don’t mind.

For the rest of you (us), I want to share a little of my research.  My one cursory experience with VCS was in the late 90’s at an AS/400 shop.  The system was built around a check-out, check-in, approval model.  It made sense to me because it was very linear.  There were several layers of approval required to get code back into the code base: supervisor, testing and quality assurance, documentation, and final review (or something like that – it’s been a while.)  At each step of the way you had to deal with conflicts, rejections, etc.  It was a lengthy and tedious process.

Expecting the same sort of experience, I was surprised to find that the world of VCS is not so straightforward.  I learned that the choices partitioned themselves into two camps: Traditional Version Control Systems and Distributed Version Control Systems (DVCS).  Frankly, I don’t feel qualified to discuss the differences between the two approaches, but I’ll try to hit the highlights.

A traditional VCS uses a central repository that contains all the code.  Developers check out the code they need to work on and then check it back in when they are done.  Because this is all done over the wire, it can be a little slow and cumbersome.

DVCS, on the other hand, distributes complete copies of the repository, so every developer machine becomes a full-fledged version control system in its own right.  All developer changes are then made to the local repository.  This leaves the developer free to create new branches, experiment, refactor code, or what have you, without even pulling in code from the central repository.  The repository can easily be reset to any point in its history at any time in the future, so you can abandon changes if they don’t work out.  This is very powerful and is frequently called “time travel.”

When the developer is ready to post changes, he first pulls down the current version of the repository and merges his changes with it locally.  This means all the conflict resolution is also handled locally by the developer who caused the conflict.  Once all is right with the code again, it gets pushed back to the central repository where other developers can now go through the same process.

One nice thing about this approach is that there are no locks on the repository and no expectation that code must be “checked back in.”  Another nice thing about DVCS is that if something happens to the central repository, it can be rebuilt from the developer copies.  Finally, DVCS *really* only moves around the changes to the repository, not the entire repository.  This means that updates are much smaller: combine that with the fact that almost all the work is done locally and you have a lightning-fast system.

From the reading I did and the polls I took, DVCS was the hands down winner, although one traditional VCS had a good showing.

The Choices

SVN

This is totally a guess on my part, but it seems to me that the most prevalent system out there is Subversion, more commonly known as SVN.  SVN is a very popular open source VCS.  It’s free and supposedly easy to set up and use.  It is a traditional VCS, so it has a Server component and a Client component.

There are some downsides of SVN, and traditional systems in general.  Committing changes to the server is slower because complete files are being transferred and analyzed.  Also, merging is more of a hassle because you have to download the complete files from the central server.  The comparison and merging methods are different from DVCS, so conflicts are far more common.  Additionally, SVN treats file and folder renames as deletes and adds, meaning you can lose revision history data.

I spoke with a lot of developers who use SVN, either as a personal choice or more often because their company uses it. All of the problems notwithstanding, the overall opinion of SVN was very positive.  It appears to work well, supports large numbers of developers, has lots of tooling available, and is generally regarded as very stable.  The same could not be said of the alternative VCSs out there.

VSS and TFS

Microsoft’s classic entry in this space is Visual SourceSafe (VSS).  VSS is famous as the source control developers love to hate.  When I was at PDC09 I picked up a pretty cool shirt from a vendor: it has a picture of a woman screaming in surrealistic agony, and at the bottom are the words “VSS Must Die.”  Naturally, the shirt is from a source control vendor, but it seems to sum up the community opinion of VSS.  I can’t say I’ve ever heard a single positive remark about VSS, except that people positively hate it.

Fortunately, it seems that Microsoft agrees, and is attempting to replace VSS with Team Foundation Server (TFS).  To be fair, TFS is much more than just version control; it is a complete code management system, with bug tracking, administrative control, rule enforcement, Visual Studio integration, and so on.  I’ve heard questionable things about the source control but great things about the rest of the system.  One suggestion I’ve heard was that Microsoft should allow any source control system to integrate with TFS, and that would make TFS ideal.

Git

Git is a DVCS.  I see a lot of talks about Git at code camps and conferences and it seems to be getting a lot of attention in the .NET community.  By virtue of the fact that it was written (at least partly) by Linus Torvalds, it has already become the de rigueur choice of Linux and open source geeks.  Many Git users are almost fanatical about their devotion to this tool, which I think says a lot (some good and some bad).

The good thing about Git is that it just seems to work, and work well.  It is built for speed and from all accounts it delivers.  As a distributed system it has all the benefits I mentioned above and then some.  Finally, I learned about a most compelling feature for me: GitHub.  GitHub is a web-based hosting service for Git repositories, which made immediate sense to me in a distributed environment.  I almost chose Git then and there because everything I heard about GitHub was fantastic: I think people are more fanatical about GitHub than Git itself.  Of course, once I calmed down a bit I learned that other systems have similar hosting services available, so I did not allow that alone to be the deciding factor.

The bad thing about Git is that it really seems oriented towards gear heads. I don’t mean that as a derogatory term at all. To me, a gear head is someone who is comfortable operating closer to the metal, using things like shell scripts, command lines, configuration files, etc.  I have nothing but respect for that, because while I can function at that level I really prefer not to. Instead, I want to see those complexities wrapped up in a nice, user-friendly interface that I can rely on to flawlessly enter the twelve switches of some cryptic command (but that’s just me).

Mercurial (Hg)

The product I finally selected is Mercurial, commonly abbreviated as Hg, mercury’s symbol on the Periodic Table of Elements (element 80): the terms Mercurial and Hg are used interchangeably.  Hg is similar to Git: it is a distributed system with all that entails.  In fact, the two projects have some interesting parallels.  They were inspired by the same event (the withdrawal of the free version of BitKeeper), they were begun at virtually the same time, and they share many of the same goals.

They were also both originally designed to run on Linux, but it seems that Hg adapted to Windows faster and Git has been playing catch-up in the cross-platform arena.  I don’t see that as much of a concern today since both systems function perfectly well in a Windows environment.  That being said, I consider the fact that CodePlex uses Hg to be a pretty solid endorsement.

For me and my purposes, the best thing about Hg is that it feels less complex and seems more Windows friendly.  This is really because the supporting Windows software, which I’ll cover shortly, is more advanced.  The overall impression I got was that if I “just want to do source control”, then I can get up and running faster and easier with Mercurial, without the need to learn a ton of command line stuff.  Since I have not implemented Git I cannot compare, but I was able to get Hg up and running pretty easily.

Implementing Mercurial

Python

Hg is built on Python, so you will need to at least install the Python Windows Binary before you can install Hg. Python is free and open source, so just download it from the Python homepage.  I chose the 2.6.5 Windows Installer (binary only) because I don’t want or need the source, but feel free to dig as deeply as you like.  Also, as of this writing there is a newer version of Python, but the download page states “If you don’t know which version to use, start with Python 2.6.5; more existing third party software is compatible with Python 2 than Python 3 right now.”

Installing Hg

Remembering that DVCS means each install is a full repository, there is no Hg Client vs. Hg Server installation.  Instead, you simply install Hg.  If you plan on just using the command line interface, you can simply download and install the latest version.

BUT WAIT!

If you plan on using the Windows Integration features, which I would recommend, then skip this step and proceed to the next section on TortoiseHg.

TortoiseHg

TortoiseHg is a Windows Shell Extension that makes working with Hg in Windows a breeze. Once installed, you can access the source control tools directly from Windows Explorer by right-clicking on folders: the tools will be integrated into the context menus.

The reason we skipped the step above is that installing TortoiseHg will also install the latest version of Mercurial, so for a Windows developer this is where I would start.

VisualHg

If you do not use Visual Studio, you now have all you need to easily and quickly get started with Hg.  If you do use Visual Studio, there is one other tool you will want to install: VisualHg.

VisualHg integrates most of the TortoiseHg features into Visual Studio, so you can manage your repository from directly within the IDE.  Additionally, it adds icons to your Solution Explorer letting you know when files and projects in your Solution need to be committed to your local repository.  It’s built on and tightly integrated with TortoiseHg, so that is a prerequisite.

Bitbucket

Hg’s answer to GitHub is Bitbucket, which doesn’t have the reputation that GitHub has but seems to have the same basic toolset and abilities at the same price.  For several reasons, I was very keen to host my source elsewhere, so I went ahead and created a free account to experiment.  Using the service has been really easy, and linking my local repository to the private repository I created on Bitbucket is very simple: since it just uses HTTP, all I have to do is provide Hg with the link to the repository on Bitbucket.

Now how the heck do I use this thing?

As a stone-cold newbie, I needed some guidance.  A site I found very helpful, both in deciding to use Hg and in learning how it works, is Joel Spolsky’s excellent HgInit.com.  This is probably the best non-video training I’ve seen on DVCS.  It is command line oriented, but I suggest you go through it (probably more than once) to help understand what the GUI tools are doing for you.  I know I will be returning to this site again in the future.

Also, I don’t often plug services you have to pay for, but TekPub.com is worth every penny.  The videos are fantastic and widely varied.  In this case, they have a series of videos called “Mastering Mercurial” by Rob Conery, a very well-known figure in .NET land.  This series uses TortoiseHg and VisualHg and is a superb walk-through of Mercurial in a real world environment.  If you have TekPub, go watch this series.  If you don’t have TekPub, buy it, then go watch this series!

After that, the best thing I can recommend is to simply try it out.  A buddy of mine and I have used Bitbucket and played around with making simultaneous changes to files, merging, multiple heads, etc.  I think like a lot of things it will just take practice.  In my case, as a lone developer, it is very simple: I make changes, I commit those changes to my local repository, and I update (or Push) those changes to the central repository on Bitbucket.

Some Closing Thoughts

I have a tendency to suffer from “paralysis by analysis”, so this process took me far longer than it probably should have.  Once I finally decided to do something about it, though, actually getting up and running was a pretty short exercise.  I’d say it took me roughly half a day to get everything installed, figure out how to use Bitbucket, watch some videos, and learn how to use TortoiseHg and VisualHg.

I want to make it clear that I am not advocating any particular solution.  It does seem obvious to me that DVCS is the way of the future, which at this point means choosing between Git or Mercurial.  Right after I selected Mercurial and got it up and running, I came across this article that has me wary of my choice.  I’m not going to switch or anything like that, but I will proceed with a watchful eye.  And I will continue to study Git and DVCS in general.

I have plenty left to learn: branching, multiple heads, sub-repositories, merging, and more.  For now, I am just happy to be using source control: progress has been made!


Project Greenfield

May 10, 2010

I am in a theoretically enviable position: I am beginning a “green field” project.  A green field project is one that begins with a completely blank slate: no preconceptions about what technologies to use, what methodologies to employ, or what the final product will look like. This is the project we all dream about: total freedom and total control.  I am no longer hobbled by an existing database. I am no longer restricted to “how we’ve always done things.”  Paraphrasing Sarah Connor from the original Terminator, for the first time the future is unclear to me.

At first glance, this sounds like a developer’s dream come true, and in the end it probably is, but as I near the beginning of the project I begin to see it as an embodiment of the saying “be careful what you wish for, you just may get it.”  This is why I say my position is theoretically enviable.  While I have complete freedom, I also have complete responsibility.  And to top it all off, this project is make or break for the company.  If this project fails, we might as well close the doors.  And no, I am not overdramatizing.

My plan, for what it’s worth, is to document this undertaking.

Where to begin…

I had a long section written here about the history of my company, our software, our customers, and why we were tackling this project.  Then I realized that, in fact, this is what I am trying NOT to do: focus on the past.  I don’t want to rehash where we’ve been because I don’t want it to taint where we are going.  And so far that is the hardest thing: I met with our domain expert to discuss some of the target goals, and I had to steer the conversation away from the existing product several times.

While this is really, truly, everything new from the beginning, there are some decisions that have already been made, so let’s get them out of the way.

  1. We will use SQL Server.  I’ve long believed that data is king.  I always start with the data: the database, schema, relationships, etc.  Ultimately it is the reason we are in this business.  Almost every RFP we have received in the last 5-7 years has required SQL Server: it is becoming the de facto standard in our market.  Since we have never offered a SQL Server solution we are frequently unable to bid for contracts.  This fact is the driving force behind this project.  It’s bad enough when the other kids make fun of you, but far worse when you’re not even allowed on the playground.
  2. We will use .NET.  If we are going to make the jump from IBM to Microsoft, from Green Screen to GUI, from DB2 to SQL Server, then we’re going whole hog.  Knowing that SQL Server is our target database, what better decision could you possibly make than to develop the rest of the application on the Microsoft Stack?
  3. We will use Version Control.  Our current software was originally written in the mid 80’s.  I realize that’s longer than some of you readers have been alive, so it may be a shock to you, but yes software that old does work.  The software has been continuously modified, upgraded, and maintained over that period, but it has never been in source control.  Our first action will be to implement version control, which I will cover in my next post.
  4. We will use Unit Testing.  It probably goes without saying, but our existing software has exactly 0 unit tests.  The nature of the platform and the development environment do not lend themselves to unit testing, TDD, mocking, etc.  Don’t get me wrong, the software is thoroughly tested, but not in any kind of a “best practices” sense of the word.  While the verdict is not yet in on TDD, I’m definitely feeling pulled in that direction.  Again, I’ll be posting about that when the time comes.
  5. We will use Agile Techniques.  At least, we’ll use some parts of Agile.  Company owners, users, and domain experts aside, this is essentially a one-man operation, so that naturally means no pair programming.  I’m also not sure what a one-man stand-up would look like.  That being said, I’ve consulted some practitioners and there are things I can do.  I have a couple of books to read and I bought a bunch of Post-it notes, so we’ll see.

With the exception of .NET, everything in the list above is a new endeavor for me and my company.  And none of the above mentions the technical specifics: there are a lot of decisions to be made there, many of which will be new for us as well.  This is a huge undertaking, so I expect to encounter some failure along the way.  I’m OK with that: we all know you learn more from your mistakes than your successes.

Where we go from here

I’ll be spending the next couple of weeks in project preparation: setting up version control, writing specifications, developing guidance, establishing processes, etc.  Along the way I’ll be posting about what I’m going through, what’s going through my head, and what decisions I’ve made. 

Given the scope of the project, I expect to be writing about it for quite a while.  Along the way, if you are interested, I encourage you to participate in the comments.  I will place every post in this ongoing series in the Project Greenfield category.  It should be fun!
