If you’re a regular reader then you’ll know I’ve recently become enamored of SSIS. I’m absolutely blown away by this product, I wish I learned about it 5 years ago!
Of course, being new to such an extensive technology means dealing with landmines along the way. Unfortunately, I’ve run into a couple of them already. The first one was my own fault: I developed a project in 2008 but the client has Sql Server 2005. To solve it I had to install Sql Server 2005 and create a 2005 version of the Package.
AcquireConnection Method Error
The second issue I’ve encountered was hardly as intuitive. I was at the client site yesterday deploying my first SSIS Package outside my own network. With the help of Andy Leonard, my SSIS mentor, I had worked through all the necessary steps to make the Package dynamic, a process worthy of a blog post of its own. At the client site I followed my cheat sheet to the letter, explaining every step to the client along the way. After solving a missing prerequisite problem on the server, we fired up the Package only to be quickly stopped with a Validation failure message:
The AcquireConnection method call to the connection manager failed with error code 0xC0202009.
I assumed there was a problem with the dynamic connection strings. I spent about an hour diagnosing the problem only to find out there was no problem, at least not with the dynamic connections. The problem, it turns out, was in a thing called DelayValidation.
Dynamic vs. Design Time Connection Strings
When you design an SSIS Package, you need to define connections to data sources and destinations that you can access at design time. Without these connections, SSIS would be pretty useless. So these design time data connections have to be hard coded somewhere: I put mine in package variables and bind the ConnectionString property of the ConnectionManager objects to the variables. These variables are then overridden at runtime by external values, making the connections dynamic and portable.
What I was unaware of was that each ConnectionManager object and each DataFlow Task has a property called DelayValidation. When the Package begins, it validates all of the DataFlow tasks prior to executing any of them. If DelayValidation is set to False, then the validation process uses the design time connection objects. In other words, it tries to connect using that connection string before it has a chance to be set by the dynamic process. And here’s the kicker: False is the default value, so this is the default behavior.
Working on my own network, I never even noticed this was happening. On the client network, however, it immediately blows up because the tasks cannot connect to my design time databases. I found a good article online that explained the problem and it actually makes sense. The solution is to set DelayValidation to True, which will delay the validation process until the task begins to execute, giving it time to be set dynamically.
I started by setting the property on every ConnectionManager object. At first I believed that this would be sufficient but it didn’t stop the validation error: I had to do the same thing to every DataFlow task as well, after which the Package ran properly. This was a real pain because I had so many to change. I could select multiple tasks within the same container, but I could not select tasks across container boundaries, and I have a lot of containers.
While I understand what DelayValidation is doing and why it does it, I don’t understand why False is the default value. If nothing else, it seems to me this should be a Package level setting. If you know of a better or easier way to handle this problem, please share it in the comments below.
While Expression Blend remains my favorite tool and Visual Studio my most productive tool (with a little R# love), Sql Server Integration Services (SSIS) has moved solidly into the #3 slot.
I frequently have tasks that require me to move data from one location to another, perform conversions and transformations, create new tables with the resulting columns, etc. In the past, this meant a lot of ADO.NET and a lot of coding. These processes can be exceedingly slow and time consuming to produce and the performance is frequently less than desirable. With what I’ve learned about SSIS, most of that work will be a thing of the past. I can now do conversion work in a fraction of the time it took before and the resulting product, a Package in SSIS lingo, can execute in far less time.
It seems that SSIS is thought of as a DBA’s tool, but I believe that as Blend is not just for Designers, SSIS is not just for DBAs. This post is not going to be a how to or any kind of definitive work: what I want to accomplish is to introduce my fellow developers to the glory of SSIS and highlight some of the reasons I think you should be learning this technology.
Project Greenfield and SSIS
For Project Greenfield, one of the primary tasks is to convert data from the legacy IBM *insert nom du jour here* midrange server database to the new Sql Server database.
This is far more than pushing data from once place to another: the structure is completely different. Relationships are defined now that previously were unenforced and large tables are broken into dozens of smaller, more normalized tables, often in different schemas. Fields that were previously fixed length and Numeric types are now varchars and ints. In some cases single fields have been broken into multiple fields, and in some cases multiple fields have been combined. In all cases, data coming out is Unicode but is being stored as ANSI.
Obviously, this conversion represents a significant body of work in its own right. One of my recent tasks was to provide enough of a conversion that I could start prototyping (fake data just wasn’t what we wanted.) The amount of work I was able to do in a week would have easily taken over a month to write using ADO.NET. And best of all, now that I have a solid framework in place making changes is very easy.
Getting Started with SSIS
In order to start with SSIS, you will have to have it installed. More accurately, you will need the SQL Server Business Intelligence Development Studio installed, also known as BIDS. This is found as an option when installing SQL Server and I’m pretty sure it is not available below SQL Server Standard.
The current version of BIDS runs in Visual Studio 2008. If you already have VS2008 installed you will find a new Project Type category called Business Intelligence Projects added to your existing install. If you do not have VS2008 BIDS will install a Visual Studio 2008 Shell, even if you have VS2010 installed.
To start a new project, select the Business Intelligence Projects category and Integration Services Project in the create new project dialog. Once it is created, opening it and working with it is basically the same as any other solution.
Work Flow in SSIS
BIDS itself is the first application I’ve seen that serves as a compelling example of a Workflow driven application. The Package Designer workspace is organized in tabs, the only two of which I’ve needed so far are Control Flow and Data Flow.
All tasks are defined as compartmentalized units of work. The visual blocks for those are all shown in the Control Flow tab. These tasks may or may not be grouped into containers such as Sequence Container or Foreach Loop Container. You may define as many containers as necessary to organize the Package. So far I have preferred Sequence Containers as they allow me to organize tasks procedurally. Except for the simplest Package, I would not define tasks outside of containers.
There are many different task types available, but I have only needed three so far: Data Flow Task, Execute SQL Task, and Script Task. And now that I have better knowledge of what I am doing, I could get by without the Execute SQL Task.
Data Flow Task
At the heart of SSIS is the Data Flow Task. The basic formula is this: read data from a data source, manipulate/transform that data, then write the transformed data to the target destination. Data sources can be ADO.NET or OLE DB database connections but can also be Excel, Flat, or XML Files. There are even more options for Target Destinations.
In between the source and the target are the Data Flow Transformations which really represent the power of SSIS. Here is a brief list of the transformations I have so far found most useful.
Conditional Split – Evaluates the data in the current columns and creates logical subsets which can then be handled differently. Each subset effectively becomes it’s own data source at that point.
Derived Column – In my mind, the most important transformation of the bunch: derived columns are the new (or replacement) columns built by converting or transforming the source data. SSIS includes a highly evolved “Expression Language” that is used to convert the data at runtime. String manipulation, type conversion, mathematical operations, and much more are all supported.
Lookup – Second in importance only to Derived Column, this transformation allows you to perform lookup actions against other tables. This is essential for preventing invalid foreign key insertion. It also is great for performing Incremental Loads: basically, this means only inserting records into the database if they don’t already exist. This becomes important for several reasons, the least of which not being that I want to be able to execute the Package as often as possible, especially during development.
Multicast – Multicast creates multiple copies of the current data set, so you can perform multiple writes to multiple destinations.
The Script Task
The Script task allows you to write code to do things. Primarily I have used this to work with package variables, a whole topic in its own right, and for making OLE DB connections dynamic. I see substantial potential in the Script Task though as it really opens up the entire .NET Framework to the process.
Obviously, this barely scratches the surface of SSIS. BIDS is primarily a graphic tool, but there are distinct functions that developers could really leverage. The other place where developers could shine in SSIS is in process flow: once I understood things like Sequence Container and Conditional Split I really felt like I could make SSIS sing. This kind of flow is exactly what we code day in and day out, so I think developers can pick up SSIS quickly.
I may write some more about SSIS going forward, but if you are interested and looking for info now I recommend you check out Andy Leonard’s blog.