Stored Procedure vs Parameterized Query
I want to start by saying I am not a database expert or an DBA by any stretch of the imagination. I am, though, an old AS/400 RPG developer. I believe that by necessity AS/400 developers tend to be more familiar with the native database functions and maintenance than their .NET counterparts. Even so, there are a great many capabilities that go unused by the day to day coder.
On the other hand, .NET developers tend to be more familiar with things like SQL and Stored Procedures. While these tools are second hand to most developers, they are far less common in RPG applications because RPG has direct database access. These are merely observations: each access technique has its strengths and weaknesses, a discussion of which is not the purpose of this article: I’ve probably rambled on about it too long already.
My day to day experience is typically making these two worlds peacefully coexist. The task this week is another conversion project, but this time from AS/400 tables to AS/400 tables. An argument could be made that I should have done the conversion in RPG using native database access, but in this particular case, I have some string manipulation that is far easier to handle in .NET than RPG.
The project is simple enough: in our legacy application we are making several significant changes: normalizing data, expanding configuration options, and extracting embedded data into a supporting table. The normalizing part requires that we combine data from two tables into a single repository. At the same time, those values contain embedded information that needs to be extracted and stored in a separate table.
The configuration of that embedded data has been expanded. It used to be fixed but is now much more flexible, allowing the client to configure the description, rules, and presentation of the data. As a result, each embedded item also produces a series of table entries in a related table: the number of entries and rules are not known at design time.
This may not be totally clear, but the result is the important part here. In a small client data set, what used to be 3,458 records in one table and 3,013 in the 2nd table is now 6,471 records in a single table. Those 6,471 also create 38,862 records in the extracted data table. There is also a 3rd table with 6,471 rows that acts as a cross reference between these this table and the legacy system that adds some additional information. So all in all, we’ve gone from 6,471 records in two tables, poorly normalized, into three tables with a total of 51,804 records.
The Problem of Performance
Obviously these are not big tables, but performing the table inserts over the wire from a .NET application incurs a big performance hit. I’m not going to point fingers, because I honestly don’t know where to point: it could be a problem with the SQL engine on the iSeries, or with the ADO.NET Managed Provider (our version is V5R4, which is still based on .NET 1.1). It’s unlikely that it is the network or the my PC, but I won’t rule anything out for certain.
I wrote previously about using Stored Procedures on the iSeries to wring out the last bit of performance. Overall, we’ve been very satisfied, and performance has been excellent. I wrote a few Stored Procedures to handle the inserts to these three tables and used them from my .NET application to perform the inserts. Quite unexpectedly, when I tested this small data set the performance was terrible. It took 57 minutes to perform this task, and this was our smallest data set. Our largest is more than 10 times the size, so it could potentially run 10-12 hours.
At first, I was tempted to leave it alone: this is not production code, but rather a utility that will be run once for each of our clients and then discarded. At least, that’s what I thought. I later realized that each client may have multiple data sets, so this may be run as many as 10 times per client. Then on top of that, I realized that my staff has to run this for all the clients. At that point, performance for this “one time” application suddenly became a huge problem.
Things aren’t always what they seem…
I was floored at these initial results. The reason we started using Stored Procedures is because they are supposed to provide the highest degree of performance. I refactored the logic twice trying to speed it up but to no avail. I assumed I had done something wrong in the procedure, so I hit the books, but I didn’t find anything. There’s not a lot to a procedure that just maps incoming parameters to an Insert statement.
There is one thing in the procedure, though, that I believe was the culprit. The way our legacy database is structured, you never know until run time what library (database) and file (table) you need to execute the function against. This means that the variables indicating those have to be passed in to the Stored Procedure … every … time.
This has not been a problem when pulling data, but I believe because of the frequency and interval with this mass of Insert statements that it was really hampering performance. I believe that if the Stored Procedure had specified the library (database) and file (table) names then the performance would not have been an issue. The system had to determine what access path to use to the database on every single request. The purpose of using Stored Procedures in the first place had been completely negated. I might as well have been using dynamic strings in my code.
Fortunately, I came up with a solution I can live with: I turned to an old friend, the Parameterized Query. As I got to thinking about the problem, I realized that what I needed was to create the command object once and reuse it. I used Parameterized Queries frequently before turning to Stored Procedures, and they suit this particular problem because I only need to set the library (database) and file (table) names once.
So I fired up a branch in Mercurial and hacked up my code. I refactored out some of my refactorings and replaced all the Stored Procedure calls with Parameterized Queries. I made sure I was only creating the command once and then reusing it for each insert. I anxiously ran the program again: 9 minutes later I had the confirmation I needed. This was MUCH better.
So the lesson for today is to never trust your assumptions. I was sure that Stored Procedures would automatically give me the best performance. I wasted time developing the procedures, I wasted time implementing them, then I wasted time testing and debunking my incorrect assumption. Now I’m wasting time writing about it :-) But it reminds me of a very valuable tip: choose the right tool for the right job.