Thoughts from David Cornelius

In the last entry, I gave an overview of the large software update I delivered to a client, an update that should've been done incrementally over a period of several months. From database schema changes, to swapping out a reporting engine, to switching from ANSI to Unicode, I really bit off more than I could chew at once. But it's now working well and I'm once again sleeping at night!

This portion of the story deals with the database changes that were made, both the structure and the character set.

First, remember that I am using the ElevateDB embedded database and components from Elevate Software. This database is as quick and easy to use as the old BDE formats like Paradox and dBASE, but with the power of a client-server database engine and SQL:2003 support. It has both ANSI and Unicode versions and supports all recent versions of Delphi, C++Builder, and Visual Studio. So there's lots of flexibility to move from one environment to another.

There were many parts of this upgrade to test, and I wanted to automate the process as much as possible so I could run it over and over and be sure that, once I was ready, I could convert their live data with a minimum of downtime.

Step 1 - Introduce GUIDs as replacement primary keys

The first decision I made was to introduce GUIDs in almost all of the tables. These fields were going to replace the auto-incrementing integers being used as primary keys, so I couldn't just modify the table structure and change the field type. Additionally, there were several master-detail table relationships where child records were linked to parent records with the integer fields.

So, the GUID fields were the first ALTER TABLE statements in the big conversion script. Since the GUID fields were going to be primary keys and by convention, I always place the primary key fields as the first field in the table, the SQL statements were of the form:

ALTER TABLE "MyTable" ADD COLUMN "GUID" GUID COLLATE "ANSI" DEFAULT CURRENT_GUID() NOT NULL, ALTER COLUMN "GUID" MOVE TO 1;

The child tables also had GUID columns, but the field names reflected the parent table to which they linked.
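
For example, the PO details table got a GUID column named for its parent header table. A statement along these lines (the exact column name and position here are illustrative) would do it:

ALTER TABLE "PODetails" ADD COLUMN "PO_GUID" GUID COLLATE "ANSI", ALTER COLUMN "PO_GUID" MOVE TO 2;

Note that it starts out nullable with no default, because the real values come from the parent records in the next step.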

After the GUID fields were created, I had to link all the child records to their parent records based on the current primary keys. Here's the section of the script that linked the Purchase Order (PO) detail records to their corresponding header (parent) records:


DECLARE DetailsStmt STATEMENT;
DECLARE HeaderCursor CURSOR FOR HeaderStmt;
DECLARE HeaderID INTEGER;
DECLARE HeaderGUID GUID;
DECLARE NumRows INTEGER;
DECLARE TotalRows INTEGER;
 
PREPARE HeaderStmt FROM 'SELECT GUID, ID FROM POHeaders';
PREPARE DetailsStmt FROM 'UPDATE PODetails SET PO_GUID = ? WHERE ID = ?';
 
SET PROGRESS TO 0;
OPEN HeaderCursor;
SET NumRows = 0;
SET TotalRows = RowCount(HeaderCursor);
FETCH FIRST FROM HeaderCursor(GUID, ID) INTO HeaderGUID, HeaderID;
WHILE NOT EOF(HeaderCursor) DO
  -- stamp this header's GUID onto all of its detail rows
  EXECUTE DetailsStmt USING HeaderGUID, HeaderID;
  FETCH NEXT FROM HeaderCursor(GUID, ID) INTO HeaderGUID, HeaderID;
  SET NumRows = NumRows + 1;
  SET PROGRESS TO TRUNC(((NumRows / TotalRows) * 100));
END WHILE;
CLOSE HeaderCursor;
UNPREPARE HeaderStmt;
UNPREPARE DetailsStmt;

Similar UPDATE statements in several more while-loops linked salespeople, contacts, notes, and other records to their parents.

After that, the old primary key constraints had to be dropped, the integer ID fields themselves had to be dropped, and then new primary key constraints using the GUID fields had to be established. To clean things up, I also deleted any orphaned records I found.
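
For one parent/child pair, that sequence looked something like this (a hedged sketch; the constraint and column names are illustrative, not lifted from the actual script):

-- details that never found a parent are orphans; remove them first
DELETE FROM "PODetails" WHERE "PO_GUID" IS NULL;
-- drop the old keys and integer IDs, then key on the GUIDs
ALTER TABLE "PODetails" DROP CONSTRAINT "PK_PODetails", DROP COLUMN "PO_ID", DROP COLUMN "ID";
ALTER TABLE "PODetails" ADD CONSTRAINT "PK_PODetails" PRIMARY KEY ("GUID");
ALTER TABLE "POHeaders" DROP CONSTRAINT "PK_POHeaders", DROP COLUMN "ID";
ALTER TABLE "POHeaders" ADD CONSTRAINT "PK_POHeaders" PRIMARY KEY ("GUID");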

Most of the functions, triggers, stored procedures, and views had to be changed as well. Since there was no data involved in these database objects, I could simply call ALTER FUNCTION, ALTER TRIGGER, and similar statements. Even then, it would've been a lot of typing had it not been for EDB Manager's reverse-engineering capability. I was able to generate a large section of the script and just go in and modify the text. Then, of course, I had to run it repeatedly against a copy of the database until all the errors were cleaned up.

In amongst the ALTER TABLE statements, I added a few other table changes: new or modified fields, the deletion of some obsolete fields, some new indexes, corrected collation specifications, and even one or two additional constraints for good measure.
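
Individually, these were all one-liners; a couple of representative (and entirely hypothetical) examples:

ALTER TABLE "Contacts" DROP COLUMN "FaxNumber";
CREATE INDEX "ByLastName" ON "Contacts" ("LastName");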

When all was said and done, the script was over 1,100 lines long and took about 3 minutes on my local server (it's a small database, just over 700 MB in 22 tables). My first goal was complete!

Step 2 - Convert the database from ANSI to Unicode

Since I was upgrading from a Delphi 2006 ANSI-based database to a Delphi 2009 Unicode-based database (not strictly necessary, but it's what I chose to do), I had to export all the data to intermediate CSV files and then import those files into the new database. (A smoother transition is in the works.)

Since the import process required an existing database structure, I had to create that first. EDB Manager was able to generate a "create" script of the entire existing database, changing the collation on the fly. But some of the function and stored procedure declarations were out of order; after rearranging them and making a few more tweaks, my "create" script was done.

The export was easy, but I like to keep things generic and flexible, so after doing it for a couple of tables in EDB Manager, I simply pulled the statements out of the SQL history, put them in a cursor loop, and another script was born:


SCRIPT
BEGIN
  DECLARE TableCursor CURSOR FOR TableStmt;
  DECLARE TableName VARCHAR;
  PREPARE TableStmt FROM 'SELECT Name FROM Information.Tables';
  OPEN TableCursor;
  FETCH FIRST FROM TableCursor (Name) INTO TableName;
  WHILE NOT EOF(TableCursor) DO
    SET STATUS MESSAGE TO 'Exporting ' + TableName;
    -- each table is dumped to a CSV file in the pre-defined "AnsiExport" store
    EXECUTE IMMEDIATE 'EXPORT TABLE "' + TableName + '" TO "' + TableName + '.csv" IN STORE "AnsiExport"';
    FETCH NEXT FROM TableCursor (Name) INTO TableName;
  END WHILE;
  CLOSE TableCursor;
  UNPREPARE TableStmt;
END

Now all I had to do was import the files into the new database. But there were two little problems. First, the .CSV files were in ANSI text format and needed to be converted to actual Unicode text files. Second, ElevateDB currently only supports Unicode files without a Byte-Order Mark (BOM), even though a BOM is common in Unicode files.

Now, the ubiquitous Windows application Notepad can read an ANSI text file and save it as a Unicode file, but it leaves the BOM in. There are several discussions on the ElevateSoft newsgroup about this and many technical references throughout the internet, but none of the utilities I found did what I wanted as conveniently as I wanted. So I wrote a quick program to convert ANSI text files to BOM-less Unicode text files.
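
The heart of such a utility is tiny. Here's a minimal Delphi 2009 sketch of the idea (this is not my actual program, and the routine name is made up): read the file through the default ANSI code page, then write the UTF-16 bytes yourself so no BOM is ever emitted:

uses
  Classes, SysUtils;

procedure AnsiFileToBomlessUnicode(const InFile, OutFile: string);
var
  Lines: TStringList;
  Bytes: TBytes;
  OutStream: TFileStream;
begin
  Lines := TStringList.Create;
  try
    // TEncoding.Default interprets the file using the active ANSI code page
    Lines.LoadFromFile(InFile, TEncoding.Default);
    // Encode to UTF-16 LE manually; SaveToFile would prepend a BOM
    Bytes := TEncoding.Unicode.GetBytes(Lines.Text);
    OutStream := TFileStream.Create(OutFile, fmCreate);
    try
      if Length(Bytes) > 0 then
        OutStream.WriteBuffer(Bytes[0], Length(Bytes));
    finally
      OutStream.Free;
    end;
  finally
    Lines.Free;
  end;
end;

(One side effect to be aware of: TStringList normalizes line endings, which is harmless for CSV files.)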

With the CSV files finally prepared, I could import them into the new database. I started doing this manually, but quickly realized the tables had to be imported in a certain order because of the constraints already in place. (In hindsight, I would've waited to create all the constraints and indexes until all the records were imported, but that's all past now.) So one more script was added to the growing list.
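
That last script was little more than a fixed sequence of IMPORT TABLE statements, parents before children, mirroring the EXPORT TABLE syntax above. A hedged sketch with illustrative table and store names:

IMPORT TABLE "POHeaders" FROM "POHeaders.csv" IN STORE "UnicodeImport";
IMPORT TABLE "PODetails" FROM "PODetails.csv" IN STORE "UnicodeImport";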

The second step of the database conversion was now complete, albeit in three parts:

  1. EDB script to export all the tables from the ANSI database
  2. Run the ANSI-to-Unicode conversion program to convert the CSV files and strip the BOMs
  3. EDB script to import from the CSV files into the new database

Step 3 - Convert Delphi ANSI field types to Unicode field types

Finally! The database itself was complete and I could get back to Delphi. But a surprise was waiting for me.

Converting the bulk of the code from Delphi 2006 to Delphi 2009 was actually fairly simple. I don't use a lot of component sets and those I do use actively upgrade their products to work with the latest versions of Delphi, so there was very little hassle there.

I had read a bit about Unicode support in Delphi 2009--why it matters and how to work with it--and after fixing a few things, nothing stood out as a major obstacle.

Now let me pause and explain something. When I write database applications, they are typically fairly small programs, comprising only one or two executables, about 50 units, and no more than one or two central data modules. So I seldom create elaborate classes that mimic the data; instead, I place table or query components on the data module, open them up at design time, and add persistent field objects which can then be used directly in the code. I also hook them up to data-aware edit controls to eliminate the need to copy values to and from the controls myself. Many people frown on this approach, and I have actually tried it the other way too, but it takes so long to build the classes, and even longer to change them, that in my opinion it just isn't worth it most of the time. (I'm sure someone with RemObjects' Data Abstract will enlighten me...)

But this time, it got me. If I had used FieldByName('MyField').AsString to reference values in code, I think the project would've happily used the new default definition of AsString. But the ANSI versions of Delphi and EDB had created TStringField objects, while the Unicode versions created TWideStringField objects. All string-type fields were affected, including memos and GUIDs. What's more, the mismatches didn't reveal themselves at design time unless I happened to activate one of the table or query components; most of the time, I would go into an obscure part of the program and run into another error. It took a while to flush them all out, and fixing them meant deleting the original persistent fields and recreating them at design time so the Unicode version would be created for each string field. The downside was that some fields had display formats, other properties, or event handlers set up, and once the original field was deleted, those settings were gone or unlinked.
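
To make the mismatch concrete, here's a hypothetical persistent-field declaration of the kind that bit me (the data module, component, and field names are invented for illustration):

type
  TdmMain = class(TDataModule)  // uses Classes, DB, edbcomps
    POHeaders: TEDBTable;
    // Created under Delphi 2006 / ANSI ElevateDB:
    //   POHeadersNotes: TStringField;
    // The Unicode version creates a different field class, so the
    // persistent field (and its DFM entry) had to be recreated as:
    POHeadersNotes: TWideStringField;
  end;

// Untyped access would have kept working in both versions, at the
// cost of losing the typed persistent field:
//   ShowMessage(dmMain.POHeaders.FieldByName('Notes').AsString);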

Finally, after all these issues were cleared up, the new application was running--or rather hobbling. Sometimes I would see Asian characters where there had been a perfectly normal note before (a classic sign of ANSI bytes being misread as UTF-16). Other times, there was no data at all (usually meaning I had forgotten to hook up one of the recreated data fields). It took a bit to clear all this up.

At this point, I was still using Rave Reports, the free edition that comes with Delphi. I thought I could get away with keeping this reporting engine, but I started seeing a variety of problems. To Nevrona's credit, it probably wasn't their product so much as the monumental changes to the application, and my growing uneasiness about the company's stability, that swayed where I placed the blame. In any case, I made a critical decision--and started the last major hurdle of the Mega Update.

Tune in next time for details of the Big Report Conversion.