Friday, May 13, 2011

Practical MSBuild - Database Migrations

I'd like to propose an addenddum Greenspuns Tenth Rule:
Any sufficiently complicated c# program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Fluent Migrator.
Keeping database structures up to date with other developers, not to mention production and testing envireonments is an Incredibly common problem and Fluent Migrator  is a seriously good library that helps solve this problem elegantly. Amongst other eco systems the idea of migrations is well accepted, ruby has rails migrations, java has migrate4j and python has yoyo migrations, but for some reason the idea has been slow to gain traction in the c# world.

One of the barriers to adoption so far has been documentation, so in this article I want to show how to leverage Fluent Migrator. There are plenty of good articles on writing migrations so I will only touch on that briefly. Instead, I want to show how to use it as an integral part of the build process, how to avoid some common pitfalls and how to integrate it within the greater project lifecycle.


Whats Wrong With *

For start I'm going to assume that sharing a database between developers is bad idea. Someone may need to make large scale changes that disrupt other developers. Having long runnning feature branches is impossible (without another database) and it does nothing to solve the problems your eventually going to face at deployment time. This is obviously not an optimal solution.

Of the many half solutions I've seen, simple sql scripts are the most common. Advanced versions of this have sequentially numbered scripts and some sort of batch process to run them in order, in other word, half of Fluent Migrator. Often they will contain checks to ensure there not executed multiple times (so the same column doesn't get added twice for instance),  another 1/4 of Fluent Migrator.

From a simplicity point of view this may seem rather enticing but doesn't work nearly as well in practice. This approach isn't very branch/merge friendly and migrations will need to be run in an explicitly defined order. Sql is also particularly verbose language, especially for structural changes, how many of you can remember the syntax to create a table with an index of the top of your head?

Other solutions I've come across, such as Visual Studio database projects, suffer from similar shortcomings and are deeply rooted in the old school "quarterly release" type project, where a once off upgrade script is written for every release. One particularly notable "solution" I came across was using data dude to generate patch scripts. This was so slow, cumbersome and error prone I still shudder just thinking about it.

With that in mind here are the goals for this solution:
  • Local - Upgrades should be able to run anywhere from the developers PC to the server.
  • Fast - This is a freeby (with Fluent Migrator) most migrations will take seconds from start to finish.
  • Frequent - Once fast and easy are solved you'll want to run them much more frequently.
  • Production Ready - Migrations should be able to be deployed at a moments notice. This is a by product of the above three.

Step 1 - Backup, Delete, Restore, Obfuscate

The first few requirements are basically "filling in the gaps", handling the tasks that Fluent Migrator doesn't deal with. Never the less they are an important part of the overall process. In these examples I will be using Sqlite because it's simple, file based nature makes it easy to follow the examples without getting lost in the details. Here are the important bits of the database.build file (I'll post the whole file at the end), this is created with the same techniques I outlined here:

<Target Name="dbCreate" >
    <!-- This won't actually create the file because it is an empty database -->
    <Exec Command="$(dbSqlitePath)\Sqlite3.exe $(dbPath)\$(dbName) " />
</Target>

<Target Name="dbBackup" >
    <Copy SourceFiles="$(dbPath)\$(dbName)" DestinationFiles="$(dbBackupPath)\$(dbBackupName)" />
</Target>

<Target Name="dbRestore" >
    <Copy SourceFiles="$(dbRestoreSource)" DestinationFiles="$(dbPath)\$(dbName)" />
</Target>

<Target Name="dbDelete" >
    <Delete Files="$(dbPath)\$(dbName)" />
</Target>
    
<Target Name="dbObfuscate"/>

Backup and delete should be fairly self explanatory but restore needs a bit more explanation. I've found it more useful to start from a "known good" version of the database rather than starting from scratch every time. For this I strongly favor using backups from production databases. This gives us much more realistic data to work with when we're developing, quantatively and qualitatively. Failing that, a database with somewhat realistic looking test data would do.

A pleseant side effect of using real production data is that, by the time a release comes around, we have run our data migrations dozens/hundreds/thousands of times. This will give much more confidence with the process and make releases much less stressful. An emergency patch can also be tested and verified using much the same process as a full release (because it's so quick and easy), so no more cowboy live patches.

Obviously if we have real data it may need obfuscating due to any number of privacy issues that could pop up. Security on developer workstations probably isn't as thorough as it is on the server and we don't want people taking sensitive data out of the office. However, I have purposely left the task empty, simply because it will vary so much from project to project.


Step 2 - Writing a Migration

Before any migrations can be written they need a project, a Fluent Migrator script is simply c# library (dll) after all. Usually it gets names something like YourProjectName.Migrations. Here is my recommended database structure:



As you can see each revision will get it's own folder, sooner or later you will have enough migrations to warrant this. The other thing of note is that each version has a data and structure folder. The strucure folder will contain the bread and butter of Fluent Migrator, adding/removing tables and columns. The data folder exists to make it easier to find migrations much create/update/delete any reference data that our applications will inviteably have.

Finally some actual code. This example is a simple migration  that creates a products table with a name, description and primary key:

[Migration(0101201104301422)]
public class _01_01_2011_04_30_14_22_AddProductTable : Migration {

    public override void Up() {
        Create.Table("products")
            .WithColumn("product").AsInt32().PrimaryKey().Identity()
            .WithColumn("name").AsString(128).NotNullable()
            .WithColumn("description").AsString(int.MaxValue).NotNullable()
        ;
    }

    public override void Down() {
        Delete.Table("products");
    }

}

The interesting thing here is the version number which is is probably the least intuitive part of this article. The convention used is: {major}{minor}{year}{month}{day}{hour}{minute}. Major/Minor are our product versions, this ensures that migrations are executed in the order the product is developed. This is particularly important if a project has long running branches with features and bug fixes. The rest is the time the migration is created, hours and minutes simply ensure (almost always) that there are no colisions with other developers writing migrations concurrently.

This numbering scheme will save nuch frustration at merge time.


Step 3 - Running Migrations

At the time of writing the msbuild task isn't working with the version of Fluent Migrator I'm using, it also isn't documented so I'll be using using the exec task coupled with the Fluent Migrator console runner. One other thing to remember is that Fluent Migrator is currently a c# 3.5 project so all migrations projects need to be as well. Here is the MS Build task to execute the migrations:

<Target Name="dbMigrate" DependsOnTargets="compile" >
    <Exec Command="$(dbMigratorPath)\Migrate.exe --target=src\app\PMSBuild.Migrations\bin\debug\PMSBuild.Migrations.dll --db=sqlite --c=&quot;Data Source=$(dbPath)\$(dbName);Version=3;&quot; --verbose=true --version=$(dbVersion)" />
</Target>

There are quite a few properties here but it's the dbVersion property which is the most important. The default is set (at the top of the script) to 0, which will run all migrations. Because the precedence rules I outlined in my last article are used (Practical MS Build - Flexible Configuration), it is easy to migrate to a specific version if needed. From the command line simply specify a new value for the property:

>msbuild project.build /target:dbMigrate /property:dbVersion=0101201104301422


Step 4 - Migrating Data

One of the most cited reasons for dismissing Fluent Migrator is that it doesn't handle data migrations. This is throwing the baby out with the bath water. It is true that Fluent Migrator doesn't handle this but it does provide an excellent framework to execute and track such migrations. Because we're pragmatic programmers that use the right tool for the job, we'll want to modify our data with a language made for just that: SQL. Fortunately Fluent Migrator allows us to execute arbitrary sql, we just need a little bit of organisation and self discipline.

Lets say a migration needs to add a meta column to the products table created above and that we need to populate it with the description, to serve as a place holder until a real person edits it. The first part is easy, just create another migration which adds the column:

[Migration(0101201105021944)]
public class _01_01_2011_05_02_19_44_AddMetaColumnToProduct : Migration {

    public override void Up() {
        Create.Column("meta").OnTable("products")
            .AsString(int.MaxValue)
            .NotNullable()
            .WithDefaultValue("");
    }

    public override void Down() {
        Delete.Column("meta").FromTable("products");
    }

}

Next we need to create an sql file to hold our script. Generally I stick to the same naming conventions used for the migration classes so I created 01_01_2011_05_02_19_44_PopulateMetaOnMigrations.sql in the same directory as the migration classes:

UPDATE products
SET meta = [description]

This is about as straight forward as sql scripts get. Next I create a resource (0101_SqlMigrations.resx) file in the version directory and add the above script as a file resource. This compiles the script into the dll which simplifies things when we need to use our migraions externally (installers etc).  The last thing to do is modify the migration class above with a line to execute the file:

Execute.Sql(_0101_SqlMigrations._01_01_2011_05_02_19_44_PopulateMetaOnMigrations);


Wrapping Up

A fast, flexible and complete solution to manage databases. With this approach you will never again will you fear long running branches. Never gain have to develop against a database being modified by others. Most importantly, never again dread database upgrades at release time. Once you get used to a solution like this going back to anything else will seem slow and archaic and error prone.

As promised here is the complete database.build script:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

    <PropertyGroup>
        <dbName Condition="'$(dbName)' == ''">Test.db</dbName>
        <dbPath Condition="'$(dbPath)' == ''">database</dbPath>
        <dbSqlitePath Condition="'$(dbSqlitePath)' == ''">c:/Program Files/Sqlite</dbSqlitePath>
        <dbBackupPath Condition="'$(dbBackupPath)' == ''">d:\databases\backups</dbBackupPath>
        <dbBackupName Condition="'$(dbBackupName)' == ''">Test_Backup.db</dbBackupName>
        <dbRestoreSource Condition="'$(dbRestoreSource)' == ''">d:\databases\backups\Prod_2011_04_30.db</dbRestoreSource>
        <dbMigratorPath Condition="'$(dbMigratorPath)' == ''" >packages\FluentMigrator.0.9.0.0\tools</dbMigratorPath>
        <dbVersion Condition="'$(dbVersion)' == ''" >0</dbVersion>
    </PropertyGroup>

    <Target Name="dbEcho" >
        <Message Text="dbName: $(dbName)" />
        <Message Text="dbPath: $(dbPath)" />
        <Message Text="dbSqlitePath: $(dbSqlitePath)" />
        <Message Text="dbBackupPath: $(dbBackupPath)" />
        <Message Text="dbBackupName: $(dbBackupName)" />
        <Message Text="dbRestoreSource: $(dbRestoreSource)" />
        <Message Text="dbMigratorPath: $(dbMigratorPath)" />
    </Target>
    
    
    <!--<UsingTask TaskName="FluentMigrator.MSBuild.Migrate" AssemblyFile="lib\FluentMigrator\FluentMigrator.MSBuild.dll"/>-->

    <Target Name="dbCreate" >
        <!-- This won't actually create the file because it is an empty database -->
        <Exec Command="$(dbSqlitePath)\Sqlite3.exe $(dbPath)\$(dbName) " />
    </Target>

    <Target Name="dbBackup" >
        <Copy SourceFiles="$(dbPath)\$(dbName)" DestinationFiles="$(dbBackupPath)\$(dbBackupName)" />
    </Target>

    <Target Name="dbRestore" >
        <Copy SourceFiles="$(dbRestoreSource)" DestinationFiles="$(dbPath)\$(dbName)" />
    </Target>

    <Target Name="dbDelete" >
        <Delete Files="$(dbPath)\$(dbName)" />
    </Target>
    
    <Target Name="dbObfuscate"/>

    <Target Name="dbMigrate" DependsOnTargets="compile" >
        <Exec Command="$(dbMigratorPath)\Migrate.exe --target=src\app\PMSBuild.Migrations\bin\debug\PMSBuild.Migrations.dll --db=sqlite --c=&quot;Data Source=$(dbPath)\$(dbName);Version=3;&quot; --verbose=true --version=$(dbVersion)" />
    </Target>
    
</Project>

3 comments:

  1. Wow. this is a great post from start to finish. The FluentMigrator committers thank you for an incredibly in depth article. We will be posting a link to this on the github page. Thanks!

    ReplyDelete
  2. Awesome post. Thanks... Stupid question though, what is the .resx file for?

    ReplyDelete
  3. Not stupid at all, a lot of developers don't work with products that need/want resource files. Generally I don't find them desirable, this is an exception.

    A resource file (.resx) is just a way of embedding files into a dll/exe. Resource files usually contain things like images and language files to make deployment easier.

    Usually we want to create installers/packages which means packaging Fluent Migrator and our migrations with it. Having our sql scripts embedded in the dll will mean a lot less file management.

    ReplyDelete