
Database Integration Testing

Testing your SQL queries is as important as testing any other piece of logic. Unless you only do trivial reads and writes, some of your logic will presumably be implemented at least in the form of a query, and you would like to validate that logic the same as any other.

Overview

For this you need database integration tests. There are multiple strategies for this (in-memory databases, additional abstractions and mocks, or creating a temporary but real database, just to name a few), but in this post I will discuss running a Linux SQL Server docker image, applying all migrations to it from scratch and then running tests on top of it.

Technology choice is beyond the scope of this text. I use .NET Core 3.1, XUnit and legacy C# because I know them already and because my F# is not idiomatic enough for me not to go off on tangents and end up writing a monad tutorial instead. I have used MySQL / MariaDB before and I will never use it for anything I care about. I have tried Postgres and I like it, it is a proper database system, but again, I am not familiar enough with it for my purposes this time. To reiterate, this post is based on using C# on .NET Core 3.1 over MSSQL Server, and the tests will be run on push using Github Actions.

My development machine is really trying, OK, so let us cut it some slack. Anyway, I have Windows 10 insider something, with WSL2 and Docker Desktop for WSL2 on it. I run Ubuntu 20.04 in WSL2, dist-upgraded from 18.04. I develop the code in VS2019 Community on Windows, obviously.

Problem

This is simple: when a commit is made to the part of the repository that contains DbUp SQL scripts, related production code or these tests, I want to trigger tests that verify that my SQL migrations are valid, and when SQL queries change, I want those changes verified against a real database server.

I do not like docker, especially docker-compose. It seems to me it has been designed by people that don’t know what they are on about. Statistically that cannot be the case, since there are tens of thousands of docker-compose users that do magical things, but I have wasted enough time, so like Seymour Skinner I proclaim, “no, it is the children that are wrong!”, and I thus need to find another way of running an ad hoc SQL Server.

All CI stuff and production hosting of this system is Linux based, but Visual Studio is stuck in Windows, so I need a way to be able to trigger these tests in a cross platform way.

Clues

I found an article by Jeremy D Miller that describes how to use a .NET client for the Docker API to automatically run an MSSQL database server. I made some hacky mods:

internal class SqlServerContainer : IDockerServer
{
    public SqlServerContainer() : base("microsoft/mssql-server-linux:latest", "dev-mssql")
    {
        // My production code uses some custom types that Dapper needs
        // handlers for. Registering them here seems to work
        SqlMapper.AddTypeHandler(typeof(CustomType), CustomTypeHandler.Default);
    }

    public static readonly string ConnectionString = "Data Source=127.0.0.1,1436;User Id=sa;Password=AJ!JA!J1aj!JA!J;Timeout=5";

    // Gotta wait until the database is really available
    // or you'll get oddball test failures;)
    protected override async Task<bool> isReady()
    {
        try
        {
            using (var conn =
                new SqlConnection(ConnectionString))
            {
                await conn.OpenAsync();

                return true;
            }
        }
        catch (Exception)
        {
            return false;
        }
    }

    // Watch the port mapping here to avoid port
    // contention w/ other Sql Server installations
    public override HostConfig ToHostConfig()
    {
        return new HostConfig
        {
            PortBindings = new Dictionary<string, IList<PortBinding>>
            {
                {
                    "1433/tcp",
                    new List<PortBinding>
                    {
                        new PortBinding
                        {
                            HostPort = $"1436",
                            HostIP = "127.0.0.1"
                        }

                    }
                }
            },

        };
    }

    public override Config ToConfig()
    {
        return new Config
        {
            Env = new List<string> { "ACCEPT_EULA=Y", "SA_PASSWORD=AJ!JA!J1aj!JA!J", "MSSQL_PID=Developer" }
        };
    }

    public async static Task RebuildSchema(IDatabaseSchemaEnforcer enforcer, string databaseName)
    {
        using (var conn = new SqlConnection($"{ConnectionString};Initial Catalog=master"))
        {
            await conn.ExecuteAsync($@"
                IF DB_ID('{databaseName}') IS NOT NULL
                BEGIN
                    DROP DATABASE {databaseName}
                END
            ");
        }
        await enforcer.EnsureSchema($"{ConnectionString};Initial Catalog={databaseName}");
    }
}

I then cheated by reading the documentation for DbUp and combined the SQL Server schema creation with the docker image starting code to produce the witch’s brew below.

internal class APISchemaEnforcer : IDatabaseSchemaEnforcer
{
    private readonly IMessageSink _diagnosticMessageSink;

    public APISchemaEnforcer(IMessageSink diagnosticMessageSink)
    {
        _diagnosticMessageSink = diagnosticMessageSink;
    }

    public Task EnsureSchema(string connectionString)
    {
        EnsureDatabase.For.SqlDatabase(connectionString);
        var upgrader =
            DeployChanges.To
                .SqlDatabase(connectionString)
                .WithScriptsEmbeddedInAssembly(Assembly.GetAssembly(typeof(API.DbUp.Program)))
                .JournalTo(new NullJournal())
                .LogTo(new DiagnosticSinkLogger(_diagnosticMessageSink))
                .Build();
        var result = upgrader.PerformUpgrade();
        if (!result.Successful)
        {
            // Make migration failures visible to the test run instead of silently passing
            throw new InvalidOperationException("DbUp migration failed", result.Error);
        }
        return Task.CompletedTask;
    }
}

When DbUp runs it will output every script it executes to the console, so we need to make sure this type of information actually ends up being logged, despite it being diagnostic. There are two problems there: we need to use an IMessageSink to write diagnostic logs from DbUp so that XUnit becomes aware of the information, and secondly we must add a configuration file to the integration test project for XUnit to actually print the messages to the console.

Our message sink diagnostic logger is plumbed into DbUp as you can see in the previous example, and here is the implementation:

internal class DiagnosticSinkLogger : IUpgradeLog
{
    private IMessageSink _diagnosticMessageSink;

    public DiagnosticSinkLogger(IMessageSink diagnosticMessageSink)
    {
        _diagnosticMessageSink = diagnosticMessageSink;
    }

    public void WriteError(string format, params object[] args)
    {
        var message = new DiagnosticMessage(format, args);
        _diagnosticMessageSink.OnMessage(message);
    }

    public void WriteInformation(string format, params object[] args)
    {
        var message = new DiagnosticMessage(format, args);
        _diagnosticMessageSink.OnMessage(message);
    }

    public void WriteWarning(string format, params object[] args)
    {
        var message = new DiagnosticMessage(format, args);
        _diagnosticMessageSink.OnMessage(message);
    }
}

Telling XUnit to print diagnostic information is done through a file in the root of the integration test project called xunit.runner.json, and it needs to look like this:

{
  "$schema": "https://xunit.net/schema/current/xunit.runner.schema.json",
  "diagnosticMessages": true
}

If you started out with Jeremy's example and have followed along, applying my tiny changes, you may or may not be up and running by now. I had an additional problem: developing on Windows while running CI on Linux. I solved this with another well-judged hack:

public abstract class IntegrationFixture : IAsyncLifetime
{
    private readonly IDockerClient _client;
    private readonly SqlServerContainer _container;

    public IntegrationFixture()
    {
        _client = new DockerClientConfiguration(GetEndpoint()).CreateClient();
        _container = new SqlServerContainer();
    }

    private Uri GetEndpoint()
    {
        return RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? new Uri("tcp://localhost:2375")
            : new Uri("unix:///var/run/docker.sock");
    }

    public async Task DisposeAsync()
    {
        await _container.Stop(_client);
    }

    protected string GetConnectionString() => $"{SqlServerContainer.ConnectionString};Initial Catalog={DatabaseName}";
        
    protected abstract IDatabaseSchemaEnforcer SchemaEnforcer { get; }
    protected abstract string DatabaseName { get; }

    public async Task InitializeAsync()
    {
        await _container.Start(_client);
        await SqlServerContainer.RebuildSchema(SchemaEnforcer, DatabaseName);
    }

    public SqlConnection GetConnection() => new SqlConnection(GetConnectionString());
}

The point is basically: if you are executing on Linux, use the Unix socket, but if you are stuck on Windows, try TCP (which assumes Docker Desktop is set to expose the daemon on tcp://localhost:2375 without TLS).
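To make it concrete, here is roughly what a derived fixture and a test class could look like. The APIIntegrationFixture and APITests names and the smoke test are placeholders of mine, not code from the real project, but the mechanics should hold: XUnit can inject the IMessageSink into fixture constructors, and IClassFixture makes sure InitializeAsync and DisposeAsync run around the test class.

public class APIIntegrationFixture : IntegrationFixture
{
    private readonly IDatabaseSchemaEnforcer _enforcer;

    // XUnit supplies the diagnostic message sink to fixture constructors
    public APIIntegrationFixture(IMessageSink diagnosticMessageSink)
    {
        _enforcer = new APISchemaEnforcer(diagnosticMessageSink);
    }

    protected override IDatabaseSchemaEnforcer SchemaEnforcer => _enforcer;
    protected override string DatabaseName => "APITests";
}

public class SmokeTests : IClassFixture<APIIntegrationFixture>
{
    private readonly APIIntegrationFixture _fixture;

    public SmokeTests(APIIntegrationFixture fixture)
    {
        _fixture = fixture;
    }

    [Fact]
    public async Task Can_reach_the_migrated_database()
    {
        // By the time this runs, InitializeAsync has started the container and rebuilt the schema
        using var conn = _fixture.GetConnection();
        var result = await conn.ExecuteScalarAsync<int>("SELECT 1");
        Assert.Equal(1, result);
    }
}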

Github Action

After a single test, to my surprise, actually passed locally against the freshly created database, I thought it was time to think about the CI portion of this adventure. I had no idea whether Github Actions would allow me to just pull down docker images, but I thought "probably not". Still, I created the yaml, because nobody likes a coward:

# This is a basic workflow to help you get started with Actions

name: API Database tests

# Controls when the action will run. Triggers the workflow on push or pull request
# events but only for the master branch
on:
  push:
    branches: [ master ]
    paths: 
      - '.github/workflows/thisaction.yml'
      - 'test/API.DbUp.Tests/*'
      - 'src/API.DbUp/*'
      - 'src/API/*'
  pull_request:
    branches: [ master ]
    paths: 
      - '.github/workflows/thisaction.yml'
      - 'test/API.DbUp.Tests/*'
      - 'src/API.DbUp/*'
      - 'src/API/*'

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "test"
  test:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - uses: actions/checkout@v2

    # Runs a single command using the runner's shell
    - name: Run .NET Core CLI tests
      run: |
        echo Run tests based on docker. Bet u twenty quid this will fail
        dotnet test test/API.DbUp.Tests/API.DbUp.Tests.csproj

You can determine, based on the echo line above, the level of surprise and elation I felt when, after I committed and pushed, Github chugged through, downloaded the mssql docker image, recreated my schema, ran the test and returned a success message. I am still in shock.

So what now?

Like Jeremy discusses in his post, the problem with database integration tests is that you want to get a lot of assertions out of each database you create, because creating it is so expensive. In order to do so, and to procrastinate a little, I created a nifty little piece of code to keep track of the test data I create in each function, so that I can run tests independently of each other and clean up almost automatically using a Stack<T>.

I created little helper functions that would create domain objects when setting up tests. Each test would at the beginning create a Stack<RevertAction> and pass it into each helper function while setting up the tests, and each helper function would push a new RevertAction($"DELETE FROM ThingA WHERE id = {IDofThingAIJustCreated}") onto that stack. At the end of each test, I would invoke the Revert extension method on the stack and pass it some context so that it can access the test database and output test logging if necessary.

public class RevertAction
{
    string _sqlCommandText;

    public RevertAction(string sqlCommandText)
    {
        _sqlCommandText = sqlCommandText;
    }

    public async Task Execute(IntegrationFixture fixture, ITestOutputHelper output)
    {
        using var conn = fixture.GetConnection();
        try
        {
            await conn.ExecuteAsync(_sqlCommandText);
        }
        catch(Exception ex)
        {
            output.WriteLine($"Revert action failed: {_sqlCommandText}");
            output.WriteLine($"Exception: {ex.Message}");
            output.WriteLine($"{ex.ToString()}");
            throw;
        }

    }
}
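A minimal sketch of one such helper might look like this; the ThingA table, its Name column and the TestData class are placeholders I am assuming for illustration. The important bit is that the helper pushes its own cleanup statement onto the stack before returning:

public static class TestData
{
    public static async Task<int> CreateThingA(IntegrationFixture fixture, Stack<RevertAction> revertActions, string name)
    {
        using var conn = fixture.GetConnection();

        // Insert the row and get the generated id back
        var id = await conn.ExecuteScalarAsync<int>(
            "INSERT INTO ThingA (Name) OUTPUT INSERTED.Id VALUES (@name)",
            new { name });

        // Register the cleanup before handing the id back to the test
        revertActions.Push(new RevertAction($"DELETE FROM ThingA WHERE Id = {id}"));

        return id;
    }
}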

The revert method is extremely simple:

public static class StackExtensions
{
    public static async Task Revert(this Stack<RevertAction> actions, IntegrationFixture fixture, ITestOutputHelper output)
    {
        while (actions.Any())
        {
            var action = actions.Pop();
            await action.Execute(fixture, output);
        }
    }
}
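Put together, a test could look roughly like the following; the table, query and test name are again my own assumptions, and _output is the ITestOutputHelper that XUnit injects into the test class constructor:

[Fact]
public async Task ThingA_can_be_found_by_name()
{
    var revertActions = new Stack<RevertAction>();
    try
    {
        var id = await TestData.CreateThingA(_fixture, revertActions, "widget");

        using var conn = _fixture.GetConnection();
        var found = await conn.QuerySingleAsync<int>(
            "SELECT Id FROM ThingA WHERE Name = @name",
            new { name = "widget" });

        Assert.Equal(id, found);
    }
    finally
    {
        // Unwind whatever the helpers created, newest rows first
        await revertActions.Revert(_fixture, _output);
    }
}

Since the stack unwinds in reverse creation order, rows created last are deleted first, which plays nicely with any foreign key relationships between the helpers' test data.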

So – that was it. The things I put in this blog post were the hardest for me to figure out, the rest is just a question of maintaining database integration tests, and that is very implementation specific, so I leave that up to you.

Apple RISCing it all

In the early nineties, things were changing everywhere. The wall had just come down and half of Europe was encountering freedom for the first time in their lives. Metallica had found mainstream success. Hair metal died, and everybody was expecting the x86 architecture, which was still struggling to transition to 32-bit, to be crushed by workstation-class RISC architectures. Even Windows NT was available on PowerPC and MIPS R4000.

My late father was basically a contrarian in a workplace dominated by Novell Netware at that point; he always loved UNIX and would subject his people to using MS Word for Xenix, which basically had an identical user experience to MS Word for DOS at the time. Over a serial terminal, of course. He would eventually accept Linux as a substitute. He was hyped about all the heavy hitter RISC architectures that were already in use in workstations, as well as those under development, and was reading up on all of them in the pop science computer magazines that existed back in the day.

Just a reminder: RISC stands for Reduced Instruction Set Computer. Instead of having a complex instruction set with specialised instructions for edge case situations, it has a limited set of operations but more registers (the only "memory" the processor can readily access; the actual RAM is miles away from a CPU perspective) and a highly optimised instruction pipeline. Sometimes the architecture is called load/store, because the "reduced" in Reduced Instruction Set refers to the amount of work done per instruction being reduced: the instructions separate memory access from arithmetic, each instruction doing either, not both. CISC instruction sets, on the other hand, can have instructions that access memory multiple times. A complex instruction set computer therefore has a harder time optimising the instruction pipeline, which we can see with the speculative execution data leaks in Intel's recent history. The benefits of a sane instruction set in terms of raw speed and cache efficiency were obvious even back then; the only real hurdle was backward compatibility. Would the players involved become one with the zeitgeist and embrace the change?

We all know what happened. Metallica went too far, cut their hair and released two mediocre albums in quick succession. Tragedy struck the Cobain household. Nu metal was created. Apple did make the transition to PowerPC in partnership with IBM and Motorola, but Intel had introduced the Pentium processor, bringing all the 32 bits to the backwards compatibility party.

While MIPS was already 64-bit, Intel had enormous problems evolving their platform. Ironically, it was AMD that first created a working 64-bit architecture on top of x86, which Intel could then license for mainstream use, at the cost of settling their perennial legal battles.

After this, the instruction set architecture wars were settled and everything ossified. Intel created the Core series and took the performance crown back from AMD. There were annual incremental improvements, but Intel basically rested comfortably on their laurels, just stacking them peas.

After the Skylake architecture, things came to a head. The Skylake launch was fraught with bugs, and the ambitious roadmap of future architecture developments has not been delivered on. The current Intel desktop architecture is essentially Skylake+++, but the promised move to a new process node has not happened. Rumour has it this is why the Apple MacBook Pro is struggling with low performance and overheating: the processors those laptops were designed for never came, and the old ones run way too hot to viably fit in such tiny packaging.

So, switch to AMD Ryzen mobile APUs then? Well, yes, that would have been my suggestion. But Apple again are not afraid to make drastic changes. After claiming strong performance in their iPad Pro chips, they fancy themselves ready to give the middle finger to both Intel and AMD and go it alone, producing desktop class CPUs based on the Acorn RISC Machine architecture. Sign up and pay the $500 and you will, if approved, receive a loaner ARM machine in a Mac mini case with 16GB of RAM and 512GB of SSD storage, so you can test your new ARM desktop Mac apps. I think my father would have been pleased.