Async Enumerable in C# / .NET 6

Background

In recent times Microsoft have begun to performance test their web platforms. Whilst previous generations of their .NET framework and ASP.NET web platform had prioritised ease of development over performance quite dramatically, the latest generation ASP.NET Core performs quite well, on Linux no less.

After inventing the async/await model for abstracting away callback hell in asynchronous code, the New Microsoft, the Ones That Care About Performance, realised that people will just allocate all the RAM in the universe if you let them. In the now very common practice of using ASP.NET Core to create web APIs that produce JSON payloads, users would mercilessly serialise a massive List<T> into one massive string and shove it out onto the network, or expose server endpoints that accept arbitrarily large strings off the Internet and attempt to coerce them into a List<T>. That meant ASP.NET Core services could be knocked offline by supplying a ludicrously large payload, and performance could be a bit erratic, depending on the size of the data the user was requesting.

So what did they do? Well, a couple of things, but one of them was to introduce IAsyncEnumerable<T>, an asynchronous enumerable that supports cancellation and clean exception handling, and that keeps performance predictable when payloads vary in size.

The goal today is to serve a payload from ASP.NET Core 6.0 and deserialise it in a client application, also on .NET 6: serialising onto streams, deserialising off streams, and processing data without allocating massive payloads. The client should also begin to receive data right away, rather than waiting until the full payload has been buffered in its entirety by the various services along the way.

Physics and leaky abstractions

Just to preface this: async/await doesn’t fundamentally change physics. There is no getting away from the fact that you first kick off an operation and schedule code to be run when that operation has finished, leaving you free to do other things in the meantime. Since your code actually returns to the caller directly after you’ve scheduled the first async operation, it has to return something other than the normal return value: a handle through which you can access the state of the function and, once the operation has completed, the return value. This way the surrounding code has a chance to deal with the asynchrony, but most of the time can just pretend that the code is synchronous.

You see, if the squishy human brain cannot fathom multithreading, don’t get me started on asynchrony.

So with a normal asynchronous function that returns a scalar, the caller receives a System.Threading.Tasks.Task<T> that encapsulates the asynchronous state and, eventually, the return value. The async keyword lets you pretend it isn’t there and write code that looks synchronous, as long as you put an await in front of each asynchronous call.
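
To make the handle idea concrete, here is a minimal sketch. The class and method names are invented for illustration, and Task.Delay stands in for real I/O.

```csharp
using System.Threading.Tasks;

public static class ScalarDemo
{
    // Returns to the caller at the first await, handing back a Task<int> handle.
    public static async Task<int> GetAnswerAsync()
    {
        await Task.Delay(10); // stand-in for a database or network call
        return 42;
    }

    public static async Task<int> DoubleAnswerAsync()
    {
        Task<int> handle = GetAnswerAsync(); // the operation has been kicked off
        // ... other work could happen here while it is in flight ...
        int value = await handle;            // resumes once the value exists
        return value * 2;
    }
}
```

Most code never names the handle and just writes `await GetAnswerAsync()`, which is the pretending-it-is-synchronous part.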

Contagion

You’ll notice though, like with monads, that once you’ve started wrapping your return values in Task<T>, it goes all the way across to the other side of the application. If your database code is asynchronous, the repository or whatever other data access layer you have will be asynchronous too, and then you turn around and find that it has spread all the way to your ASP.NET controller. On the plus side, ASP.NET automagically supplies a CancellationToken to the controller action that you can send all the way down to the database, giving you automagic cancellation of long-running queries if people refresh their page, but that’s an aside.

The point here is the contagion. You can attempt to force things to be different with GetAwaiter().GetResult(), which blocks a thread while the task is evaluating, but that is very dangerous performance-wise. Better to just let it spread, except in places where Microsoft have been lazy, such as Validation and Configuration, because clearly when it would mean work for them it’s “not necessary”, but when it’s eons of work for us they are fine with it. Our time is free for them.

Anyway, I mean it makes sense that the abstraction must leak in some cases, and IAsyncEnumerable is no different. Any one return value would fly in the face of the whole streaming thing. So awaiting a task doesn’t really make sense. Instead it’s iterators all the way down. Everywhere. Each level yield returns to the next, all the way down the chain.
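
As a rough sketch of what “iterators all the way down” looks like, here are two layers where the upper one can only iterate and re-yield. The names are made up, and Task.Delay stands in for waiting on a real data source.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class Layers
{
    // Bottom layer: stand-in for a database reader producing rows one at a time.
    public static async IAsyncEnumerable<int> ReadRows(
        int count, [EnumeratorCancellation] CancellationToken ct = default)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Delay(1, ct); // pretend to wait for the next row
            yield return i;
        }
    }

    // Middle layer: there is no single Task to await, so it iterates and re-yields.
    public static async IAsyncEnumerable<string> GetNames(
        int count, [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var id in ReadRows(count, ct))
        {
            yield return $"item-{id}";
        }
    }
}
```

The [EnumeratorCancellation] attribute lets a caller pass a token through WithCancellation(...) as well as through the parameter, so cancellation can travel down the same chain the data travels up.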

Dapper allegedly comes with support for IAsyncEnumerable, but at the time of writing there is zero documentation supporting that allegation.

You can simulate that by writing this bit of code:

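    // Needs using Dapper and System.Data, plus whichever SqlClient namespace provides your SqlConnection.
    public static async IAsyncEnumerable<T> QueryIncrementally<T>(this SqlConnection conn, CommandDefinition commandDefinition, CommandBehavior behaviour = CommandBehavior.CloseConnection)
    {
        // Run the command and keep the reader open; CloseConnection closes the connection when the reader is disposed.
        await using var reader = await conn.ExecuteReaderAsync(commandDefinition, behaviour);
        // Dapper builds the row-to-T mapper once, from the reader's columns.
        var rowParser = reader.GetRowParser<T>();

        // Hand rows to the caller one at a time instead of buffering a List<T>.
        while (await reader.ReadAsync(commandDefinition.CancellationToken))
        {
            yield return rowParser(reader);
        }
    }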

From there you can pass the payload up iterator style, yield returning all the way up, until you get to the controller, where you can declare the action to return IAsyncEnumerable<T> and the framework will handle it correctly.

Obviously, as you cross the network boundary you have a choice in how to proceed: do you want to receive the data incrementally as well, or do you want to wait for all of it to arrive?
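
A controller at the top of that chain might look something like the sketch below. Widget, WidgetsController, the route and LoadWidgets are all invented for illustration, and the sketch assumes an ASP.NET Core 6 project using the default System.Text.Json formatter.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical payload type, invented for this sketch.
public record Widget(int Id, string Name);

[ApiController]
[Route("widgets")]
public class WidgetsController : ControllerBase
{
    // The CancellationToken parameter is bound to the request-aborted token,
    // so a client disconnect can stop the enumeration.
    [HttpGet]
    public IAsyncEnumerable<Widget> Get(CancellationToken ct) => LoadWidgets(ct);

    // Stand-in for a repository call such as the Dapper iterator above.
    private static async IAsyncEnumerable<Widget> LoadWidgets(
        [EnumeratorCancellation] CancellationToken ct)
    {
        for (var i = 1; i <= 3; i++)
        {
            await Task.Delay(10, ct); // pretend to wait for the next row
            yield return new Widget(i, $"widget-{i}");
        }
    }
}
```

The action here just hands the enumerable through. It only needs to become an async iterator itself if it wants to filter or reshape items on the way out.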

Since you made such a fuss in the first API, we will assume you want the consuming side to be as much work.

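    private static async Task<Stream> GetStream(HttpClient client, string endpoint, CancellationToken ct)
    {
        // ResponseHeadersRead returns as soon as the headers arrive, so the body is not buffered up front.
        var response = await client.GetAsync(endpoint, HttpCompletionOption.ResponseHeadersRead, ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStreamAsync(ct);
    }

    public static async IAsyncEnumerable<T> HandleGetIncremental<T>(this HttpClient client, string endpoint,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        // Disposing the stream once enumeration ends releases the underlying connection.
        await using var stream = await GetStream(client, endpoint, ct);
        // CreateSerializerOptions() is a helper defined elsewhere; it returns the JsonSerializerOptions to use.
        var items = JsonSerializer.DeserializeAsyncEnumerable<T>(stream, CreateSerializerOptions(), ct);
        await foreach (var item in items)
            yield return item;
    }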
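
If you want to poke at the serialising and deserialising halves without a web server in between, a round trip over a MemoryStream is a reasonable stand-in. This is only a sketch with made-up names, and it buffers everything in memory, which the real network path is precisely meant to avoid.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class RoundTrip
{
    // Stand-in for whatever the server would stream out.
    public static async IAsyncEnumerable<int> Produce(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Yield();
            yield return i;
        }
    }

    public static async Task<List<int>> ThroughJson(int count)
    {
        using var stream = new MemoryStream();

        // "Server" side: SerializeAsync writes the sequence out as a JSON array.
        await JsonSerializer.SerializeAsync(stream, Produce(count));
        stream.Position = 0;

        // "Client" side: elements are handed over one by one as they are parsed.
        var result = new List<int>();
        await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<int>(stream))
        {
            result.Add(item);
        }
        return result;
    }
}
```

DeserializeAsyncEnumerable expects the root of the document to be a JSON array, which is what serialising an IAsyncEnumerable<T> produces.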

And then, of course, you yield return all the way up to the next network boundary.

Is this ready for prime time? Well, in the sense that Jay Leno was ready for prime time when he ceded the Tonight Show to Conan O’Brien, but everybody would probably like some more pace and less awkwardness.
Apparently letting lambdas yield return is on its way, and hopefully that can make it easier to pipe an IAsyncEnumerable through one API to the next, easily adding some filter or transformation mid flight rather than the incessant await foreaching that is now necessary.
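
Until then, one way to keep the await foreaching in one place is a pair of small extension methods. Filter, Map and Numbers below are names invented for this sketch, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncPipe
{
    // Passes on only the items that match the predicate.
    public static async IAsyncEnumerable<T> Filter<T>(
        this IAsyncEnumerable<T> source, Func<T, bool> predicate,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var item in source.WithCancellation(ct))
        {
            if (predicate(item)) yield return item;
        }
    }

    // Transforms each item as it flows past.
    public static async IAsyncEnumerable<TOut> Map<TIn, TOut>(
        this IAsyncEnumerable<TIn> source, Func<TIn, TOut> selector,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var item in source.WithCancellation(ct))
        {
            yield return selector(item);
        }
    }

    // Test source: yields the integers from 1 to count.
    public static async IAsyncEnumerable<int> Numbers(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Yield();
            yield return i;
        }
    }
}
```

With those in place, a mid-flight transformation reads as `AsyncPipe.Numbers(6).Filter(n => n % 2 == 0).Map(n => n * 10)`, and nothing runs until somebody enumerates the result.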
