Back in the day
You would have one database connection string, and that would be it. “The Database” would live in the darkest corner of the office, and you would try not to interact with the physical machine that hosted the relational database management server. The Windows Forms app would open a connection and keep it alive forever, with nothing to really threaten it. It would enable Multiple Active Result Sets because there was no grown-up way to query the database; queries were shoved down the shared connection seemingly at random, depending on what the user was doing and which events forced data to be loaded. You would use the Windows credentials of the end user and specifically grant that user access to the server, which fit with the licensing model of the day: the user paid for access to SQL Server when buying Office products, whilst pure server deployments needed per-CPU-core licensing, which was hysterically expensive.
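For flavour, this is roughly what a connection string of that era looked like (a made-up example, not taken from the original post):

// Illustrative only: Windows credentials of the logged-on user, plus MARS so the one
// shared, long-lived connection could serve whatever the UI happened to be doing.
var connectionString =
    "Data Source=OFFICESQL01;Initial Catalog=LineOfBusiness;" +
    "Integrated Security=SSPI;MultipleActiveResultSets=True";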
Cloud
The cloud revolution came and went, whilst we in the Microsoft / Windows / .NET bubble never really heard about it, until Microsoft decided to try their hand at hosting cloud servers with the Windows Azure project. I genuinely think this was extremely useful for Windows, as the prospect of hosting cloud services was so far beyond the capabilities of the bare OS that it forced some stark introspection and engineering effort to overcome some of the worst designs in the kernel.
For .NET Framework a lot was already too late, and many misfeatures weren’t fixed until the .NET Core reboot effort.
One of the things that did change for the average programmer was that database connections became ephemeral and pooled. You could no longer hog one connection for all of your needs for all eternity – you created a new connection when you needed something and the framework pooled the connections for you, but you had to allow for connections taking longer to get established, and for the fact that they could be evicted at any point. Relatively quickly, database providers built in enough resilience that you didn’t have to know or care, but in the early days even the happy-path Microsoft marketing slides – the ones that usually never feature error handling or security – had to show retry mechanisms, or else people simply couldn’t replicate the early samples successfully.
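A minimal sketch of the ephemeral-connection pattern, with the connection string and cancellation token as placeholders:

// Open late, close early: disposing returns the underlying physical connection to the pool
// rather than tearing it down, so the next caller gets it back cheaply.
await using var conn = new SqlConnection(connectionString);
await conn.OpenAsync(cancellationToken); // may take a moment if a new physical connection is needed

using var cmd = new SqlCommand("SELECT 1", conn);
var result = await cmd.ExecuteScalarAsync(cancellationToken);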
Security
As people started using the public cloud, they eventually figured out that security was a thing. Maybe we should not have overly powerful credentials lying around in config files on the webserver, ready to be exploited by any nefarious visitors? People eventually started using things like Azure KeyVault or AWS Secrets Manager. Sure, it’s bad if people break in and nick your stuff, but it’s worse if they also steal your car to carry away the loot.
Once people had all their secrets backed by hardened credential services, features like auto-rotating credentials started becoming available. Since most of your authorisation is provided by the general hosting environment anyway, and you only really need credentials for systems that are incompatible with this approach, why not just update those credentials automatically on a schedule?
Also – back in the day when the server lived back in the office, if somebody got a bit enthusiastic with the UPDATE statements and forgot to add a WHERE, you could send everybody out for coffee (read tea, dear UK readers) whilst the techiest person in the office would Restore to Point in Time and feel like a hero when the database stopped being in recovery mode and everything was back to the way it was before the Incident.
Today a “restore” means you get to create a new database that contains the world as you knew it, before the Incident. You then need to serially rename databases to achieve the effect of having restored the database. Not a big deal, just not what we used to call restore back in my day.
Back to retry mechanisms
For SQL Server running in Azure, you can now tell Microsoft.Data.SqlClient to connect to the database using the app’s Managed Identity, meaning you are completely free of faff: there are no credentials to store or rotate, and access is managed for you automatically.
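A minimal sketch of what that looks like, assuming a recent Microsoft.Data.SqlClient (2.1 or later) and placeholder server and database names:

// The app's managed identity authenticates against the database via Azure AD;
// there is no password anywhere in the connection string.
var connectionString =
    "Server=tcp:my-server.database.windows.net,1433;" +
    "Database=my-db;" +
    "Authentication=Active Directory Managed Identity;" +
    "Encrypt=True";

await using var conn = new SqlConnection(connectionString);
await conn.OpenAsync(); // token acquisition happens inside the client library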
With RDS on AWS you need to use legacy usernames and passwords unless you configure an Active Directory domain up there, but … no. These credentials can, however, be auto-rotated in AWS Secrets Manager. Because every call to Secrets Manager costs money, it is not something you want to do on every request – that piles up over a month.
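To give an idea of what the _secretsCache used further down might look like, here is a simplified, hypothetical sketch (string secrets only, no typed deserialisation) built on the AWSSDK.SecretsManager package:

using System.Collections.Concurrent;
using System.Threading.Tasks;
using Amazon.SecretsManager;
using Amazon.SecretsManager.Model;

public class SecretsCache
{
    private readonly IAmazonSecretsManager _secretsManager;
    private readonly ConcurrentDictionary<string, string> _cache = new();

    public SecretsCache(IAmazonSecretsManager secretsManager) => _secretsManager = secretsManager;

    // Cheap path: reuse the cached value if we have one, so the happy path costs nothing.
    public async Task<string> GetCachedSecretAsync(string secretId) =>
        _cache.TryGetValue(secretId, out var cached) ? cached : await GetSecretAsync(secretId);

    // Expensive path: always ask Secrets Manager, e.g. after a failed login hints at rotation.
    public async Task<string> GetSecretAsync(string secretId)
    {
        var response = await _secretsManager.GetSecretValueAsync(
            new GetSecretValueRequest { SecretId = secretId });
        _cache[secretId] = response.SecretString;
        return response.SecretString;
    }
}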
One of those retry mechanisms from the early Azure days starts to make sense again, and it is easily implemented as a factory method you can call instead of the await using var cn = new SqlConnection(...) you have littered all throughout the code. I mean, you’ll still await using whatever that factory method returns, but it can do the setting up of the connection and the validating of the credentials, and only spend the dosh fetching the latest from the vault if you get the error code for invalid credentials. This means your bespoke factory method replaces both new SqlConnection and OpenAsync().
Naïve retry mechanism, featuring goto:
// https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-18456-database-engine-error?view=sql-server-ver15
// Error code that indicates invalid credentials, meaning we need to bite the bullet
// and fetch latest settings from Secrets Manager
private const int INVALID_CREDS_ERROR = 18456;
private const int INVALID_DB_NAME = 4060;

public async Task EnsureValidCredentialsAsync(SqlConnection conn, CancellationToken cancellationToken)
{
    var rdsCredential = GetRdsCredential(await _secretsCache.GetCachedSecretAsync(_siteConfiguration.DefaultDatabaseCredentialKey));
    var dbCatalog = await _secretsCache.GetCachedSecretAsync(_siteConfiguration.DefaultCatalogKey);
    int reconnectCount = 0;

reconnect:
    var connectionString = GetConnectionString(rdsCredential, dbCatalog);
    conn.ConnectionString = connectionString;
    try
    {
        await conn.OpenAsync(cancellationToken);
        conn.Close(); // restore to known state
        return;
    }
    catch (SqlException sqlEx) when (sqlEx.Number == INVALID_CREDS_ERROR)
    {
        // Credentials are incorrect, double check with secrets manager to see what's what - potentially creds have been rotated
        rdsCredential = await _secretsCache.GetSecretAsync<AwsSecretsManagerCredential>(
            _siteConfiguration.DefaultDatabaseCredentialKey);
    }
    catch (SqlException sqlEx) when (sqlEx.Number == INVALID_DB_NAME)
    {
        // Database identifier is not valid, double check with secrets manager to see what's what (potentially restored db, deprecating old db name)
        dbCatalog = await _secretsCache.GetSecretAsync(_siteConfiguration.DefaultCatalogKey);
    }
    catch (SqlException sqlEx)
    {
        Log.Error(sqlEx, "Could not open default DB connection. Reattempting");
    }

    if (reconnectCount++ < 3) goto reconnect;

    // surrounding code expects an exception if the open fails
    throw new InvalidOperationException("Could not open database connection");
}
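The factory method itself isn’t shown above; a hypothetical sketch of how it might wrap EnsureValidCredentialsAsync so that call sites get back an already-opened connection (the method name is mine):

// Call sites do: await using var conn = await _dbConnectionFactory.CreateOpenConnectionAsync(ct);
// instead of newing up a SqlConnection and calling OpenAsync themselves.
public async Task<SqlConnection> CreateOpenConnectionAsync(CancellationToken cancellationToken)
{
    var conn = new SqlConnection();
    // Sets the connection string and validates the credentials, only going back to
    // Secrets Manager if the login or database name turns out to be stale.
    await EnsureValidCredentialsAsync(conn, cancellationToken);
    await conn.OpenAsync(cancellationToken);
    return conn;
}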
What about Entity Framework?
Looking at the way you officially configure Entity Framework, it seems you can’t get away from having a known connection string up front. Again, that isn’t a problem in Azure, as discussed earlier, but I want to hand the app only credentials that I know have been tested.
In my past life inviting speakers to Oredev I once invited Julie Lerman to speak about Entity Framework and DDD, so I pretend that I know her and can call upon her for help in matters of programming. I therefore sent out a tweet linking to a Hail Mary Stack Overflow question I had created, asking how I could dynamically handle connection retries in a similar way, or at least be able to call out and ask for credentials.
Surprisingly, she had time to reply and pointed me to a video where she had addressed this subject, which taught me about something called DbConnectionInterceptors that had all the things I wanted. It also introduced me to the super elegant solution natively supported by Microsoft.Data.SqlClient for handling this situation in Azure, which I mentioned earlier.
Basically, I created a class that inherits from DbConnectionInterceptor and overrides two methods, one sync and one async, each calling something like the above function to test the connection EF Core is about to open.
public override async ValueTask<InterceptionResult> ConnectionOpeningAsync(
    DbConnection connection,
    ConnectionEventData eventData,
    InterceptionResult result,
    CancellationToken cancellationToken = new CancellationToken())
{
    var sqlConnection = (SqlConnection)connection;
    // try opening the connection, if it doesn't work - update its params - close it before returning to achieve same state
    await _dbConnectionFactory.EnsureValidCredentialsAsync(sqlConnection, cancellationToken);

    return await base.ConnectionOpeningAsync(connection, eventData, result, cancellationToken);
}
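The synchronous sibling isn’t shown in the post; a sketch of what it could look like, blocking on the async credential check since that is the only version available in this setup:

public override InterceptionResult ConnectionOpening(
    DbConnection connection,
    ConnectionEventData eventData,
    InterceptionResult result)
{
    var sqlConnection = (SqlConnection)connection;
    // Same credential check as the async override, reluctantly run synchronously.
    _dbConnectionFactory.EnsureValidCredentialsAsync(sqlConnection, CancellationToken.None)
        .GetAwaiter().GetResult();
    return base.ConnectionOpening(connection, eventData, result);
}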
Of course – registering this interceptor is easy as well, after I had googled some more. There is a (synchronous) overload of AddDbContext that gives you access to an instantiated IServiceProvider, as follows:
services.AddSingleton<ResilientRdsConnectionInterceptor>();
services.AddDbContext<ADbContext>((provider, options) =>
{
    var dbConnectionFactory = provider.GetRequiredService<DbConnectionFactory>();
    var connectionString = dbConnectionFactory.GetDefaultConnectionString()
        .GetAwaiter().GetResult(); // So sorry, but there is no async override

    options.AddInterceptors(provider.GetRequiredService<ResilientRdsConnectionInterceptor>());
    options.UseSqlServer(connectionString);
});
An aside on async/sync: it seems Microsoft will have you rewrite every other line of your codebase to properly support async, or your app will fall over. But when you then want to reuse your async code and not duplicate everything into similar-but-not-reusable sync and async versions, the tiny changes needed for MS to support async everywhere – like in configuration or DI – are all of a sudden “not needed”. It’s laziness, I tell you.
Anyway I have integration tests in place that verify that the mechanism for calling out for updated credentials actually works. Provided the credentials in the secrets manager/key vault actually work, this works like a charm.