Busman’s Holiday

July 20, 2025Architecture, BlogAwesomeRikard Ottosson

Rusted bolts vs a pristine manifold

I used to spend time with automotively inclined gentlemen. There were two distinct schools of the car hobby at that time. Finbilsmek, e.g. renovating a classic car or preparing a race car – sure, it eats all your money in parts, but you get to listen to music and carefully admire your new components as you fit them to your clean project car – unless it’s the weekend before the race where the stress level is high. The other school is bruksbilsmek, i.e. fixing your daily driver. It is the night before the MOT, it’s by the side of the road, it’s with a subset of your tools on a Halfords parking. Only if you are lucky does it takes place in your garage, on a car lift – and even if you are that lucky, then salt and grime is constantly falling in your face and if you fail to sort the problem it will have a massive impact on your daily life.

A similar thing exists in IT. If you are tinkering with your computer at home you have time to google bits, listen to music, type random stuff and see if it works. Worst case you just wipe it and start over. It’s enjoyable to install some weird hardware or software and try to get it going.

However if your work laptop starts having problems, or a thing that you need to sort out for work is broken, the enjoyment goes away and there is only rage. Therefore, at least at my age, I wouldn’t build a computer for work, nor do I have any wish to maintain the operating system or mess with networking or access rights – there are pros that do that stuff and keep abreast of all the bulletins of which security holes out there have been patched, I happily let them worry about it, I just accept their vetted upgrades and make sure I restart when I’m asked to.

Baby-proofing a laptop

This is why I’m not principally against working in a baby-proofed environment, i.e. where you as a developer do not have true admin rights, you have no access to customer data, you have no direct access to production. I would love that – as long as that still meant I could install everything I need to work, I can test all my logic locally (code, deployment, monitoring – all of it) and that all my developer tools work. Having all the networking, patching of servers, provisioning of resources and testing patches all of that being magically taken care of by someone else is very nice indeed, and allowing me to focus on delivering trustworthy code which I’m sure sounds super boring to others.

Unfortunately achieving such an environment – a baby-proofed one – requires a lot of engineering. We would like to be in a situation where a company onboards someone and without any manual intervention whatsoever they get a user account provisioned with all the correct group memberships and access as well, and after plugging in the laptop, setting up MFA, locking the screen and going for coffee – all necessary apps will be installed onto the laptop ready for immediate productivity. That would require a lot of cooperation between HR, ops, dev and procurement, plus enough resources to implement and tests all aspects iof this, and everyone involved in this would already have a day job so this would be extra.

Root of all evil

The biggest technical obstacle that makes developer special is that developers use software that need to attach a debugger to a process, and to open ports, i.e. listen for incoming traffic/ requests. – which is what a web app is. An operating system thinks these are dangerous things. Generally you get to listen to stuff on some ports with high numbers, but “well-known” ports require admin access. I.e. you can’t open port 80 and 443 without admin access, cause it would be dangerous if some random code tried to play web server. Attaching a debugger is even more dangerous, you literally have access to all of the process’ memory. You could read any secrets you wanted out of there, so – yeah – not something you get to do without admin access. Opening ports on high numbers was not a problem at the time, but in some cases you still needed to attach a debugger to IIS which required admin access.

On unix-like operating systems that were multi user aware from the beginning, there has been a culture of creating your own user for day-to-day work, and keeping an admin account called root that you only use for things that the operating system thinks is serious, like writing to the /etc directory or running programs in /sbin. Later the concept of sudo arrived, where you basically give accounts the opportunity to temporarily acquire root privileges after typing in their own password again, meaning you can delegate the right to install software without permanently giving the user elevated rights or giving them a root password. Also, the need to type in the password makes it harder to abuse by trickery, but by no means is it bullet proof.

Windows came from DOS, a single user operating system. Although Windows NT, the kernel has decent security design, the culture among windows users was generally that you just put yourself in the Administrators group when you installed your computer and you were “root” and life was easy. The lax security culture meant that many apps simply could not function if the user was not part of the Administrators group, so there was evidently no practical adoption of healthy practices. Windows machines were extremely susceptible to malware and as popular as Windows XP was, something had to be done. When Windows Vista came, the most hated new feature was User Access Control, which was a new layer of obstinance on top of Windows security, meaning the operating system threw up a popup in your face when you did something risky – like opening any port at any number, writing files to suspicious folder – such as editing C:\Windows\System32\drivers\etc\hosts – which is the windows version of /etc/hosts.

People hated UAC, and it was the new thing people did directly after installing – add yourself to Administrators and switch off UAC. But unfortunately you couldn’t argue with the results. The spread of malware was slowed down quite dramatically. Effectively UAC was a bolt-on sudo copy that just made you click on something to confirm. If you didn’t have access rights, it would ask you to type in some credentials that did have the power to approve the action. This meant that corporations started to give you a separate admin accounts that only worked on your machine, but gave you enough rights to open ports or install programs. An analogue to sudo, but more cumbersome.

Windows 7 made UAC back off a bit to increase adoption, and the results continued to be impressive. However – although Microsoft built a simple web server for development – IIS Express – that didn’t require administrative access when debugging – UAC would still sometimes ask you for approval to start things like an android emulator, an Azure Storage emulator or even an Azure Function Host, thus still requiring users to have some way of elevating, i.e. type in admin credentials just to do work. This has to be addressed if we are to be able to move into the glorious future where developers are fully embedded in a padded cell where we can do no harm.

Forbidden knowledge

At Netflix among other places, they devised a way to provide an ether of configuration that apps can just absorb, meaning that the app announces who it is, and recieves its configuration, i.e. you remove the problem of needing to know how the production environment is set up, you just ask for things and they are provided. That way apps can be secured and configured without any knowledge of the production environment leaking out to developers.

Containerisation lets us effectively ship a little egg of code into production, with a defined contract of what the application needs from the outside world. Combine this with a sidecar as above that handles communication between services, and you achieve the perfect state of developers being safely prevented from knowing anything concrete about how the production environment is configured, yet being able to deliver tested apps into production.

The biggest obstacle here is leaky abstractions. Like DAPR for instance promises to abstract away how things like message queues work, but it doesn’t actually. You cannot locally test something with Redis Message Broker or RabbitMQ that you intend to run on Azure Service Bus in prod. You need to be able to integration test automatically, or else it is unacceptable. The tests need to be able to run realistically in every environment.

Let me VNC onto the server

Back in the day when VMs were commonly used when hosting websites, you sometimes had to log into a virtual server and look into eventvwr.exe to see what was actively going wrong, maybe a particular executable was eating all the memory and needed a bit of encouragement to get over itself. This type of access is of course dangerous to have, and it would be nearly unheardof for a developer to have this type of access to production hardware even when troubleshooting, and instead there will be alerts that automatically destroy an instance of an app that is misbehaving whilst already having spun up a replacement. In the rare cases you still need to use a VM , you install agents on them that allow people to perform certain maintenance tasks without ever logging in. Fundamentally this has been solved in the way I foresee all of this being solved, by abstracting away the problem.

Conclusion

We are closer than ever to utopia, and the level of hand cranking required to reach nirvana is lower than ever, but there is still too much manual effort required. There is plenty of scope for disruption. A cocoon world for developers that allows for low faff developing and testing of containerised apps, being able to conclusively prove that monitoring and dependency acquisition works locally before pushing the code to CI is a minimum. This, depending on your cloud provider is still anything from impossible to a massive PITA. There are according to a quick search new IAM solutions that look like they offer identity and app provisioning in a seamless way, so the future is on its way somehow.

Development Productivity

June 28, 2025Blog, DevOps, WafflingRikard Ottosson

Why the rush to measure developer productivity?

A lot has been written on developer productivity, the best – indubitably – by Gregerly Orosz and Kent Beck, but instead of methodical thinking and analysis I will just recall some anecdotes, it is Saturday after all.

The reason a business even invests in custom software development is that the business realises they could get a competitive advantage by automating or simplifying tasks used when generating revenue, and I would say only as a secondary concern to lower cost.

The main goal is to get the features and have some rapid response if there are any problems, as long as this happens there are no problems. Unfortunately as we know, software development is notorious for having problems.

The customer part of the organisation is measured relentlessly, that is sometimes incentivised based on generated profit, whilst they deal with the supplier part of the organisation that seems to have no benchmarks for individual performance, which is bewildering when projects keep slipping.

Why can’t we just measure outcome of an individual software developer? Because an individual developer’s outcome depends on too many external factors. Sure, one developer might type the line that once in production led to a massive reduction in losses, but who should get credit? The developer that committed the code, the other developer that reviewed it? The ops person that approved the release into production? The product owner that distilled wants and needs into requirements that could be translated into code?

The traditional solution is to attempt to measure effort, velocity, lines of code et cetera. Those metrics can easily prove to yield problematic emergent behaviour in teams, and it at no point measures if the code written actually provides value to the business.

I would argue that the smallest unit of accountability is the development team. The motley crew of experts in various fields that work together to extract business requirements and convert it into running software. If they are empowered to talk to the business and trusted to release software responsibly into production, they an be held accountable for things like cloud spend, speed of delivery and reliability.

Unfortunately the above description of a development team and its empowerment sounds like a fairytale for many developers out there. There are decision gates, shared infrastructure, shared legacy code and domain complexities meaning that teams wait for each other, or wait for a specific specialist team to do some work only they are trained/allowed to do. I have likened it to an incandescent lightbulb before. A lot of heat loss, very little light. Most of the development effort is waste heat.

Why do software development teams have problem delivering features?

I will engage in harmful stereotype here to make a wider point. I have been around the block over the years and visited many organisations that have degrees of the below issues, in different ways.

Getting new stuff into the pipeline

Fundamentally it is difficult to get access to have the internal IT department build you a new thing. A new thing means a project plan has to be drawn up to figure out how much you are willing to spend, the IT department need to negotiate priorities with other things currently going on and there can be power plays between various stakeholders and some larger projects IT are engaging in on their own because they mentioned it when their own budget was negotiated.

Some of this stuff is performative, literally a project needs to look “cool” to justify an increase in headcount. Now, I’m not saying upper management are careless, there are follow-ups and metrics on the department, and if you get a bunch of people in and the projected outcomes don’t materialise, you will be questioned, but the accountability does not change the fact that there is a marketing aspect when asking for a budget, which also means there is some work the IT department must do regardless of what the rest of the business wants, because they promised the CEO a specific shiny thing.

Gathering requirements

In some organisations, the business know “their” developers and can shoot questions via DM. There is a dark side to this when one developer becomes someone’s “guy” and his Teams in tray becomes a shadow support ticket system. This deterioration is why delivery managers or scrum masters or engineering managers step in to protect the developers – and thus their timelines – because otherwise none of the project work gets done because the developers are working on pet projects for random people in the business and all that budgeting and all those plans go out the window. The problem with this protectionism is that you remove direct feedback, the developers do not intuitively know how their users operate the software in anger. So many developers get anxiety pangs when they finally see the amount of workarounds people do on a daily basis with hotkeys or various clipboard tricks, things that could just have been an event in the source code, or a few function calls, saving literally thousands of hours of employee time in a year.

The funnel of requirements therefore goes through specific people, a product owner or business analyst that has the unenviable task to gather all kinds of requests into a slew of work that a representative in the business is authorised to approve, meaning instead of letting every Tom, Dick and Harry have a go at adding requirements, there is some control to prevent cost runaway and to offer a cohesive vision. Yes, great advantages, but one thing it means is that people on the front line that work against commission and feel a constant pressure whilst battling the custom software, when they complain about stuff being annoying or difficult, they never see that being addressed, and when every six months some new feature is presented, they notice that their pet peeve is still there unaddressed. This creates a groundswell of dissatisfaction, sometimes unfounded, but unfortunately sometimes not.

Sometimes the IT department introduce a dual pipeline, one pipe for long-term feature work and one for small changes that can be addressed “in the run of play” to offer people the sense of feedback being quickly addressed, which adds the burden of having a separate control mechanism of “does this change make sense? is it small enough to qualify?” but some companies have had success with it.

The way to be effective here is to reduce gate keeping but have transparent discussions on how much time can be spent addressing annoyances. Allowing teams to submit and vote for features works too, but generally just showing developers how the software is used by its users is eye-opening.

Building the right thing

“We have poor requirements” is the main complaint developers have when stories overrun their estimates. In my experience this happens when stories are too big, and maybe some more back and forth is needed with the business to sort it out. If developers and business are organisationally close enough, a half hour meeting could save a lot of time. It can be argued that estimates are a waste of time and should be replaced by budgets instead, but that’s a separate blog post.

Developers have all had experience of wasting time. Let us say request comes down from the business with a vision of something, the team goes back and forth to figure out how to migrate from an existing system and how to put the new app in production. They write a first version to prove the concept, and then it never gets put into production for some external reason. Probably a perfectly valid reason, it is just never made clear to the developers why the last three months of meetings, documents and code was binned, which grates on the developers. As implied above, the fact that it was 3 months of work can explain why thing was never released, its time may very well have passed.

My proposed solution to the business remains to build smaller initial deliverables to test the waters, I have yet to be convinced that doesn’t work. It is hard, yes, it requires different discussions, and I will concede, the IT department might already be at a disadvantage trying to promise faster deliverables.

Also, requirements change because reality changes, the business is not always just messing with you, and because your big or complex changes take time to get through to production, the problem of changing requirements gets exacerbated the slower delivery is. Also – don’t design everything up front. Figure out small chunks you can deliver and show people. This is difficult when you are building backend rails for something, but you can demo API payloads., Even semi- or non technical people can understand fundamental concepts and spot omissions early “why are you asking for x here? We don’t know that at this point of the customer journey”. Naming your API request and response objects the way the business would, i.e. ubiquitous langauge, makes this process a lot easier. Get feedback early. Keep a decision log. You don’t need diagrams of every single class but you do need a glossary and a decision log.

Building the thing right

I have banged on about engineering practices on here before, so there is nothing really new here. Fundamentally, the main things are missing tests, anything from unit, to feature, to integration, to contract – not to mention performance tests. Now, sometimes. you write these tests and run them automatically. Ideally you do, but with contract tests for instance, the amount of ceremony you have to set up to do your first automated contract test, plus its limited value until everything is contract tested means that a fair trade off can be to agree to manually test contracts. The point is, you will have there tests regardless of if they are automated or not, or else you will release code that does not work. The later you have the epiphany that test first is superior, the more legacy you will have that is hard to test and hard to change.

Even if you are stuck with legacy code that is hard to test, you can always test manually. I prefer to have test specialists on a team that I can ask for help, because their devious minds can come up with better tests, and then I just perform that tests when I think I’m done. You hate manually testing? Good, good! Use the hate, automate, but there is really no reason to not test.

If there are other parts of the business calling an API you are changing, never break an interface, always version. Doesn’t matter how you try and communicate, people are busy with their own stuff, they will notice you broke them way too late. Of course, contract tests should catch this, but why tempt fate. Cross-team collaboration is hard, if you can put some of these contracts behind some form of validation, you will save a lot of heartache.

Operation

I have addressed in previous posts the olden day triumvirate of QA, Ops and Dev and how they were opposing forces. Ops never wanted to make any changes, QA would have preferred if you made smaller changes, and Devs just churned out code throwing it over the wall. Recently it is not as bad, the DevOps culture attempts to build unity and decentralise so that teams are able to responsibly operate their own software, but a lot of time there is a specific organisation that handles deployments separately from the development team. Partly it can be interpreted as being required by ITIL, but also it gives operations a final chance to protect the business from poor developer output, but with all gatekeepers and steps, it adds a column on the board where tickets gather, which makes for a bigger release when it finally hits production, a bigger changeset means a bigger surface area and more problems.

The key problem with running a service is to understand its state and alert on it. If it takes you a few hours to know that a service isn’t performing well, you are at a disadvantage. There is a tradeoff between the amount of data you produce and what value it brings.

Once you can quickly detect that a release went bad and can quickly roll it back, ideally automatically, then you will have saved everyone a lot of time and improved the execution speed for the whole department. It is very important. If not, you will further alienate your users and their managers, which is even worse, politically.

Technical advancements

People may argue that I have been telling you that lacking development productivity is the fault of basically everyone else but developer, but…. come on – surely we will be able to just sprinkle some AI on this and be done with it? Or use one of those fancy low code solutions? Surely we can avoid the problem of producing software slowly by not actually writing code?

The only line of code guaranteed to be bug free is the one not written. I am all for focusing on your core business and write only the code you need to solve the business problem at hand. Less code to write sounds like less code to review and maintain.

Now, I won’t speak to all low code solutions because I tend to work in high code(?) environments the last decade and a bit, but the ones I have seen glimpts of look very powerful, slap a text box on a canvas, bosh you have a field stored in a table. The people writing applications with these platforms will become very skilled at producing these applications quickly,

Will all your software live on this platform? What happened to the legacy apps that slow you down today, will they not require some changes in the future as well? Will the teams you currently have problems avoiding to break shared APIs fare better with strings in the textbox of a website? Are you responsible for hosting this platform or is it SaaS, how is data stored? Will the BI team try to break into a third party database to ingest data or will they accept some kind of API? What about recently written micro services that have yet to pay back their cost of development? Bin them?

Conclusion

My belief is that the quest for developer productivity comes from a desire to reduce the time between idea and code running well in production. If that lead time was drastically cut, nobody on the other side of the business is going to care what developers do with their time.

Although development productivity is affected by the care and attention applied by individual developers, given the complexities of a software development department and the constraints of the development workflow, productivity is a function of the execution of the whole department. If your teams are cross functional and autonomous you can hold them accountable on a team level, but that requires a relative transparency around cost that requires engineering effort to acquire.

The speed with which features come to production is only in a limited way affected by the speed with which developers write code, and if you do not address political and organisational bottlenecks, you may not see any improvements if you go low code or AI (“vibe coding”),

Our job as older people within a software development function is to make sure we do what we can to make sure features reach the business faster, regardless of how they are implemented. As always measure first and optimise at the bottleneck before you measure again.

Assembly and abstractions

June 6, 2025Blog, CI, GeneralRikard Ottosson

Machine Code and Assembly

Up until the 1980s, when you bought a computer, you occasionally needed to program it using machine code, i.e. directly feeding the processor instructions in its native instruction set. Programming language abstractions had existed since the 1960s, but for small computers, the overhead of compilers and lack of optimisation made it unfeasible for certain applications. Leading mathematicians and computer scientists designed languages like Algol 68 that influenced many of the languages still around today, but everyone knew that if you wanted to build something truly performant – on simple hardware or applications that crunch a lot of data – you would have to write it in assembly (basically machine code, but with names instead of instruction numbers, and labels for branches – a very transparent abstraction over machine code). A compiler would not always create optimal machine code based on the higher level language, but eventually the time saved reading and writing a high level language versus maintaining large codebases in Assembly outweighed more and more minute performance differences.

High level languages and new categories of work

As computer programs got bigger, a couple of problems commonly arose. Shared mutable state turned out to be risky, as it becomes hard to understand which code changes what. Code that affect a certain concept should ideally live next to the rest of the code that affects that area so that it becomes easier to understand. Naming becomes critical for maintenance et c. Structured programming was introduced, object oriented programming, declarative programs et cetera.

When the declarative structured query language (SQL) was invented by Chamberlin and Boyce at IBM, it was a domain specific language for data structuring, management, retrieval. The idea was that you no longer needed software developers to query and update data. However, nobody could be bothered with that. Sure, a lot of people made little apps in Microsoft Access, but largely what happened was that a brand new specialist group of IT folks emerged, the DBOs, that understood all the complexities around storage architecture, query design, troubleshooting, and optimisation.

JavaScript-as-assembly

JavaScript became the lingua franca of the web, and babel allowed people to be creative. As a simple example, you can write scripts for the web in F# using Fable, and when you view source in the browser it shows the F# source code, however in reality the code being run is javascript that has been transpiled and optimised using tooling, whilst the browser knows how to locate the original to show in the developer tools. This means that you can essentially run any language you like and essentially compile it down to javascript. You commit your original source file into your source code versioning system, and rely on build pipelines to generate the javascript the same way you don’t commit binaries to source control but build them in the pipeline.

What about AI?

Just like with SQL, a lot of people are convinced that we can stop hiring developers now that we have all these sophisticated programmer AIs that can do all the work. I am selfishly not convinced that is true. Maybe it is because of all the other times they were going to get rid of programmers that either fizzled out (Rational Rose) or just turned out to create new jobs, like the SQL example above.

One problem is that abstractions are leaky, i.e. you cannot fully escape the limitations of the underlying platform, meaning that you will need to know a few fundamentals to manage a software system. Like – you may not need to be able to adjust a carburettor in order to drive a car anymore, but you should probably know about fuel, air and spark to be able to troubleshoot if something is wrong.

Another potential problem is that senior developers are not spawned fully formed like Pallas Athena, they emerge out of the chrysalis of a junior developer. Good judgement comes from experience, experience comes from bad decision et c. For senior developers to be around to wrangle AI agents, we somehow need to employ juniors first.

I think there will be some turmoil in the industry for a while, but eventually we will settle down on a new source code emerging, probably a rather onerous specification, written in English by one or more AI agents, and then there will be a build pipeline, that with determinism can generate the software from the specification. The level of specificity required probably makes the specification unmanageable to write by a human, but it can at least be read, reviewed and understood. Once determinism exists , the spec will be the source code, and the traditional source code will be like the javascript that is used in the browser today, you most likely don’t even have to look at it, unless something mysterious has gone wrong, and it definitely won’t need committing to your versioning tool.

Death of Enterprise C# as we know it?

April 12, 2025BlogRikard Ottosson

The constant battle between getting paid and getting a wide audience has simultaneously hit a number of formerly open source products in the .NET space as they move towards closed source paid licensing in future versions. Is this the end of the world for current shape enterprise C# as we know it? If not – will we be OK?

Enterprise applications retrospective

I will tell this historical recapitulation as if it is all fact, but of course these are just my recollections at this point, and I can’t be bothered to verify anything at the moment.

When I got started in the dark ages, developing software using Microsoft technologies was not something real developers did unless forced. Microsoft developers used VB, and were primarily junior office workers that had graduated from writing Excel macros or Access forms apps. The pros used C++. Or, more truthfully C/C++ – as people mostly wrote C with the occasional class sprinkled in. The obvious downside with the C part is that humans are not diligent enough to handle manual memory management, and C++ had yet to develop all the fancy memory safety it has now.

The solution came from Sun Microsystems. They invented Java, a language that was supposed to solve everything. It offered managed memory, i.e. it took much of the responsibility for memory management away from the developer. It also did not compile down to machine code directly, it compiled down to an intermediate language that could then quickly be interpreted into machine language through a runtime. This abstraction layer made it possible to write Java once and run it on any platform. This was attractive to many vendors of complex software such as database engines, as they wanted to compete on the Workstation market, i.e. Serious Computer Hardware for engineers and others, and all of a sudden being able to sell the same software onto multiple hardware platforms in that space was attractive, since by nature those platforms were never going to be very numerous.

This was an outrageous success. Companies adopted Java immediately, there were bifurcations of the market, open source Java Development Kids came about. Oracle got in the fray. C++ stopped being the default language for professional software development, and as a JVM based ex-colleague of mine remarked “it became the Cobol of our time”.

Microsoft saw this, and wrote a java interpreter for DOS/Windows (J++), but of course they used the embrace-extend-extinguish playbook and ran fast and loose with the specification. Sun knew what was coming, so they immediately rounded up their lawyers.

Hurt and rejected, Microsoft backed off from joining the JVM family, but instead hired Turbo Pascal inventor Anders Heijlsberg to create a new language that would be a bit more grown up than VB but also be more friendly to beginners than C++ was. Basically, the brief was to rip off Java, which is evident if you look at the .NET Base Class Library today.

Now, the reason for this retrospective is to explain cultural context. Windows having come from DOS, a single user operating system – means that Windows application security was extremely deeply flawed both technically initially, but also culturally for the longest time. Everybody runs with administrator privileges the way they’d never daily as root on Linux, to the point they had to introduce an extra annoyance layer UAC on top of normal windows security, because even if they implemented an equivalent to sudo, there would be no way culturally to get people to stop granting their user membership in the Local Administrators group.

The same way Basica in DOS and VBA in Office fed people to VB that fed people to C#, always with the goal of low barrier to entry and beginner friendly documentation, has meant that there is an enormous volume of really poor engineering practice in the .NET developer space. If you google “ASP.NET C# login screen” you will have nightmares when you read the accepted answers.

Culture in the .NET world vs Java

In the Java space meanwhile, huge strides were made. Before .NET developers even knew what unit tests were, enterprise Java had created suites for tests, runners, continuous integration, object-relational mappers, all kinds of complex distributed systems that were the backbone of the biggest organisations on the internet. Not everything was a hit – java browser applets died a death fairly quickly, but while Java development houses were experiencing domain driven design and micro services, Microsoft’s documentation still taught beginners about three tier applications using BizTalk, WCF and Workflow Foundation.

Perhaps because of Oracle and Sun being too busy suing everybody, perhaps because of a better understanding of what an ecosystem is or perhaps just blind luck, a lot of products were created on the JVM and allowed to live on. Some even saw broad adoption elsewhere (Jenkins and Team City for build servers just to name a couple) while .NET really never grew out of the enterprise line of business crud. Microsoft developers read MSDN.com and possibly ventured to ComponentSource, but largely stayed away from the wider ecosystem. Just like in the days when you could never be fired for buying IBM, companies were reluctant to buy third party components. In the Windows or .NET world, only a few third party things ever really made it. Resharper and Red Gate SQL Tool Belt can sell licenses the way almost no-one else can, but with every new version of Visual Studio, Microsoft steals more features from ReSharper to include by default in a way that I don’t think would happen in the Java space. First off, there is no standard Java IDE, there are a few popular options, but there is no anointed main IDE that you must use (except, if I had to attempt java programming I’d use IntelliJ IDEA because brand loyalty, but that’s just me), while Microsoft definitely always has had an ironclad first party grip over its ecosystem.

Open Source seeping into .NET

There was an Alt.NET movement of contrarians that ported some popular Java libraries to .NET to try and build grown-up software on .NET, but before .NET Core it really seemed like nobody had ever had to consider performance when developing on .NET, and a slow death kept occurring. Sebastien Lambla created Open Wrap to try and provide a package manager for .NET developers. Enthusiasts within Microsoft created NuGet that basically stole the concept. German enthusiasts created Paket in order to more successfully deal with dependency graphs, there was a lot of animosity before finally Paket stopped being actively sabotaged by Microsoft. After Nuget saw broad adoption, and .NET developers heard about the previous decade worth of inventions in the Java space, there was an explosion of growth. Open Source got in below the radar and allowed .NET developers to taste the rainbow without going through procurement.

Microsoft developers heard about Dependency Injection. It was so much behind Java’s container wars that when Java developers joked about being caffeine driven machines that turn XML into stack traces, if we even used configuration files, they would most likely have been Json at the time. because XML had died out of the mainstream between them and us discovering overcomplicated DI container configuration.However, since we did have proper Generics – which Java did not until much later – we were able to use fluent interfaces to overcomplicate our DI instead of configuration files. This was another awkward phase – that we will get back to later.

We got to hear about unit tests, NUnit was a port of the JUnit test framework in Java, and Microsoft included MSTest with Visual Studio, but it was so awful that NUnit stayed strong. Out of spite, Microsoft pivoted to promote XUnit so that even if MS Test always remained a failure, at least NUnit would suffer. XUnit is good, so it has defeated the rest.

Meanwhile, Sun had been destroyed, Oracle had taken stewardship of Java, C# had eclipsed Java in terms of language design, and the hippest developers had abandoned Java for Ruby, Scala, Python and JavaScript(!). Kids were changing the world using Ruby on Rails and Node JS. Microsoft were stuck with an enormous cadre of enterprise LOB app developers, and blogs titled “I’m leaving .NET” were trending.

The Force Awakens

Microsoft had internal problems with discipline, and some of the rogue agents that created NuGet proceeded to cause further problems, and eventually after decades of battles with the legal department they managed to release some code as open source within Microsoft, which was a massive cultural shift.

These rogue agents at Microsoft had seen Ruby on Rails and community efforts like Fubu MVC and became determined to retrofit a model view controller paradigm onto legacy ASP.NETs page rendering model, and the improvement with Razor views over old ASPX pages was so great that adoption was immediate, and this was in practice the first thing that was made open source, the ASP.NET MVC bit.

The ruby web framework Sinatra spawned a new .NET thing called Nancy FX that offered extremely light-weight web applications. The creator Andreas Håkansson contributed to standardising a .NET content pipeline called OWIN, but later developments would see Microsoft step away from this standard.

In this climate, although I am hazy about the exact reasons why, some troublemakers at Microsoft decided to build a cross platform new implementation of .NET, called .NET Core. Its first killer app was ASP.NET Core, a new high performance web framework that was supposed to steal the lunch from Node JS. It featured native support for middleware pipelines and would eventually move towards endpoint routing, which would have consequences for community efforts.

After an enormous investment in the new frameworks, and with contributions from the general public, .NET Core became good enough to use in anger, and it was a lot faster than anything Microsoft had ever built before. Additionally, you could now run your websites on Linux and save a fortune. Microsoft were just as happy to sell you compute on Azure to run Linux. It felt like a new world. While on the legacy .NET Framework the Container Wars had settled into trench warfare and a stalemate between Castle Windsor, Ninject and Unity among others, Microsoft created peace in our time by simply building a rudimentary DI system into .NET Core, de facto killing off one of the most successful open source ecosystems in .NET.

A few years into ASP.NET Core being a thing, the rogue agents within Microsoft would introduce minimal APIs, which thanks to middleware and endpoint routing basically struck a blow to Nancy FX and also blocked F# web service project Giraffe for about a year in the hope that it too would support OpenAPI /Swagger API documentation.

After we finally caught up on SOLID that Java developers had been talking about for a decade, we knew we had to separate our concerns and use all the patterns in the Gang of Four. Thankfully MediatR came about, so we could separate out our web endpoints from the code that actually did the thing, by mapping the request object to a DTO, and then just passing that request DTO to MediatR that would pick and execute the correct handler. Nice. The mapping would most commonly be done by using AutoMapper. Both of these are both loved and hated.

Before DevOps had become trendy, developers in the .NET space were using Team City to run tests on code merged into source control, and would have the build server produce a deployable package upon all tests being green, and when it was time to release, an ops person would either deploy it to production directly or approve access to a physical machine where the developer would carefully deploy the new software. If you had a particularly Microsoft-loyal CTO, you would at this time be running TFS as a build server, and use Visual Studio’s built in task management to track issues in TFS, but most firms had a more relaxed environment with Team City and Jira.

When the DevOps revolution came, Octopus Deploy would allow complex deployment automation directly out of Team City, enabling IT departments to do Continuous Deployment if they chose to. For us fortunate enough to only click buttons in Octopus Deploy it felt like the future, but the complexities of keeping those things going may have been vast.

I’ve looked at Cloud, from both sides now

Microsoft Azure has gone through a bunch of iterations. Initially, the goal was to try and compete with Amazon Web Services. Microsoft offered virtual machines and a few abstractions, like Web Workers and Block Storage. Oh, and it was called Windows Azure to begin with.

A Danish startup called AppHarbor offered a .NET version of Heroku, i.e. a cloud application platform for .NET. This too felt like the future, and AppHarbor moved to the west coast of the USA and got funding.

Microsoft realised that this was a brilliant idea and created Azure Websites, offering PaaS within Azure. This was a shot below the waterline of AppHarbor that finally shut down in 5 December 2022. After several iterations, this is now knows as Azure App Services, and is more capable than AWS Lambda, which in turn is superior to Azure Functions.

Fundamentally, Azure was not great in the beginning. It exposed several shortcomings within Windows, it was basically the first time anybody had attempted to use Windows in anger, so Microsoft were shocked to discover all the problems. After a tremendous investment in Windows Server, as well as to a certain extent giving up and running stuff like software defined networking on Linux, Azure performance has improved.

Microsoft was not satisfied and wanted control over the software development lifecycle everywhere. More investment went into TFS, wrestling it into something that could be put in the Cloud. To hide its origins they renamed it Azure DevOps. You could define build and deployment pipelines as yaml files, which finally was an improvement over TeamCity, and the deployment pipes were not as good as Octopus Deploy, but they were good enough that people abandoned Team City, Octopus Deploy and to some extent Jira, so that code, build pipelines and tickets all would go on to live in TFS/Azure DevOps.

To summarise, .NET developers are inherently suspicious of software that comes from third parties. Only a select few pass the vibe check. Once they have reached that point of success they become too successful and Microsoft turns on them.

Current State of Affairs

By the end of last year, I would say your run-of-the-mill C# house would build code in Visual Studio – hopefully with ReSharper installed, keep source code, run tests, and keep tickets in Azure DevOps. The code would use MediatR to dispatch commands to handlers, use AutoMapper to translate requests from web front-end to handlers, and format responses back out to the web endpoint. Most likely data would be stored in an azure hosted SQL Database, possibly using Cosmos DB for non-relational storage and Redis for caches. It is also highly likely that unit tests used Fluent Assertions and Moq for mocks, because people like those.

Does everybody love this situation? Well, no. Tooling has over the years improved so vastly that you can easily navigate your way around your codebase by keyboard shortcuts, or a keyboard shortcut whilst clicking on an identifier. Except, when. you are using MediatR and AutoMapper it all becomes complicated. Looking at automapper mapping files, you wonder to yourself if it wouldn’t have been more straightforward to manually map the classes and have some unit tests to prove that they still work?

Fluent syntax as described a couple of places above was a fad in the early 2010s. You wrote a set of interfaces with a set of methods on them, that in turn returned objects with other interfaces on them, and you used these interfaces to build grammars. Basically a fluent validation library could offer a set of extension methods that would allow you to build assertion expressions directly off the back of a variable, like:

var result = _sut.CallMethod(Value);
result.Should().Not().BeNull();

The problem with fluent interfaces is that the grammar is not standardised, so you would type Is.NotNull, or Is().Not().Null, or any combination of the above and after a number of years, it all blends together. If you happen to have both a mocking library and an assertion library within your namespace, your IntelliSense will suggest all kinds of extension methods when you hit the full stop, and you are never quite sure if it is a filter expression for a mocking library or a constraint for an assertion library.

The Impending Apocalypse

Above is described the difficulty threading the needle between getting any traction in the .NET space yet not becoming too successful so that Microsoft decides to erase your company from the map. Generally, the audacity of open source contributors to want access to food and shelter is severely frowned upon on the internet. How dare they, is the general sentiment. After the success of Nuget, the avalanche of modern open source tooling becoming available to regular folks working the salt mines of .NET, GitHub issues have been flooded by entitled professional developers making demands off open source project maintainers without any reciprocity in the form of a support contract or source code contributions. In fairness to these entitled developers, they are pawns in a game where they have little agency in terms of spending time or money on things, so deciding to contribute a fix to an open source library could have detrimental effects on ones employment status, and there would be no way to get approval to buy a support contract without having a senior manager lodge a ticket with procurement and enduring the ensuing political consequences, but to the open source maintainer it is till a nightmare, regardless of people’s motivations.

A year ago, Moq shocked the developer community by including an obfuscated piece of tracking software, and more recently Fluent Assertions, MediatR, Automapper and MassTransit have announced they will move towards a paid license.

Conclusion

What does this mean for everybody? Well…. it depends. For an enterprise, the procurement dance starts, so heaven knows when there will be an outcome from that. Moving away from AutoMapper and MediatR is most likely too dramatic to consider as by necessity these things become choke points where literally everything in the app goes through mapping or dynamic dispatch, however – there are alternatives. Most likely the open source versions will be forked and maintained by others, but given the general state of .NET open source, it is probably more likely that instead of almost market wide adoption, you will see a much more selective user base going forward. I want all programmers to be able to eat, and I wish all the success to Jimmy Bogard going forward. He has had an enormous impact on the world of software development and deserve all the success he can get.

Ever since watching 8 lines of code by Greg Young I have been a luddite when it comes to various forms of magic, and been a proponent of manual DI and mapping. My advice would be to just rip this stuff out and inject your handler through the constructor the old fashioned way. Unless you have very specific requirements, I can almost guarantee that you will find it more readable and easier to maintain. Also, like Greg says – the friction is there for a reason. If your project becomes too big to understand without magic, that’s a sign to divide your solution into smaller deployable units.

The continuous march continues, where the teams at Microsoft usurp more and more features from the general .NET ecosystem and include them in future versions of .NET, C# or Visual Studio, squeezing the life out of smaller companies along the way.

I would like to see more situations like paket and xunit where Microsoft have stayed their hand and allowed the thing to exist. I think a healthy coexistence of multiple valid solutions would be healthier for all parties. I do not know how to bring that about. Developers in the .NET space remain first party loyal only, except for a very small number of products. I think it is necessary to build a marketplace where developers can buy or subscribe to software tools and libraries, which also offers software bill-of-materials type information so that managers can have some control over which licenses are in play, perhaps give developers a budget for how much they can spend on tools. That way enterprise devs can buy the tools they need whilst offering the transparency that is needed for compliance, with controls that allow an employer to limit spending whilst simultaneously allowing tool developers to feed their kids. I mean Steam exists, and that’s a marketplace that also exists on a Microsoft platform. Imagine a similar thing but for Nugets.

Barriers – trade and otherwise

April 7, 2025Architecture, GeneralRikard Ottosson

The last few days of highly retro American foreign policy and the ensuing global market chaos has made everyone aware of trade barriers, and will make everyone fully grasp the consequences of a trade war, the same way the pandemic allowed us to re-learn the lessons of 1919 and the Spanish flu.

Background

To take our minds off the impending doom, I will instead address other barriers, ones that are more related to technology and software, but still cause real world problems for real people.

There are a number of popular websites that enumerate falsehoods programmers believe about names, addresses and even time zones. I do not have the stamina currently to produce an equivalent treatise on falsehoods people believe about residency, but I can manage to put together a bit of a moan on the subject at least.

Identity

In some countries, there are surrogate keys that can be used to identify a person. If these are ubiquitous, you come to rely on everyone having them and may require them as part of customer onboarding. Do remember that tourists exist. Or legal residents that somehow still does not have this magical number, In countries that have had reasons to become paranoid about dodgy people buying train tickets online such as Spain, they may demand an identity number. If you will be in such a situation, try to accept an alternative such as passport number.

In other countries, where there is no surrogate key – you are your address, essentially- you will instead need to consider how you handle, well tourists as well, again, but also students that may reside with their parents still. Also – make it easy to update people’s addresses as they inevitably have to move in our current neo-feudal society of landed gentry and renter class.

Residence vs Citizenship

In Europe at least it has been relatively common for people to live permanently in a country other than one of which they are a citizen. Many countries put up barriers to become a citizen to ensure that only those who care become naturalised citizens and vote on important matters. Other countries hand out citizenship in cereal boxes, like Sweden, but regardless – just because someone lives in a country and pays taxes there, does not mean they are a citizen of that country. If you ask for a “country”, please be explicit if you mean residency or nationality. Also, people do have multiple citizenships sometimes. Does this matter to you? Do you need this information? Really? There are a lot of data privacy concerns here, also please consider purpose and retention of this data.

If Her Majesty (RIP) bestowed a Settled Status upon you as an EU citizen and UK resident, you will not be eligible for a biometric residency card from the Home Office. This is still a surprise to many embassies and IT systems the world over. And no, you can’t blag one from the Home Office. The IT system or embassy will need to deal with share codes from the Settlement Scheme that prove right to live and work in Blighty. Again, pay attention.

If you are in the business of selling international travel, you may need to support edge cases. People with dual citizenship are already required to show the correct passport at the correct gate, i.e. at St Pancras they show the UK border agent their British passport and as they walk across to the French border agent they show their EU passport.

With the UK introducing ETA it becomes more complicated. The current recommendation when booking travel is to use the passport for your country of residence, as it is easier for the airline / ferry / train company to know you have right to reside in your home country, but with ETA that becomes even more important, because if a dual passport holder shows up at say a foreign airport trying to check in to a flight to the UK booked on their non-UK passport, the airline is required to demand an ETA to allow boarding. Even for EU citizens with settled status this can be a problem, but for us we can still show a share code from the Home Office, but no such route exists for a citizen.

Things really kick off when the EU introduces ETIAS, when you as a dual citizen need to prove your right both ways, i.e. you need to book your EU bound flight with your EU passport and you return journey with your UK passport. British Airways and EasyJet already support having different identity documents for each leg, but Ryanair does not. I mention it here not to promote alternatives to Ryanair, but just to warn that this is something someone writing / selling software that deals with this area will have to consider.

As KYC becomes a bigger deal, and countries invent new barriers to live and work in other countries, you as – let’s say- the vendor or implementer of systems involved in recruitment will have to pay attention to how things really work to ensure you correctly gate keep your users so that you only refer people that can legally work in the jurisdiction you are serving. This seems onerous, so you will immediately partner with an identity and verification firm that deals with that for you. Great – as long as that supplier is up-to-date.

Payments

Unfortunately the very real threat of terror attacks and the less gruesome but more prevalent risk of fraud have meant that what used to be a stupid simple experience – buying things online – has become more complex.

Here you are usually bound by what measures your financial rails or your regulator put in place to reduce risk, but at least consider if you need to require the card to be registered to the shipping address, or if you as a UK company really require bank account and sort code to enter payment details as these are not printed on foreign debit cards. You will need it for direct debit of course, but always? Think about it, is all I ask.

I tried to buy car insurance when I was new in the UK, and I had to go to a brick and mortar insurer so that I could pay with my Swedish visa debit, as all online businesses required UK debit cards. it is hard to get a bank account as a n00b in any country, and that degree of difficulty helps other businesses. If you can show a bank statement mailed to your address, that’s prime identity document stuff in the UK.

Why all of this friction? Why?

Surely, someone has solved all of this. Yes, but I have never been a resident of Estonia, so I cannot tell you how utopia works.

I can warn you of what has happened in Sweden. Everyone – well, not everyone, but most people – has an electronic certificate called Bank ID. You can use it to pay, to do your taxes, to loan money, to buy a house, to start a business. Anything. In the high trust society of old you could just punch in your surrogate key, the personnummer, and you would without any further verification be subscribed to a newspaper. Both the newspaper and the invoice would arrive at your registered address. Yes the state would tell the newspaper where you live.

With BankID, you will now get a follow-up question and need to verify in an app. So people are creating shell companies, taking up loans, buying property and transferring funds on behalf of other people and using social engineering to make sure the app gets clicked.

There are calls for slowing down bureaucracy, making people show up in person to do some of this stuff due to the proliferation of fake identities issued through misuse of proper channels, hijacked cryptographic identities and malevolent automation. Some old fashioned bureaucracy can help reduce fraud by stalling, basically. Consider abuse as you design systems.

How small is small?

July 7, 2024Architecture, Blog, DevOps, WafflingRikard Ottosson

I have great respect for the professional agile coach and scrum master community. Few people seem to systematically care for both humans and business, maintaining profitability without ever sacrificing the humans. Now, however, I will alienate vast swathes of them in one post. Hold on.

What is work in software development?

Most mature teams do two types of work, they look after a system and make small changes to it – maintenance, keeping the lights on and new features that the business claims it wants. It is common to get an army in to build a new platform and then allow the teams to naturally attrit as transformational project works fizzles out, contractors leave, the most marketable developers either get promoted out of the team or get better offers elsewhere. A small stream of fine adjustment changes keep coming in to the core team of maintenance developers – effectively – that remains. Eventually this maintenance development work gets outsourced abroad.

A better way is to have teams of people that work together all day every day. Don’t expand and contract or otherwise mess with teams, hire carefully from the beginning and keep new work flowing into teams rather than restructuring after a piece of work is complete. Contract experts to pair with the existing team if you need to tech the team new technology, but don’t get mercenaries to do work. It might be slower, but if you have a good team that you treat well, odds are better they’ll stay, and they will develop new features to be able to be maintained better in the future and less likely to cut corners as any shortcuts will blow up in their own faces shortly later.

Why do we plan work?

When companies spend money on custom software development, a set of managers at very high positions within the organisation have decided that investing in custom software is a competitive advantage, and several other managers think they are crazy to spend all this money on IT.

To mollify the greater organisation, there is some financial oversight and budgeting. Easily communicated projects are sold to the business “we’ll put a McGuffin in the app”, “we’ll sprinkle some AI on it” or similar, and hopefully there is enough money in there to also do a bit of refactoring on the sly.

This pot of money is finite, so there is strong pressure to keep costs under control, don’t get any surprise AWS bills or middle managers will have to move on. Cost runaway kills companies, so there are legitimately people not sleeping at night when there are big projects in play.

How do we plan?

Problem statement

Software development is very different from real work. If you build a physical thing, anything from a phone to a house, you can make good use of a detailed drawing describing exactly how the thing is constructed and the exact properties of the components that are needed. If you are to make changes or maintain it, you need these specifications. It is useful both for construction and maintenance.

If you write the exact same piece of software twice, you have some kind of compulsive issue, you need help. The operating system comes with commands to duplicate files. Or you could run the compiler twice. There are infinite ways of building the exact same piece of software. You don’t need a programmer to do that, it’s pointless. A piece of software is not a physical thing.

Things change, a lot. Fundamentally – people don’t know what they want until they see it, so even if you did not have problems with technology changing underneath your feet whilst developing software, you would still have problems with the fact that fundamentally people did not know what they wanted back when they asked you to build something.

The big issues though is technology change. Back in the day, computer manufacturers would have the audacity to evolve the hardware in ways that made you have to re-learn how to write code. High level languages came along and now instead we live with Microsoft UI frameworks or Javascript frameworks that are mandatory one day and obsolete the next. Things change.

How do you ever successfully plan to build software, then? Well… we have tried to figure that out for seven decades. The best general concept we have arrived at so far is iteration, i.e. deliver small chunks over time rather than to try and deliver all of it at once.

The wrong way

One of the most well-known but misunderstood papers is Managing The Development of Large Software Systems by Dr Winston W Royce¹ that launched the concept Waterfall.

Basically, the software development process in waterfall is outlined into distinct phases:

System requirements
Software requirements
Analysis
Program design
Coding
Testing
Operations

For some reason people took this as gospel for several decades, despite the core, fundamental problem that dooms the process to failure is outlined right below figure 2 – the pretty waterfall illustration of the phases above – that people keep referring to, it says:

I believe in this concept, but the implementation described above is risky and invites failure. The
problem is illustrated in Figure 4. The testing phase which occurs at the end of the development cycle is the
first event for which timing, storage, input/output transfers, etc., are experienced as distinguished from
analyzed. These phenomena are not precisely analyzable. They are not the solutions to the standard partial
differential equations of mathematical physics for instance. Yet if these phenomena fail to satisfy the various
external constraints, then invariably a major redesign is required. A simple octal patch or redo of some isolated
code will not fix these kinds of difficulties. The required design changes are likely to be so disruptive that the
software requirements upon which the design is based and which provides the rationale for everything are
violated. Either the requirements must be modified, or a substantial change in the design is required. In effect
the development process has returned to the origin and one can expect up to a lO0-percent overrun in schedule
and/or costs.
Managing The Development of Large Software Systems, Dr Winston W Royce

Reading further, Royce realises that a more iterative approach is necessary as pure waterfall is impossible in practice. His legacy however was not that.

Another wrong way – RUP

Rational Rose and the Rational Unified Process was the Chat GPT of the late nineties, early noughties. Basically, if you only would make an UML drawing in Rational Rose, it would give you a C++ program that executed. It was magical. Before PRINCE2 and SAFe, everyone was RUP certified. You had loads of planning meetings, wrote elaborate Use Cases on index cards, and eventually you had code. It sounds like waterfall with better tooling.

Agile

People realised that when things are constantly changing, it was doomed to have a fixed plan to start with and to stay on it even when you knew that it was unattainable or undesirable to reach the original goal. Loads of attempts were made, but one day some people got together to actually have a proper go at defining what should be the true way going forward.

In February 11-13, 2001, at The Lodge at Snowbird ski resort in the Wasatch mountains of Utah, seventeen people met to talk, ski, relax, and try to find common ground—and of course, to eat. What emerged was the Agile ‘Software Development’ Manifesto. Representatives from Extreme Programming, SCRUM, DSDM, Adaptive Software Development, Crystal, Feature-Driven Development, Pragmatic Programming, and others sympathetic to the need for an alternative to documentation driven, heavyweight software development processes convened.
History: The Agile Manifesto

So – everybody did that, and we all lived happily ever after?

Short answer: No. You don’t get to just spend cash, i.e. have developer do work, without making it clear what you are spending it on, why, and how you intend to know that it worked. Completely unacceptable, people thought.

The origins of tribalism within IT departments have been done to death in this blog alone, so for once it will not be rehashed. Suffice to say, organisationally often staff is organised according to their speciality rather than in teams that produce output together. Budgeting is complex, there can be political competition that is counter productive to IT as a whole or for the organisation as a whole.

Attempts at running a midsize to large IT department that develops custom software have been made in form of Scaled Agile Framework (SAFe), DevOps and SRE (where SRE is addressing the problem backwards, from running black-box software using monitoring, alerts, metrics and tracing to ensure operability and reliability of the software).

As part of some of the original frameworks that came in with the Agile Manifesto, a bunch of practices became part of Agile even though they were not “canon”, such as User Stories, that were said to be a few words on an index card, pinned to a noticeboard in the team office, just wordy enough to help you discuss a problem directly with your user. This of course eventually started to develop back into the verbose RUP Use Cases from yesteryear, but “agile, because they are in Jira”, and rules had to be created for the minimum amount of information on there to successfully deliver a feature. In the Toyota Production System that originated Scrum, Lean Software Development and Six Sigma (sadly, an antipattern), one of the key the lessons is The ideal batch size is 1, and generally making smaller changes. This explosion in size of the user story is symptomatic of the remaining problems in modern software development.

Current state of affairs

So what do we do

As you can surmise if you read the previous paragraphs, we did not fix it for everybody, we still struggle to reliably make software.

The story and its size problems

The part of this blog post that will alienate the agile community is coming up. The units of work are too big. You can’t release something that is not a feature. Something smaller than a feature has no value.

If you work next to a normal human user, and they say – to offer an example – “we keep accidentally clicking on this button, so we end up sending a message to the customer too early, we are actually just trying to get to this area here to double-check before sending”, you can collaboratively determine the correct behaviour, make it happen, release in one day, and it is a testable and demoable feature.

Unfortunately requirements tend to be much bigger and less customer facing. Like, department X want to start seeing the reasons for turning down customer requests in their BI tooling being a feature, and then a “product backlog item” could be service A and service B needs to post messages on a message bus in various positions of the user flow identifying reasons.

Iterating over and successfully releasing this style of feature to production is hard.

Years ago I saw Allen Holub speaking on SD&D in London and his approach to software development is very pure. It is both depressing and enlightening to read the flamewars that erupt in his mentions on Twitter when he explains how to successfully make and release small changes. People scream and shout that it is not possible to do it his way.

In the years since, I have come to realise that nothing is more important than making smaller units of work. We need to make smaller changes. Everything gets better if / when we succeed. It requires a mindset shift, a move away from big detailed backlogs to smaller changes, discussed directly with the customer (in the XP sense, probably some other person in the business, or another development team). To combat the uncertainty, it is possible to mandate some kind of documentation update (graph? chart?) as part of the definition of done. Yes, needless documentation is waste, but if we need to keep a map over how the software is built, as long as people actually consult it, it is useful. We don’t need any further artefacts of the story once the feature is live in production anyway.

How do we make smaller stories?

This is the challenge for our experts in agile software development. Teach us, be bothered, ignore the sighs of developers that still do not understand, the ones raging in Allen Holub’s mentions. I promise, they will understand when they see it first hand. Daily releases of bug free code. They think people are lying to them when they hear us talk about it. When they experience it though, they will love it.

When every day means a new story in production, you also get predictability. As soon as you are able to split incoming or proposed work into daily chunks, you also get the ability to forecast – roughly, better than most other forms of estimate – and since you deliver the most important new thing every day, you give the illusion of value back to those that pay your salary.

Surprises

June 2, 2024BlogRikard Ottosson

TIL, TIFO or “I was today years old when”

Universal markers of new information. But it shouldn’t have been new. Documentation had been created.

I can tell you, the surprise was monumental.

In the olden days – if you ran a dotnet project without specifying UseUrls() or other tricks for local development, the containerised application would listen to ports 80 and 443.

Hence the default Dockerfile generated in some existing .NET projects will contain the rows EXPOSE 80 and EXPOSE 443. All of a sudden these two commands are completely useless, as the app quietly instead listen to ports 8080 and 8081 respectively, meaning you are exposing a port that nothing listens to.

The purpose behind the change of default port is to enable increased security. A non-root user cannot listen to ports below 1024(?) and can be set up with file access restrictions that add an additional layer of security, preventing a user from some actions even if they somehow gain entry to the website worker process.
You are still free to ignore the security benefits and keep running your website as root, but unless you take action, your docker-based apps will fail after you upgrade.

Alternatives

Embrace change – Run container as low level user

You need to change the Dockerfile to expose the ports the app listens to, so add EXPOSE statements allowing docker to map your ports. You also need to add the instruction USER app in the last bit of the Dockerfile to make sure the container will run as that user.

Of course, since the container now exposes ports 8080 and 8081 instead, you must map whatever runtime environment you are using, such as ACA configuration or ECS task definitions to take into account your non-standard ports.

Embrace change cautiously – run on new ports but as root

You need to change the Dockerfile to expose the ports the app listens to, so add EXPOSE statements allowing docker to map your ports.

Reject change – embrace peace of mind

You can pretend like the world is a safe place and override the default ports by settings environment variables in your Dockerfile towards the end, in the final stage:

ENV ASPNETCORE_HTTP_PORTS="80,8080" ASPNETCORE_HTTPS_PORTS="443,8081"

This leaves the status quo essentially intact.

Conclusion

I suspect option 3 is the most relevant if you have existing deployments and cloud configurations where modifying port numbers is either not available to you as you would hae needed to depend on another team to make a change on their own schedule- or perhaps the change to your runtime configuration would simply be prohibitively time consuming.

Due to the limited change between Option 2 and 1, it would seem silly to stop at changing port configurations without getting the security benefit of a least privilege runtime user, so I would suggest that for new projects or low complexity software estates, definitely pick option 1 and enjoy the improved security, and in other cases – go option 3 until the security benefit makes it worth spending your maintenance time making the change to non-root running.

Abstractions, abstractions everywhere

May 23, 2024Architecture, DevOps, EvilRikard Ottosson

X, X everywhere meme template, licensed by imgflip.com

All work in software engineering is about abstractions.

Abstractions

All models are wrong, but some are useful
George Box

It began with Assembly language, when people were tired of writing large-for-its-time programs in raw binary instructions, they made a language that basically mapped each binary instruction to a text value, and then there was an app that would translate that to raw binary and print punch cards. Not a huge abstraction, but it started there. Then came high level languages and off we went. Now we can conjure virtual hardware out of thin air with regular programming languages.

The magic of abstractions, it really gives you an amazing leverage, but at the same time you sacrifice actual knowledge of the implementation details, meaning you often get exposed to obscure errors that you either have no idea what they mean, or even worse- understand exactly what’s wrong but you don’t have access to make that change because the source is just a machine translated piece of Go, and there is no way to fix the translated C# directly, just to take one example.

Granularity and collaboration across an organisation

Abstractions in code

Starting small

Most systems start small, solving a specific problem. This is done well, and the requirements grow, people begin to understand what is possible and features accrue. A monolith is built, and it is useful. For a while things will be excellent and features will be added at great speed, and developers might be added along the way.

A complex system that works is invariably found to have evolved from a simple system that worked
John Gall

Things take a turn

Usually, at some point some things go wrong – or auditors get involved because regulatory compliance – and you prevent developers from deploying to production, hiring gate keepers to protect the company from the developers. In the olden days – hopefully not anymore – you hire testers to do manual testing to cover a shortfall in automated testing. Now you have a couple of hand-offs within the team, meaning people write code, give it to testers who find bugs, work goes the wrong way – backwards – for developers to clean up their own mess and to try again. Eventually something will be available to release, and the gate keepers will grudgingly allow a change to happen, at some point.

This leads to a slow down in the feature factory, some old design choices may cause problems that further slow down the pace of change, or – if you’re lucky – you just have too many developers in one team, and you somehow have to split them up in different teams, which means comms deteriorate and collaborating in one codebase becomes even harder. With the existing change prevention, struggles with quality and now poor cross-team communication, something has to be done to clear a path so that the two groups of people can collaborate effectively.

Separation of concerns

So what do we do? Well, every change needs to be covered by some kind of automated test, if only to at first guarantee that you aren’t making things worse. This way you can now refactor the codebase to a point where the two groups can have separate responsibilities, and collaborate over well defined API boundaries, for instance. Separate deployable units, so that teams are free to deploy according to their own schedule.

If we can get better collaboration with early test designs and front-load test automation, and befriend the ops gatekeepers to wire in monitoring so that teams are fully wired in to how their products behave in the live environment, we would be close to optimum.

Unfortunately – this is very difficult. Taking a pile of software and making sense of it, deciding how to split it up between teams, gradually separating out features can be too daunting to really get started. You don’t want to break anything, and if you – as many are won’t to do, especially if you are new in an organisation – decide to start over from scratch, you may run into one or more of the problems that occur when attempting a rewrite. One example being where you end up in a competition against a moving target. The same team has to own a feature in both the old and the new codebase, in that case, to stop that competition. For some companies it is simply worth the risk, they are aware they are wasting enormous sums of money, but they still accept the cost. You would have to be very brave.

Abstractions in Infrastructure

From FTP-from-within-the-editor to Cloud native IaC

When software is being deployed – and I am ignoring native apps now, largely, and focusing on web applications and APIs- there are a number of things that are actually happening that are at this point completely obscured by layers of abstraction.

The metal

The hardware needs to exist. This used to be a very physical thing, a brand new HP ProLiant howling in the corner of the office onto which you installed a server OS and set up networking so that you could deploy software on it, before plugging it into a rack somewhere, probably a cupboard – hopefully with cooling and UPS. Then VM hosts became a thing, so you provisioned apps using VMWare or similar and got to be surprised at how expensive enterprise storage is per GB compared to commodity hardware. This could be done via VMWare CLI, but most likely an ops person pointed and clicked.

Deploying software

Once the VM was provisioned, things like Ansible, Chef and Puppet began to become a thing, abstracting away the stopping of websites, the copying of zip files, the unzipping, the rewriting configuration and the restarting of the web app into a neat little script. Already here you are seeing problems where “normal” problems, like a file being locked by a running process, show up as a very cryptic error message that the developer might not understand. You start to see cargo cult where people blindly copy things from one app to another because you think two services are the same, and people don’t understand the details. Most of the time that’s fine, but it can also be problematic with a bit of bad luck.

Somebody else’s computer

Then cloud came, and all of a sudden you did not need to buy a server up front and instead rent as much server as you need. Initially, all you had was VMs, so your Chef/Puppet/Ansible worked pretty much the same as before, and each cloud provider offered a different was of provisioning virtual hardware before you came to the point where the software deployment mechanism came into play. More abstractions to fundamentally do the same thing. Harder to analyse any failures, you some times have to dig out a virtual console to just see why/how an app is failing because it’s not even writing logs. Abstractions may exist, but they often leak.

Works on my machine-as-a-service

Just like the London Pool and the Docklands were rendered derelict by containerisation, a lot of people’s accumulated skills in Chef and Ansible have been rendered obsolete as app deployments have become smaller, each app simply unzipped on top of a brand new Linux OS sprinkled with some configuration answer, and then have the image pushed to a registry somewhere. On one hand, it’s very easy. If you can build the image and run the container locally, it will work in the cloud (provided the correct access is provisioned, but at least AWS offer a fake service that let’s you dry run the app on your own machine and test various role assignments to make sure IAM is also correctly set up. On the other hand, somehow the “metal” is locked away even further and you cannot really access a console anymore, just a focused log viewer that let’s you see only events related to your ECS task, for instance.

Abstractions in Organisations

The above tales of ops vs test vs dev illustrates the problem of structuring an organisation incorrectly. If you structure it per function you get warring tribes and very little progress because one team doesn’t want any change at all in order to maintain stability, the other one gets held responsible for every problem customers encounter and the third one just wants to add features. If you structured the organisation for business outcome, everyone would be on the same team working towards the same goals with different skill sets, so the way you think of the boxes in an org chart can have a massive impact on real world performance.

There are no solutions, only trade-offs, so consider the effects of sprinkling people of various background across the organisation, if instead of being kept in the cellar as usual you start proliferating your developers among the general population of the organisation, how do you ensure that every team follows the agreed best practices, that no corners are cut even when a non-technical manager is demanding answers. How do you manage performance of developers you have to go out of your way to see? I argue such things are solvable problems, but do ask your doctor if reverse Conway is right for you.

Conclusion

What is a good abstraction?

Coupling vs Cohesion

If a team can do all of their day-to-day work without waiting for another team to deliver something or approve something, if there are no hand-offs, then they have good cohesion. All the things needed are to hand. If the rest of the organisation understands what this team does and there is no confusion about which team to go to with this type of work, then you have high cohesion. It is a good thing.

If however, one team constantly is worrying about what another team is doing, where certain tickets are in their sprint in order to schedule their own work, then you have high coupling and time is wasted. Some work has to be moved between teams or the interface between the teams has to be made more explicit in order to reduce this interdependency.

In Infrastructure, you want the virtual resources associated with one application to be managed within the same repository/area to offer locality and ease of change for the team.

Single Responsibility Principle

While dangerous to over-apply within software development (you get more coupling than cohesion if you are too zealous), this principle is generally useful within architecture and infrastructure.

Originally meaning that one class / method should only do one thing – an extrapolation of the UNIX principles – it can more generally be said to mean that on that layer of abstraction, a team, infrastructure pipe, app, program, class […] should have one responsibility. This usually mean a couple of things happen, but they conceptually belong together. They have the same reason to change.

What – if any – pitfalls exist ?

The major weakness of most abstractions is when they fall apart, when they leak. Not having access to a physical computer is fine, as long as the deployment pipeline is working, as long as the observability is wired up correctly, but when it falls down, you still need to be able to see console output, you need to understand how networking works, to some extent, you need to understand what obscure operating system errors mean. Basically when things go really wrong you are needed to have already learned to run that app in that operating system before, so you recognise the error messages and have some troubleshooting steps memorised.
So although we try and save our colleagues from the cognitive load of having to know everything we were forced to learn over the decades, to spare them the heartache, they still need to know. All of it. So yes, the danger with the proliferation of layers of abstraction is to pick the correct ones, and to try and keep the total bundle of layers as lean as possible because otherwise someone will want to simplify or clarify these abstractions by adding another layer on top, and the cycle begins again.

Cutler, Star Wars, fast feedback and accuracy

October 21, 2023General, Uncategorized, Win32Rikard Ottosson

As my father’s son I have always been raised to see the plucky heroes at AT&T and Bell Labs, Brian Kernighan and Ken Ritchie as the Jedi, fighting with the light side of the force, thus foregoing force lightning and readable documentation as that is only available to Dark Side force users such as Bill Gates. Anders Hejlsberg must be Anakin Skywalker in my father’s cinematic universe as Anders first created Turbo Pascal, our favourite programming language when I was little, only to turn to the Dark Side and forge C# and TypeScript – presumably using red khyber crystals.

Anyway – the Count Dooku in this tortured analogy would be Dave Cutler – who created both VMS and Windows NT, was recently interviewed on Dave Plummer’s YouTube channel for several hours.

If you ever think you are going to watch that interview, do so now, unless you are OK with spoiler or spoiler-adjacent content. Also be warned the interview is long. Very, very long.

I find it fascinating how far removed his current work at XBox game streaming (I think? XCloud sounds like a game streaming solution) is from where he started as he left college. His first foray in computing was in simulations, and he had to carry a bring a pile of punch cards – a thousand punch cards I think he says – to run a simulation at IBM, because the computing power requirement was too big for what they had locally at the paper company where he was working.

Later he complains about current developers lack of attention to detail, the host theorising that perhaps the faster inner loops for today’s developers make them less likely to show the requisite skill or diligence.

Now, wait before you flood his YouTube video with angry comments – please let me agree, so that you can post angry comments here instead, thus driving engagement.

I want to say, I see where he’s coming from.

A lot of really hard problems have solutions now that didn’t exist when I started out, and weren’t even possible to conceptualise when Cutler started out. You can realistically do TDD now. If you don’t have access to a piece of hardware you need to write drivers for, you could still most likely have your employer buy enough hardware that you could simulate that hardware with very little effort. You can essentially make it so that you immediately know when you are breaking things as you are typing.

But also… you could save yourself from a majority of problems by taking a bit more care. Like, hey, are you stuck with legacy code with poor test coverage? Well – if that product is an API that is called by another team – just because you have Swagger documentation and the guys can reach you on Slack/Teams, don’t just randomly change behaviours in an endpoint so that dependent services are broken. Why not just make a new endpoint for the new behaviour? And – seriously – when you make new code – why can’t that at least that new bit be unit tested?

Even if you still cannot add unit tests, if you had the mindset of being about to drive far away to test the software live with only you and your boss – don’t you think you would be more carefully reviewing every single change? Really?

Even TDD avatars Feathers and Beck will acknowledge coming across code that isn’t unit tested but still is navigable and observably correct. You could achieve that without any extra build steps or frameworks. You could just pay attention, it has been done in life.

Now, at the same time when we old timers get excited about the olden days we like to brag about how many hours we worked, and I can tell from personal experience that when you don’t sleep -accuracy is the first to go, so – if you are going to make changes in systems where you have nothing like unit test or integration tests in place to help you, make sure you give yourself a couple of extra minutes, basically.

The main reason behind this post is to say:
We shouldn’t be luddites by rejecting modern development practices, but also – if you are a junior dev stuck in a team that writes and maintains legacy code, you could still write defect free code, you just have to pay more attention and go a bit slower. There is no immediate get-out-of-jail-free card because the builds are flaky, like. And if you are struggling and you MUST add unit tests in order to manage to keep high standards for code quality, you could probably refactor into acceptable test coverage more easily than you think.

Our luxury as software developers – to be able to know well ahead of time exactly how our code will work in production would be a dream in most disciplines, yet a surprisingly low number of bridges randomly collapse. People do sane things and catastrophes fail to appear.
The legacy project you got saddled with can cause you to go slow, but it really shouldn’t give you license to deliver new bugs, Don’t internalise when acceptance for lower standards may become evident in your team, such as “well, when X and Y happen and they refuse to Z, this is the best they’ll get”. Instead just don’t deliver, and explain why you cannot. If you don’t offer the lower standards, the resulting bugs won’t appear in your code.

McKinsey and the elusive IT department

September 10, 2023.NETRikard Ottosson

I know that both my readers are software developers, so – this is a bit of a departure.

Background

Within a business, there are many financial decisions to be made, and nothing kills a business as fast as costs quietly running away. This is why companies have bureaucracy, to make sure that those who enter into contracts on behalf of the company (primarily sales and procurement) are doing so with due skill and care.

Managers follow up on performance down on the individual level. Commission and bonuses reward top performers, career progression means you are judged by the average scores of your subordinates. If you miss budget you are held accountable. What went wrong? What changes are you implementing, why do you think those changes will make a difference?

CEOs are thinking- why is IT, and specifically IT software development so unwilling to produce metrics and show their work? Are they not monitoring their people? Surely they will want to know who their worst performers so that they can train them or ultimately get rid of them if all else fails?

In this environment, senior developers turned junior managers are treading lightly to try and explain to senior management the ways in which – compared to doing literally nothing – measuring incorrectly can cause much worse problems in terms of incorrect optimisations or alienating and losing the wrong individual contributors, but as far as I know, rarely have any inroads been made into fairly and effectively measuring individual performance in an IT department. You can spot toxic people, or people that plainly lied on their CV, but apart from clear-cut cases like that, there are so many team factors that affect the individual, that getting rid of people instead of addressing systemic or process-related problems is like cutting off your nose to spite your face.

Bad metrics are bad

What do we mean bad metrics? How about a fictionalised example of real metrics out there: LOC. Lines of Code. How many lines of code did Barry commit yesterday? 450? Excellent. Give the man a pay rise! How about Lucy? Oh dear… only 220. We shall have to look at devising a personal improvement plan and file with HR.

However, upon careful review – it turns out that out of Barry’s contribution yesterday, 300 lines consisted of a piece of text art spelling out BARRY WOZ ERE. Oh, that’s unfortunate, he outsmarted our otherwise bullet-proof metric. I know, we solve everything by redefining our metric to exclude comment lines. NCLOC, non-comment lines of code. By Jove, we have it! Lucy is vindicated, she doesn’t write comments in her code so she kept all her 220 lines and is headed for an Employee of the Month trophy if she keeps this up. Now unfortunately after this boost in visibility within the team, Lucy is tasked with supervising a graduate in the team, so they pair program together, sometimes on the graduate’s laptop and sometimes on Lucy’s, and because life is too short they don’t modify the git configuration for every single contribution, so half the day’s contributions fall under the graduate’s name in the repository and the rest under Lucy’s. The erstwhile department star sees her metrics plummet and can feel the unforgiving gaze from her line manager. Barry is looking good again, because he never helps anyone.

So – to the people on the factory floor of software development, the drawbacks and incompleteness of metrics for individual performance are quite obvious, but that doesn’t help senior management that have legitimate concerns for how company resources are spent, and want to have some oversight.

Good metrics are good

After several decades of this situation, leading experts in the field of DevOps decided to research how system development works in big organisations to try and figure out what works and what doesn’t. In 2016 the result came out in the form of the DORA metrics, throughput (deployment frequency, lead time for changes), and stability (mean time to recover, change failure rate) were published in the State of DevOps report. This measures the output of a team or a department, not an individual, and the metrics help steer improvements in software delivery in a way that cannot be gamed to produce good metrics but negative actual outcomes. Again – the goal of the DORA metrics are to ensure that software design, construction, delivery continuous improvement is successfully undertaken, to avoid pitfalls of failed software projects and astronomical sums of money lost – measuring team or individual performance is not what it’s about.

In the post-pandemic recovery phase a lot of organisations are looking at all of the above and are asking for a way to get certainty and to get actual insight into what IT is doing with all the money, tired of the excuses being made by evasive CTOs or software development managers. How hard could it be?

A proposal, the blowback and a suggestion

Whenever there is a willing customer, grifters will provide a service.

McKinsey took among other things the DORA metrics and NCLOC to cook up their own custom metric to solve this problem once and for all.

Obviously, the response from software development specialists was unanimously critical, and the metric was destroyed up and down the internet, and I must admit I enjoyed reading or watching several eviscerations, especially one of the fathers of DevOps, Dave Farley, had a very effective critique of the paper on YouTube.

There was no solution though. No-one was trying to help the leaders get the insights they crave. There was limited understanding of why. Surely you must trust your employees, why else did you hire them?

Then I stumbled upon a series of writings from Kent Beck and Gegerly Orosz, trying to address this head on, to do what McKinsey hadn’t achieved, which I found so inspiring that this blog exists just as an excuse to post the links:

https://newsletter.pragmaticengineer.com/p/measuring-developer-productivity
https://newsletter.pragmaticengineer.com/p/measuring-developer-productivity-part-2

If you need to know why the proposed McKinsey metrics are bad, look at Dave Farley’s video, and if you what to know what to do instead, read the writings of Beck/Orosz.