
Unix mail

E-mail used to be a file. A file called something like /var/spool/mail/[username], and new email would appear as text appended to that file. The idea was that the system could send notifications to you by appending messages there, and you could send messages to other users by having the program mail append text to their files.

Later on you could send email to other machines on the network by addressing it with a user name, an @ sign and the name of the computer. I am not 100% sure, and I am too lazy to look it up, but the way this communication happened was using SMTP, the Simple Mail Transfer Protocol. You’ll notice that SMTP only lets you send mail (implicitly appending it to the file belonging to the user you are sending to).

Much later the Post Office Protocol was invented, so that you could fetch your email from your computer at work and download it to your Eudora email client on your Windows machine at home. It would just fetch the email from your file, optionally removing it from that file as it did so.

As Lotus and Microsoft built groupware solutions loosely based on email, people wanted to access their email on the server rather than always download it, and to have the emails organised in folders, which led to the introduction of IMAP.

Why am I mentioning this? Well, if you are using a UNIX operating system you may still see the notification “You have mail” as you open a new terminal. It is not as exciting as you may think, it is probably a guy called cron that’s emailing you, but still, the mailbox is the void in which the system screams when it wants your help, so it would be nice to wire it into your mainstream email reader.
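
For the curious, the messages themselves are just text appended to that file, something roughly like this (a sketch from memory, with made-up host and user names; the exact headers vary by system):

    From root@devbox  Mon Nov 17 03:00:01 2025
    Return-Path: <root@devbox>
    From: "(Cron Daemon)" <root@devbox>
    To: youruser@devbox
    Subject: Cron <youruser@devbox> /home/youruser/bin/backup.sh
    Date: Mon, 17 Nov 2025 03:00:01 +0000

    /home/youruser/bin/backup.sh: line 12: pg_dump: command not found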

Because I am running Outlook to handle my personal email on my computer, I had to hack together a Python script that does this for me. It seems that if I were using Thunderbird I could still have UNIX mail access, but… it’s not worth it.

What Is and Should Never Be

I have been banging on about the perils of the Great Rewrite in many previous posts: huge risks of feature creep, lost requirements, hidden assumptions, spiralling cost, unhealthy internal competition where the new system can barely keep up with the evolving legacy system, et cetera.

In this post I will attempt to argue the opposite case. Why should you absolutely embark on a Great Rewrite? How do you easily skirt around the pitfalls? What rewards lie in front of you? I will start from my usual standpoint: what are invalid excuses for embarking on this type of journey?

Why should you NOT give up on gradual refactoring?

If you analyse your software problems and notice that they are not technical in nature but political, there is no point in embarking on a massive adventure, because the dysfunction will not go away. No engineering problem is unsolvable, but political roadblocks can be completely immovable under certain circumstances. You cannot make two teams collaborate when the organisation is deeply invested in making that collaboration impossible. This is usually a P&L problem, where the hardest thing about a complex migration is the accounting and budgeting involved in having a key set of subject matter experts collaborate cross-functionally.

The most horrible instances of shadow IT or Frankenstein middleware have been created because the people who ought to have done the work were not available, so some other people had to do something themselves.

Basically, if you cannot get a piece of work, regardless of its size, funnelled through the IT department into production in an acceptable time, and the chief problem is the way the department operates, you cannot fix that by ripping out the code and starting over.

Why SHOULD you give up on gradual refactoring?

Impossible to enact change in a reasonable time frame.

Let us say you have an existing centralised datastore that several small systems integrate across in undocumented ways, and your largest legacy system is getting to the point where its libraries cannot be upgraded any more. Every deployment is risky, performance characteristics are unpredictable after every change, and your business side, your customer in the lean sense, demands quicker adoption of new products. You literally cannot deliver what the business wants in a defensible time.

It may be better to start building a new system for the new products, and then refactor the new system to bring older products across after a while. Yes, the risk of a race condition between the new and the old team is enormous, so ideally the same team should own a business function in both the new and the old system, so that the developers pick up some accidental domain knowledge that is useful when migrating.

Radically changed requirements

Has the world changed drastically since the system was first created? Are you laden with legacy code that you would just like to throw away, except the way the code is structured means you would first need to do a great refactor before you could throw bits away, and the test coverage is too low to do so safely?

One example of radically changed requirements: you started out as a small site only catering to a domestic audience, but then success happens and you need to deal with multiple languages and the dreaded concept of timezones. Some of the changes needed can be of a magnitude where you are better off throwing away the old code rather than touching almost every area of it to use resources instead of hard-coded text. This might be an example of amortising well-adjudicated technical debt: the time-to-market gain you made by not internationalising your application the first time round could have been the difference that made you a success, but still, now that choice is coming back to haunt you.

Pick a piece of functionality that you want to keep, and write a test around both the legacy and the new version to make sure you cover all requirements you have forgotten over the years (this is Very Hard to Do). Once you have correctly implemented this feature, bring it live and switch off this feature in the legacy system. Pick the next keeper feature and repeat the process, until nothing remains that you want to salvage from the old system and you can decommission the charred remains.
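
In code, that tends to look like a characterization test that runs the same inputs through both implementations and demands identical answers. The names below are invented for illustration (a shipping cost calculator), and in reality one of the two classes lives in the legacy codebase:

    // Hypothetical parity test: same inputs into the legacy and the rewritten
    // implementation, outputs must match. Every forgotten requirement you
    // rediscover becomes another InlineData row.
    using Xunit;

    public class ShippingCostParityTests
    {
        [Theory]
        [InlineData("SE", 0.5)]
        [InlineData("GB", 12.0)]
        [InlineData("US", 30.0)]
        public void Rewrite_matches_legacy(string country, double weightKg)
        {
            var legacy = new LegacyShippingCalculator();
            var rewrite = new ShippingCalculator();

            Assert.Equal(
                legacy.Calculate(country, weightKg),
                rewrite.Calculate(country, weightKg),
                precision: 2);
        }
    }

    // Stand-ins so the sketch is self-contained; imagine the real things here.
    public class LegacyShippingCalculator
    {
        public double Calculate(string country, double weightKg) =>
            country == "SE" ? 49.0 : 49.0 + weightKg * 10.0;
    }

    public class ShippingCalculator
    {
        public double Calculate(string country, double weightKg) =>
            country == "SE" ? 49.0 : 49.0 + weightKg * 10.0;
    }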

Pitfalls

Race condition

Basically, you have a team of developers implementing client onboarding in the new system: some internal developers and a couple of external boutique consultants from some firm near Old Street. They have meetings with the business, i.e. strategic sales and marketing are involved, and they have an external designer on hand to make sure the visuals are top notch. Meanwhile, on the damp lower ground floor, the legacy team has Compliance in their ear about the changes that need to go live NOW, or else the business risks being in violation of some treaty that enters into force next week.

That is, while the new system is slowly polished, made accessible and perhaps bikeshedded a bit as too many senior stakeholders get involved, the behind-the-scenes criteria that actually need to be implemented are changing rapidly. To the team doing the rework it seems the goalposts never stop moving, and most of the time they are never even told, because compliance “already told IT”, i.e. the legacy team.

What is the best way to avoid this? Well, if legacy functionality seems to have high churn, move it out into a “neutral venue”: a separate service that can be accessed from both the new and the old system, and remove the legacy implementation to avoid confusion. Once the legacy system is fully decommissioned you can take a view on whether you want to absorb these halfway houses or whether you are happy with how they are factored. The important thing is that key functionality only exists in one location at all times.

Stall

A brave head of engineering sets out to implement a new, modern web front-end, replacing a server-rendered website that communicates via SOAP with a legacy backend where all business logic lives. Some APIs have to be created to do processing that the legacy website did on its own before or after calling into the service. On top of that, a strangler fig pattern is implemented around the calls to the legacy monolith, primarily to isolate the use of SOAP away from the new code, but also to avoid some of the calls that are deemed not worth the round trip over SOAP. Unfortunately, after the new website is live and complete, the strangler fig has not actually strangled the backend service: a desktop client app is still talking SOAP directly to the backend with no intention of ever caring about, or even acknowledging, the strangler fig. Progress ceases and you are stuck with a half-finished API that in some cases implements the same features as the backend service, but in most cases just acts as a wrapper around SOAP. Some features live in two places, and nobody is happy.

How to avoid it? Well, things may happen that prevent you from completing a long term plan, but ideally, if you intend to strangle a service, make sure all stakeholders are bought into the plan. This can be complex if the legacy platform being strangled is managed by another organisation, e.g. an outsourcing partner.
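
For what it is worth, the isolation part of the fig is usually just a seam: an interface that the new code depends on, one adapter that still takes the SOAP round trip, and eventually one that does the work natively. Everything below is a made-up sketch (the quote service, the SOAP proxy and its method names are all hypothetical):

    using System;

    // The seam: new code only ever depends on this interface.
    public interface IQuoteService
    {
        decimal GetQuote(string productCode);
    }

    // Adapter that still talks SOAP to the legacy monolith; the generated
    // proxy stays quarantined in here and never leaks into new code.
    public class SoapQuoteService : IQuoteService
    {
        private readonly LegacySoapClient _client;
        public SoapQuoteService(LegacySoapClient client) => _client = client;

        public decimal GetQuote(string productCode) =>
            _client.GetQuoteRequest(productCode).Price;
    }

    // The strangling replacement, switched in feature by feature.
    public class NativeQuoteService : IQuoteService
    {
        public decimal GetQuote(string productCode) =>
            throw new NotImplementedException("filled in as this feature is migrated");
    }

    // Stand-in for the generated SOAP proxy, just to keep the sketch self-contained.
    public class LegacySoapClient
    {
        public QuoteResponse GetQuoteRequest(string productCode) => new(productCode, 99.0m);
    }

    public record QuoteResponse(string ProductCode, decimal Price);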

Reflux

Let’s say you have a monolithic storage, the One Database. Over the years BI and financial ops have gotten used to querying directly into the One Database to produce reports. Since the application teams are never told about this work, the reports often break, but they persevere and keep maintaining these reports anyway. The big issue for engineering is the host of “batch jobs”, i.e. small programs run from a hand-built task scheduler from 2001 that does some rudimentary form of logging directly into a SchedulerLogs database. Nobody knows what these various programs do, or which tables in the One Database they touch, just that the jobs are Important. The source code for these small executables exists somewhere, probably… most likely in the old CVS install on a snapshot of a Windows Server 2008 VM that is an absolute pain to start up, but there is a batch file from 2016 that does the whole thing, and it usually works.

Now, a new system is created. Finally the data structure in the New Storage is fit for purpose; new and old products can be maintained and manipulated correctly because there are no secret dependencies. An entity relationship that was stuck as 1:1 due to an old, bad design that had never been possible to rectify (as it would break the reconciliation batch job that nobody wants to touch) can finally be put right, and several years’ worth of poor data quality can finally be addressed.

Then fin ops and BI write an angry email to the CFO saying that the main product no longer reports data to their models, and how can life be this way; there is a crisis meeting amongst the C-level execs, an edict is brought down to the floor, and the head of engineering gets told off for threatening to obstruct the fiduciary duties of the company and is told to immediately make sure data is populated in the proper tables… Basically: automatically sync the new data to the old One Database so that the legacy Qlik reports show the correct data, which also means that some of the new data structures have to be dismantled, as they cannot be meaningfully mapped back to the legacy database.

How do you avoid this? Well, loads of things were wrong in this scenario, but my hobby-horse is abstractions: make sure any reports pointing directly into an operational database stop doing that. Ideally you should have a data platform for all reporting data, where people can subscribe to published datasets, i.e. you get contracts between producer and consumer of data so that the dependencies are explicit and can be enforced. But at minimum, have some views or dedicated reporting tables that define the data used by the people making the report. That way they can ask you to add certain columns, and as a developer you know your responsibility is to not break those views at any cost, while you are still free to refactor underneath and make sure the operational data model is always fit for purpose.

Conclusion

You can successfully execute a Great Rewrite, but unless you are in a situation where the company has made a great pivot and large swathes of the features in the legacy system can simply be deleted, you will always contend with legacy data and legacy features, so fundamentally it is crucial to avoid at least the pitfalls listed above (add more in the comments, and I’ll add them and pretend they were there all along). Things like how reporting will work must be sorted out ahead of time. There will be lack of understanding, shock and dismay, because what we see as hard coupling and poor cohesion, some folks will see as a single pane of glass, and some people will think it is ludicrous not to use the existing database structure forever. All the data is there already?!

Once there is a strategy and a plan in place for how the work will take place, the organisation will have to be told that, even though they did not think we were moving quickly before, we will actually, for a significant time, get worse at responding to new feature requests as we dedicate considerable resources to a major upgrade of our platform into a state that is more flexible and easier to change.

Then the main task is to keep moving forward at pace, to go feature by feature into the new world atomically, removing legacy as you go, and to commit enough resources to keep the momentum going. Best of luck!

Are the kids alright?

I know in the labour market of today, suggesting someone pick up programming from scratch is akin to suggesting someone dedicate their life to being a cooper. Sure, in very specific places, that is a very sought after skill that will earn you a good living, but compared to its heyday the labour market has shrunk considerably.

Getting into the biz

How do people get into this business? As with, I suspect, most things, there has to be a lot of initial positive reinforcement. You do not get to be a great athlete without several thousand hours of effort, rain or shine, whether you enjoy it or not, but the reason some people with “talent” end up succeeding is that they have enough early success to catch the “bug” and stick at it when things inevitably get difficult and sacrifices have to be made.

I think the same applies here, but beyond e-sports and streamer fame, it has always been more of an internal motivation, the feeling of “I’m a genius!” when you acquire new knowledge and see things working. It used to help to have literally nothing else going on in life that was more rewarding, because just like that fleeting sensation of understanding the very fibre of the universe, there is also the catastrophic feeling of being a fraud and the worst person in the world once you stumble upon something beyond your understanding, so if you had anything else to occupy yourself with, the temptation to just chuck it in must be incredibly strong.

Until recently, software development was seen as a fairly secure career choice, so people had a financial motivator to get into it. But still, anecdotally it seems people often got into software development by accident: they had to edit a web page and discovered JavaScript and PHP, or had to do some programming as part of a lab at university and quite enjoyed it, etc. Some were trying to become real engineers but had to settle for software development; some were actuaries in insurance and ended up programming Python for a living.

I worry that as the economic prospects of getting into the industry as a junior developer are eaten up by AI budgets, we will see a drop-off of those who accidentally end up in software development, and we will be left with only the ones with what we could kindly call a “calling”, or what I would call “no other marketable skills”, like back in my day.

Dwindling power of coercion

Microsoft of course is the enemy of any right-thinking 1337 h4xx0r, but there was quite a long period where, if you wanted a Good Job, learning .NET and working for a large corporation on a Lenovo Thinkpad was the IT equivalent of working at a factory in the 1960s. Not super joyous, but a Good Job. You learned .NET 4.5 and you pretended to like it. WCF, BizTalk and all. The economic power was unrelenting.

Then crazy web 2.0 happened and the cool kids were using Ruby on Rails. If you wanted to start using Ruby, it was super easy. It was like back in my day, but instead of typing ABC80 BASIC (see below) they used the read-evaluate-print loop in Ruby. A super friendly way of feeling like a genius while gradually increasing the level of difficulty.

Meanwhile, legacy Java and C# were very verbose: you had to explain things like static, class, void and import/using statements, not to mention braces and semicolons, before people could create a loop of a bad word filling the terminal.

People would rather learn PHP or Ruby, because they saw no value in those old stodgy languages.

Oracle were too busy being in court suing people to notice, but on the JVM there were other attempts at creating something less verbose: Scala and eventually Kotlin happened.

Eventually Microsoft noticed what was going on, and as the cool kids jumped ship from Ruby onto NodeJS, Microsoft were determined not to miss the boat this time. They threw away the .NET Framework (“threw away” in as much as Microsoft ever break with legacy; it stayed fairly backward compatible) and started from scratch with .NET Core, with a renewed focus on performance and lowered barriers to entry.

The pressure really came as data science folks rediscovered Python. It too has a super low barrier to entry, and it comes with a pipeline straight into data science. Microsoft largely failed to break into that market due to the continuous mismanagement of F#, so instead they attacked it from the Azure side and got the money that way, despite people writing Python.

Their new ASP.NET Core web stack borrowed concepts like minimal APIs from Sinatra and Nancy, and they introduced top-level statements to allow people to immediately get the satisfaction of creating a script that loops and emits rude words using only two lines of code.

But still, the canonical way of writing this code was to install Visual Studio and create a New Project > Console App, and when you save that to disk you have a whole bunch of extra nonsense there (a csproj file, a bunch of editor metadata that you do not want to have to explain to a n00b, et cetera), which is not beginner friendly enough.

This past Wednesday, Microsoft introduced .NET 10 and Visual Studio 2026. Among other things, .NET 10 brings file-based apps, where you can write a single file that can reference NuGet packages or other C# projects, import namespaces and declare build-time properties inline. It seems like an evolution of scriptcs, but slightly more complete. You can now give people a link to the SDK installer and then give them this to put in a file called file.cs:
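
(Something along these lines; the shebang line is my best recollection of how the feature works, and it is only needed for the ./file.cs route described below.)

    #!/usr/bin/env dotnet
    // file.cs - a file-based app: no csproj, no ceremony
    while (true)
        Console.WriteLine("bum");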

Then, like in most programming tutorials out there, you can tell them to do chmod +x file.cs if they are running a unix-like OS. In that case, the final step is ./file.cs and your rude word will fill the screen…

If you are safely on Windows, or if you don’t feel comfortable with chmod, you can just type dotnet file.cs and see the screen fill with creativity.

Conclusion

Is the bar low enough?

Well, if they are competing with PHP, yes: you can give half a page of instructions and get people going with C#, which is roughly what it takes to get going with any other language on Linux or Mac, and definitely easier than setting up PHP. The difficulty with C#, and with Python as well, is that they are old. Googling will give you C# constructs from ages ago that may not translate well to a file-based project world. Googling for help with Python will give you a mix of Python 2 and Python 3, and with Python it is really hard to know what is a pip thing and what is an elaborate hoax, due to the naming standards. The conclusion is therefore that dotnet is now in the same ballpark as the others in terms of complexity, but it depends on what learning resources remain available. Python has a gigantic world out there of “how to get started from 0”, whilst C# has a legacy of really bad code from the ASP.NET WebForms days. Microsoft have historically been excellent at providing documentation, so we shall see if their MVP/RD network floods the market with intro pages.

At the same time, Microsoft is going through yet another upheaval, with Windows 10 going out of support and Microsoft tightening the noose by requiring a Microsoft Account to run Windows 11, while Steam have released the Steam Console, running Windows software on Linux. That means people will have less forced exposure to Windows even to game, whilst Google own the school market. Microsoft will still have corporate environments that are locked to Windows for a while longer, but they are far from the situation they used to be in.

I don’t know if C# is now easy enough to adopt that people who are curious about learning programming would install it over anything else on their Mac or Linux box.

High or low bar, should people even learn to code?

Yes, some people are going to have to learn programming in the future. AGI is not happening, and new models can only train on what is out there. Today’s generative AI can do loads of things, but in order to develop the skills necessary to leverage it responsibly, you need to be familiar with all the baggage underneath, or else you risk releasing software that is incredibly insecure or that will destroy customer data. As Bjarne Stroustrup said, “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off.” That applies to AI-generated code as well.

Why does code rot?

The dichotomy

In popular parlance you have two categories of code: your own, freshly written code, which is the best code, code that will never be problematic, and then there is legacy code, which is someone else’s code: untested, undocumented and awful. Code gradually goes from good to legacy in ways that appear mystical, and in the end you change jobs, or they bring in new guys to do a Great Rewrite, with mixed results.

So, to paraphrase Baldrick from Blackadder Goes Forth: “The way I see it, these days there’s a [legacy code mess], right? And, ages ago, there wasn’t a [legacy code mess], right? So, there must have been a moment when there not being a [legacy code mess] went away, right? And there being a [legacy code mess] came along. So, what I want to know is: how did we get from the one case of affairs to the other case of affairs?”

The hungry ostrich

Why does code start to deteriorate? What precipitates the degradation that eventually leads to terminal decline? What is the first bubble of rust appearing by the wheel arches? This is hard to state in general terms, but the causes I have personally seen over the years boil down to being prevented from making changes in a defensible amount of time.

Coupling via schema – explicit or not

For example, it could be that you have another system accessing your storage directly. It does not matter whether you are using schemaless storage or not: as long as two different codebases need to make sense of the same data, you have a schema whether you admit it or not, and at some point those systems will need to coordinate their changes to avoid breaking functionality.

Fundamentally, as soon as you start going “nah, I won’t remove/rename/change the type of that old column because I have no idea who still uses it”, you are in trouble. Each storage must have one service in front of it that owns it, so that it can safely manage schema migrations, and anyone wanting to access that data needs to use a well-defined API to do so. The service maintainers can then be held responsible for maintaining this API in perpetuity, and easily so, since the dependency is explicit and documented. If another service just queries the storage directly, the maintainer is completely unaware of it (yes, this goes for BI teams as well).
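
A minimal sketch of what that looks like, with made-up names: the DTO is the public schema and is the only thing consumers ever see, while the row type underneath is free to change in the next migration, as long as the mapping keeps producing the same shape:

    using System;
    using System.Collections.Generic;

    // The public contract. This is what you maintain in perpetuity (or version).
    public record CustomerDto(Guid Id, string Name);

    // The internal shape of the data today; rename and split at will.
    internal record CustomerRow(Guid Id, string FirstName, string LastName, int LegacyStatusCode);

    // The one service that owns the storage. Nobody else queries the tables.
    public class CustomerService
    {
        private readonly Dictionary<Guid, CustomerRow> _table = new(); // stand-in for the real storage

        public CustomerDto? GetCustomer(Guid id) =>
            _table.TryGetValue(id, out var row)
                ? new CustomerDto(row.Id, $"{row.FirstName} {row.LastName}") // mapping keeps the contract stable
                : null;
    }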

Barnacles settling

If every feature request leads to functions and classes growing as new code is added like barnacles, without regular refactoring to more effective patterns, the code gradually gets harder to change. This is commonly a side-effect of high turnover or outsourcing: the developers do not feel empowered to make structural changes, or perhaps have not had enough time to get acquainted with the architecture as it was once intended. Make sure that whoever maintains your legacy code is fully aware of their responsibility to refactor as they go along.

Test after

When interviewing engineers it is very common to hear that they “practice TDD, but…”, meaning they test after. To me at least, the difference in test quality is obvious between writing the tests first and getting into the zone, writing the feature first and then trying to retrofit tests afterwards. Hint: there is usually a lot less mocking if you test first. As the tests get more complex, adding new code to a class under test gets harder, and if the developer does not feel empowered to refactor first, the tests are likely not to cover the added functionality properly, so perhaps a complex integration test is modified to sort-of validate the new code, or maybe the change is just tested manually…

Failure to accept Conway’s law

The reason people got hyped about microservices was the idea that you could deploy individual features independently of the rest of the organisation and the rest of the code. This is lovely, as long as you do it right. You can also go too granular, but in my experience that rarely happens. The problem that does happen is that separate teams have interests in the same code and modify the same bits, and releases can’t go out without a lot of coordination. If you also have poor automated test coverage you get a manual verification burden that further slows down releases. At your earliest convenience you must spend time restructuring your code, or at least the ownership of it, so that teams fully own all aspects of the thing they are responsible for and can release code independently, with any remaining cross-team dependencies made explicit and automatically verifiable.

Casual attitude towards breaking changes

If you have a monolith that provides core features to your estate, and you have a publicly accessible API operation, assume it is being used by somebody. Basically, if you must change its required parameters or its output, create a new versioned endpoint or one with a different name. Does this make things less messy? No, but at least you don’t break a consumer you don’t know about. Tech leads will hope that you message around to try and identify who uses it and coordinate a good outcome, but historically that seems to be too much to ask. We are only human, after all.
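
As a sketch, assuming an ASP.NET Core minimal API project (the endpoints and fields are made up): the existing operation keeps its shape for whoever depends on it, and the change goes into a new versioned one next to it:

    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    // v1 is what unknown consumers already depend on; its parameters and shape never change.
    app.MapGet("/v1/orders/{id}", (int id) =>
        Results.Ok(new { id, total = 100.0 }));

    // v2 is where the breaking change lives: a new required parameter and a new output shape.
    app.MapGet("/v2/orders/{id}", (int id, string currency) =>
        Results.Ok(new { id, total = 100.0, currency }));

    app.Run();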

Until you have PACT tests for everything, and solid coverage, never break a public method.

Outside of support horizon

Initially it does not seem that bad to be stuck on a slightly unsupported version of a library, but as time moves on, all of a sudden you are stuck for a week with a zero-day bug that you can’t patch, because three other libraries are out of date and contain breaking changes. It is much better to make changes as you go along. One breaking change at a time usually leaves you options, but when you are already exposed to a potential security breach, you have to make bad decisions due to lack of time.

Complex releases

Finally, it is worth mentioning that you want to avoid manual steps in your releases. Today there is really no excuse for making a release more complex than one button click. Ideally, abstract away configuration so that there is no file.prod.config template separate from file.uat.config, or that prod template is almost guaranteed to break the release, much like how the Rover 400 was almost completely a Honda except for the grille, and the grille was the only thing that rusted.

Stopping Princip

So how do we avoid the decline, the rot? As with shifting quality and security left, the earlier you spot these problems the cheaper they are to address, so if you find yourself in any of the situations above, address them with haste.

  1. Avoid engaging “maintenance developers”; their remit may explicitly prevent them from doing major refactoring even when it is necessary.
  2. Keep assigning resources to keep dependencies updated. Use dependency scanning (SCA) to validate that your dependencies are not vulnerable.
  3. Disallow and remove integration-by-database at any cost. This is hard to fix, but worth it. This alone solves 90% of the niggling small problems you are continuously having, as you can fix your data structure to suit your current problems rather than the ones you had 15 years ago. If you cannot create a true data platform for reporting data, at least define agreed views/indexes that can act as an interface for external consumers. That way you have a layer of abstraction between external consumers and yourself, and you stay free to refactor as long as you make sure that the views still work.
  4. Make dependencies explicit. Ideally PACT tests, but if not that, at least integration tests. This way you avoid needing shared integration environments where teams are shocked to find out that the changes they have been working on for two weeks break some other piece of software they didn’t know existed.