Category Archives: Architecture

Barriers – trade and otherwise

The last few days of highly retro American foreign policy and the ensuing global market chaos have made everyone aware of trade barriers, and will make everyone fully grasp the consequences of a trade war, the same way the pandemic allowed us to re-learn the lessons of 1919 and the Spanish flu.

Background

To take our minds off the impending doom, I will instead address other barriers, ones that are more related to technology and software, but still cause real world problems for real people.

There are a number of popular websites that enumerate falsehoods programmers believe about names, addresses and even time zones. I do not have the stamina currently to produce an equivalent treatise on falsehoods people believe about residency, but I can manage to put together a bit of a moan on the subject at least.

Identity

In some countries, there are surrogate keys that can be used to identify a person. If these are ubiquitous, you come to rely on everyone having them and may require them as part of customer onboarding. Do remember that tourists exist. Or legal residents who somehow still do not have this magical number. Countries that have had reason to become paranoid about dodgy people buying train tickets online, such as Spain, may demand an identity number. If you find yourself in such a situation, try to accept an alternative such as a passport number.

In other countries, where there is no surrogate key – you are your address, essentially – you will instead need to consider how you handle, well, tourists again, but also students who may still reside with their parents. Also – make it easy to update people’s addresses, as they inevitably have to move in our current neo-feudal society of landed gentry and renter class.
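To make the point concrete, here is a minimal sketch in C# of what an onboarding model that accepts either a national identity number or a passport might look like; every type and property name here is hypothetical, not a prescription.

```csharp
// Hypothetical onboarding model: names are illustrative only.
public abstract record IdentityDocument;

// A national surrogate key (e.g. a personnummer) where one exists...
public sealed record NationalIdNumber(string Country, string Value) : IdentityDocument;

// ...or a passport number for tourists and residents who do not have one yet.
public sealed record PassportNumber(string IssuingCountry, string Value) : IdentityDocument;

public sealed record PostalAddress(string[] Lines, string? Postcode, string Country);

public sealed record Customer(
    string FullName,
    IdentityDocument Identity,     // either document type is acceptable
    PostalAddress CurrentAddress)  // people move; make this easy to change
{
    public Customer MoveTo(PostalAddress newAddress) => this with { CurrentAddress = newAddress };
}
```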

Residence vs Citizenship

In Europe at least it has been relatively common for people to live permanently in a country other than one of which they are a citizen. Many countries put up barriers to becoming a citizen to ensure that only those who care become naturalised citizens and vote on important matters. Other countries, like Sweden, hand out citizenship in cereal boxes, but regardless – just because someone lives in a country and pays taxes there does not mean they are a citizen of that country. If you ask for a “country”, please be explicit about whether you mean residency or nationality. Also, people sometimes have multiple citizenships. Does this matter to you? Do you need this information? Really? There are a lot of data privacy concerns here, so please also consider the purpose and retention of this data.

If Her Majesty (RIP) bestowed a Settled Status upon you as an EU citizen and UK resident, you will not be eligible for a biometric residency card from the Home Office. This is still a surprise to many embassies and IT systems the world over. And no, you can’t blag one from the Home Office. The IT system or embassy will need to deal with share codes from the Settlement Scheme that prove right to live and work in Blighty. Again, pay attention.

If you are in the business of selling international travel, you may need to support edge cases. People with dual citizenship are already required to show the correct passport at the correct gate, i.e. at St Pancras they show the UK border agent their British passport and as they walk across to the French border agent they show their EU passport.

With the UK introducing the ETA it becomes more complicated. The current recommendation when booking travel is to use the passport of your country of residence, as that makes it easier for the airline / ferry / train company to know you have the right to reside in your home country. With the ETA this becomes even more important, because if a dual passport holder shows up at, say, a foreign airport trying to check in to a flight to the UK booked on their non-UK passport, the airline is required to demand an ETA to allow boarding. Even for EU citizens with settled status this can be a problem, but at least we can show a share code from the Home Office; no such route exists for a British citizen travelling on their other passport.

Things really kick off when the EU introduces ETIAS, when you as a dual citizen need to prove your right both ways, i.e. you need to book your EU-bound flight with your EU passport and your return journey with your UK passport. British Airways and EasyJet already support having different identity documents for each leg, but Ryanair does not. I mention it here not to promote alternatives to Ryanair, but just to warn that this is something someone writing / selling software that deals with this area will have to consider.

As KYC becomes a bigger deal, and countries invent new barriers to living and working in other countries, you as – let’s say – the vendor or implementer of systems involved in recruitment will have to pay attention to how things really work, to ensure you correctly gatekeep your users so that you only refer people who can legally work in the jurisdiction you are serving. This seems onerous, so you will immediately partner with an identity and verification firm that deals with that for you. Great – as long as that supplier is up-to-date.

Payments

Unfortunately the very real threat of terror attacks and the less gruesome but more prevalent risk of fraud have meant that what used to be a stupid simple experience – buying things online – has become more complex.

Here you are usually bound by whatever measures your financial rails or your regulator put in place to reduce risk, but at least consider whether you need to require the card to be registered to the shipping address, or whether you as a UK company really need an account number and sort code to enter payment details, as these are not printed on foreign debit cards. You will need them for direct debit of course, but always? Think about it, is all I ask.

I tried to buy car insurance when I was new in the UK, and I had to go to a brick-and-mortar insurer so that I could pay with my Swedish Visa debit, as all online businesses required UK debit cards. It is hard to get a bank account as a n00b in any country, and that degree of difficulty helps other businesses: if you can show a bank statement mailed to your address, that’s prime identity document stuff in the UK.

Why all of this friction? Why?

Surely, someone has solved all of this. Yes, but I have never been a resident of Estonia, so I cannot tell you how utopia works.

I can warn you of what has happened in Sweden. Everyone – well, not everyone, but most people – has an electronic certificate called BankID. You can use it to pay, to do your taxes, to borrow money, to buy a house, to start a business. Anything. In the high-trust society of old you could just punch in your surrogate key, the personnummer, and you would without any further verification be subscribed to a newspaper. Both the newspaper and the invoice would arrive at your registered address. Yes, the state would tell the newspaper where you live.

With BankID, you will now get a follow-up question and need to verify in an app. So people are creating shell companies, taking up loans, buying property and transferring funds on behalf of other people and using social engineering to make sure the app gets clicked.

There are calls for slowing down bureaucracy, making people show up in person to do some of this stuff due to the proliferation of fake identities issued through misuse of proper channels, hijacked cryptographic identities and malevolent automation. Some old fashioned bureaucracy can help reduce fraud by stalling, basically. Consider abuse as you design systems.

How small is small?

I have great respect for the professional agile coach and scrum master community. Few people seem to systematically care for both humans and business, maintaining profitability without ever sacrificing the humans. Now, however, I will alienate vast swathes of them in one post. Hold on.

What is work in software development?

Most mature teams do two types of work: they look after a system and make small changes to it – maintenance, keeping the lights on – and they build the new features the business claims it wants. It is common to get an army in to build a new platform and then allow the teams to naturally attrit as transformational project work fizzles out, contractors leave, and the most marketable developers either get promoted out of the team or get better offers elsewhere. A small stream of fine-adjustment changes keeps coming in to the – effectively – core team of maintenance developers that remains. Eventually this maintenance development work gets outsourced abroad.

A better way is to have teams of people that work together all day every day. Don’t expand and contract or otherwise mess with teams; hire carefully from the beginning and keep new work flowing into teams rather than restructuring after a piece of work is complete. Contract experts to pair with the existing team if you need to teach the team a new technology, but don’t get mercenaries to do the work. It might be slower, but if you have a good team that you treat well, the odds are better that they’ll stay, and they will develop new features in a way that can be maintained better in the future, and will be less likely to cut corners, as any shortcuts will blow up in their own faces shortly afterwards.

Why do we plan work?

When companies spend money on custom software development, a set of managers at very high positions within the organisation have decided that investing in custom software is a competitive advantage, and several other managers think they are crazy to spend all this money on IT.

To mollify the greater organisation, there is some financial oversight and budgeting. Easily communicated projects are sold to the business “we’ll put a McGuffin in the app”, “we’ll sprinkle some AI on it” or similar, and hopefully there is enough money in there to also do a bit of refactoring on the sly.

This pot of money is finite, so there is strong pressure to keep costs under control, don’t get any surprise AWS bills or middle managers will have to move on. Cost runaway kills companies, so there are legitimately people not sleeping at night when there are big projects in play.

How do we plan?

Problem statement

Software development is very different from real work. If you build a physical thing, anything from a phone to a house, you can make good use of a detailed drawing describing exactly how the thing is constructed and the exact properties of the components that are needed. If you are to make changes or maintain it, you need these specifications. It is useful both for construction and maintenance.

If you write the exact same piece of software twice, you have some kind of compulsive issue, you need help. The operating system comes with commands to duplicate files. Or you could run the compiler twice. There are infinite ways of building the exact same piece of software. You don’t need a programmer to do that, it’s pointless. A piece of software is not a physical thing.

Things change, a lot. Fundamentally – people don’t know what they want until they see it, so even if you did not have problems with technology changing underneath your feet whilst developing software, you would still have problems with the fact that fundamentally people did not know what they wanted back when they asked you to build something.

The big issue though is technology change. Back in the day, computer manufacturers would have the audacity to evolve the hardware in ways that made you have to re-learn how to write code. High-level languages came along and now instead we live with Microsoft UI frameworks or JavaScript frameworks that are mandatory one day and obsolete the next. Things change.

How do you ever successfully plan to build software, then? Well… we have tried to figure that out for seven decades. The best general concept we have arrived at so far is iteration, i.e. deliver small chunks over time rather than to try and deliver all of it at once.

The wrong way

One of the most well-known but misunderstood papers is Managing The Development of Large Software Systems by Dr Winston W Royce, which launched the concept of Waterfall.

Basically, the software development process in waterfall is outlined in distinct phases:

  1. System requirements
  2. Software requirements
  3. Analysis
  4. Program design
  5. Coding
  6. Testing
  7. Operations

For some reason people took this as gospel for several decades, despite the fact that the core, fundamental problem that dooms the process to failure is outlined right below figure 2 – the pretty waterfall illustration of the phases above that people keep referring to. It says:

I believe in this concept, but the implementation described above is risky and invites failure. The problem is illustrated in Figure 4. The testing phase which occurs at the end of the development cycle is the first event for which timing, storage, input/output transfers, etc., are experienced as distinguished from analyzed. These phenomena are not precisely analyzable. They are not the solutions to the standard partial differential equations of mathematical physics for instance. Yet if these phenomena fail to satisfy the various external constraints, then invariably a major redesign is required. A simple octal patch or redo of some isolated code will not fix these kinds of difficulties. The required design changes are likely to be so disruptive that the software requirements upon which the design is based and which provides the rationale for everything are violated. Either the requirements must be modified, or a substantial change in the design is required. In effect the development process has returned to the origin and one can expect up to a 100-percent overrun in schedule and/or costs.

Managing The Development of Large Software Systems, Dr Winston W Royce

Reading further, Royce realises that a more iterative approach is necessary as pure waterfall is impossible in practice. His legacy however was not that.

Another wrong way – RUP

Rational Rose and the Rational Unified Process were the ChatGPT of the late nineties and early noughties. Basically, if you only made a UML drawing in Rational Rose, it would give you a C++ program that executed. It was magical. Before PRINCE2 and SAFe, everyone was RUP certified. You had loads of planning meetings, wrote elaborate Use Cases on index cards, and eventually you had code. It sounds like waterfall with better tooling.

Agile

People realised that when things are constantly changing, it was doomed to have a fixed plan to start with and to stay on it even when you knew that it was unattainable or undesirable to reach the original goal. Loads of attempts were made, but one day some people got together to actually have a proper go at defining what should be the true way going forward.

In February 11-13, 2001, at The Lodge at Snowbird ski resort in the Wasatch mountains of Utah, seventeen people met to talk, ski, relax, and try to find common ground—and of course, to eat. What emerged was the Agile ‘Software Development’ Manifesto. Representatives from Extreme Programming, SCRUM, DSDM, Adaptive Software Development, Crystal, Feature-Driven Development, Pragmatic Programming, and others sympathetic to the need for an alternative to documentation driven, heavyweight software development processes convened.

History: The Agile Manifesto

So – everybody did that, and we all lived happily ever after?

Short answer: No. You don’t get to just spend cash, i.e. have developers do work, without making it clear what you are spending it on, why, and how you intend to know that it worked. Completely unacceptable, people thought.

The origins of tribalism within IT departments have been done to death in this blog alone, so for once it will not be rehashed. Suffice to say, staff are often organised according to their speciality rather than in teams that produce output together. Budgeting is complex, and there can be political competition that is counterproductive to IT as a whole or to the organisation as a whole.

Attempts at running a midsize to large IT department that develops custom software have been made in form of Scaled Agile Framework (SAFe), DevOps and SRE (where SRE is addressing the problem backwards, from running black-box software using monitoring, alerts, metrics and tracing to ensure operability and reliability of the software).

As part of some of the original frameworks that came in with the Agile Manifesto, a bunch of practices became part of Agile even though they were not “canon”, such as User Stories, which were said to be a few words on an index card, pinned to a noticeboard in the team office, just wordy enough to help you discuss a problem directly with your user. This of course eventually started to develop back into the verbose RUP Use Cases of yesteryear, but “agile, because they are in Jira”, and rules had to be created for the minimum amount of information on there to successfully deliver a feature. In the Toyota Production System that originated Scrum, Lean Software Development and Six Sigma (sadly, an antipattern), one of the key lessons is that the ideal batch size is 1, and generally to make smaller changes. This explosion in the size of the user story is symptomatic of the remaining problems in modern software development.

Current state of affairs

So what do we do

As you can surmise if you read the previous paragraphs, we did not fix it for everybody, we still struggle to reliably make software.

The story and its size problems

The part of this blog post that will alienate the agile community is coming up. The units of work are too big. You can’t release something that is not a feature. Something smaller than a feature has no value.

If you work next to a normal human user, and they say – to offer an example – “we keep accidentally clicking on this button, so we end up sending a message to the customer too early, we are actually just trying to get to this area here to double-check before sending”, you can collaboratively determine the correct behaviour, make it happen, release in one day, and it is a testable and demoable feature.

Unfortunately requirements tend to be much bigger and less customer-facing. Like department X wanting to start seeing the reasons for turning down customer requests in their BI tooling being the feature, and then a “product backlog item” could be that service A and service B need to post messages on a message bus at various points in the user flow, identifying those reasons.

Iterating over and successfully releasing this style of feature to production is hard.

Years ago I saw Allen Holub speaking at SD&D in London and his approach to software development is very pure. It is both depressing and enlightening to read the flame wars that erupt in his mentions on Twitter when he explains how to successfully make and release small changes. People scream and shout that it is not possible to do it his way.

In the years since, I have come to realise that nothing is more important than making smaller units of work. We need to make smaller changes. Everything gets better if / when we succeed. It requires a mindset shift, a move away from big detailed backlogs to smaller changes, discussed directly with the customer (in the XP sense, probably some other person in the business, or another development team). To combat the uncertainty, it is possible to mandate some kind of documentation update (graph? chart?) as part of the definition of done. Yes, needless documentation is waste, but if we need to keep a map over how the software is built, as long as people actually consult it, it is useful. We don’t need any further artefacts of the story once the feature is live in production anyway.

How do we make smaller stories?

This is the challenge for our experts in agile software development. Teach us, be bothered, ignore the sighs of developers that still do not understand, the ones raging in Allen Holub’s mentions. I promise, they will understand when they see it first hand. Daily releases of bug free code. They think people are lying to them when they hear us talk about it. When they experience it though, they will love it.

When every day means a new story in production, you also get predictability. As soon as you are able to split incoming or proposed work into daily chunks, you also get the ability to forecast – roughly, better than most other forms of estimate – and since you deliver the most important new thing every day, you give the illusion of value back to those that pay your salary.

Abstractions, abstractions everywhere

X, X everywhere meme template, licensed by imgflip.com

All work in software engineering is about abstractions.

Abstractions

All models are wrong, but some are useful

George Box

It began with assembly language: people were tired of writing large-for-their-time programs in raw binary instructions, so they made a language that basically mapped each binary instruction to a text value, and a program that would translate that to raw binary and punch cards. Not a huge abstraction, but it started there. Then came high-level languages and off we went. Now we can conjure virtual hardware out of thin air with regular programming languages.

The magic of abstractions really gives you amazing leverage, but at the same time you sacrifice actual knowledge of the implementation details, meaning you often get exposed to obscure errors that you either have no idea what they mean or – even worse – understand exactly what is wrong but have no access to fix, because the source is just a machine-translated piece of Go and there is no way to fix the translated C# directly, to take one example.

Granularity and collaboration across an organisation

Abstractions in code

Starting small

Most systems start small, solving a specific problem. This is done well, and the requirements grow, people begin to understand what is possible and features accrue. A monolith is built, and it is useful. For a while things will be excellent and features will be added at great speed, and developers might be added along the way.

A complex system that works is invariably found to have evolved from a simple system that worked

John Gall

Things take a turn

Usually, at some point some things go wrong – or auditors get involved because of regulatory compliance – and you prevent developers from deploying to production, hiring gatekeepers to protect the company from the developers. In the olden days – hopefully not anymore – you hire testers to do manual testing to cover a shortfall in automated testing. Now you have a couple of hand-offs within the team: people write code, give it to testers who find bugs, and work goes the wrong way – backwards – for developers to clean up their own mess and try again. Eventually something will be available to release, and the gatekeepers will grudgingly allow a change to happen, at some point.

This leads to a slowdown in the feature factory; some old design choices may cause problems that further slow down the pace of change, or – if you’re lucky – you just have too many developers in one team, and you somehow have to split them up into different teams, which means comms deteriorate and collaborating in one codebase becomes even harder. With the existing change prevention, struggles with quality and now poor cross-team communication, something has to be done to clear a path so that the two groups of people can collaborate effectively.

Separation of concerns

So what do we do? Well, every change needs to be covered by some kind of automated test, if only to at first guarantee that you aren’t making things worse. This way you can now refactor the codebase to a point where the two groups can have separate responsibilities, and collaborate over well defined API boundaries, for instance. Separate deployable units, so that teams are free to deploy according to their own schedule.
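A cheap way to get that first safety net, if the code has no tests at all, is a characterisation test: pin down what the code demonstrably does today before you start moving it. A rough sketch using xUnit – the calculator class and the numbers are entirely made up:

```csharp
using Xunit;

// Characterisation tests assert what the code does today, not what it should do,
// so a refactor cannot silently change behaviour without a test going red.
public class ShippingCalculatorCharacterisationTests
{
    [Theory]
    [InlineData(0.5, 4.99)]   // values captured by running the current code
    [InlineData(2.0, 7.49)]
    [InlineData(25.0, 19.99)]
    public void Quote_matches_current_behaviour(double weightKg, double expected)
    {
        var calculator = new ShippingCalculator(); // the legacy class we want to split out
        Assert.Equal(expected, calculator.Quote(weightKg), precision: 2);
    }
}
```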

If we can get better collaboration through early test design and front-loaded test automation, and befriend the ops gatekeepers to wire in monitoring so that teams are fully plugged in to how their products behave in the live environment, we will be close to the optimum.

Unfortunately – this is very difficult. Taking a pile of software and making sense of it, deciding how to split it up between teams, gradually separating out features, can be too daunting to really get started. You don’t want to break anything, and if you – as many are wont to do, especially if you are new in an organisation – decide to start over from scratch, you may run into one or more of the problems that occur when attempting a rewrite. One example being where you end up in a competition against a moving target. The same team has to own a feature in both the old and the new codebase, in that case, to stop that competition. For some companies it is simply worth the risk; they are aware they are wasting enormous sums of money, but they still accept the cost. You would have to be very brave.

Abstractions in Infrastructure

From FTP-from-within-the-editor to Cloud native IaC

When software is being deployed – and I am largely ignoring native apps now, focusing on web applications and APIs – there are a number of things actually happening that are at this point completely obscured by layers of abstraction.

The metal

The hardware needs to exist. This used to be a very physical thing, a brand new HP ProLiant howling in the corner of the office onto which you installed a server OS and set up networking so that you could deploy software on it, before plugging it into a rack somewhere, probably a cupboard – hopefully with cooling and UPS. Then VM hosts became a thing, so you provisioned apps using VMWare or similar and got to be surprised at how expensive enterprise storage is per GB compared to commodity hardware. This could be done via VMWare CLI, but most likely an ops person pointed and clicked.

Deploying software

Once the VM was provisioned, things like Ansible, Chef and Puppet began to become a thing, abstracting away the stopping of websites, the copying of zip files, the unzipping, the rewriting of configuration and the restarting of the web app into a neat little script. Already here you see problems where “normal” issues, like a file being locked by a running process, show up as a very cryptic error message that the developer might not understand. You start to see cargo-culting, where people blindly copy things from one app to another because they think two services are the same, and people don’t understand the details. Most of the time that’s fine, but it can also be problematic with a bit of bad luck.

Somebody else’s computer

Then cloud came, and all of a sudden you did not need to buy a server up front and could instead rent as much server as you need. Initially, all you had was VMs, so your Chef/Puppet/Ansible worked pretty much the same as before, and each cloud provider offered a different way of provisioning virtual hardware before you came to the point where the software deployment mechanism came into play. More abstractions to fundamentally do the same thing. Harder to analyse any failures; you sometimes have to dig out a virtual console just to see why/how an app is failing because it’s not even writing logs. Abstractions may exist, but they often leak.

Works on my machine-as-a-service

Just like the London Pool and the Docklands were rendered derelict by containerisation, a lot of people’s accumulated skills in Chef and Ansible have been rendered obsolete as app deployments have become smaller: each app is simply unzipped on top of a brand new Linux OS, sprinkled with some configuration, and the image pushed to a registry somewhere. On one hand, it’s very easy. If you can build the image and run the container locally, it will work in the cloud (provided the correct access is provisioned, but at least AWS offers a fake service that lets you dry-run the app on your own machine and test various role assignments to make sure IAM is also correctly set up). On the other hand, somehow the “metal” is locked away even further and you cannot really access a console anymore, just a focused log viewer that lets you see only events related to your ECS task, for instance.

Abstractions in Organisations

The above tales of ops vs test vs dev illustrate the problem of structuring an organisation incorrectly. If you structure it per function you get warring tribes and very little progress, because one team doesn’t want any change at all in order to maintain stability, the other one gets held responsible for every problem customers encounter, and the third one just wants to add features. If you structured the organisation for business outcomes, everyone would be on the same team working towards the same goals with different skill sets, so the way you think of the boxes in an org chart can have a massive impact on real-world performance.

There are no solutions, only trade-offs, so consider the effects of sprinkling people of various backgrounds across the organisation. If, instead of being kept in the cellar as usual, you start proliferating your developers among the general population of the organisation, how do you ensure that every team follows the agreed best practices and that no corners are cut, even when a non-technical manager is demanding answers? How do you manage the performance of developers you have to go out of your way to see? I argue such things are solvable problems, but do ask your doctor if reverse Conway is right for you.

Conclusion

What is a good abstraction?

Coupling vs Cohesion

If a team can do all of their day-to-day work without waiting for another team to deliver something or approve something, if there are no hand-offs, then they have good cohesion. All the things needed are to hand. If the rest of the organisation understands what this team does and there is no confusion about which team to go to with this type of work, then you have high cohesion. It is a good thing.

If, however, one team is constantly worrying about what another team is doing, or where certain tickets are in their sprint in order to schedule their own work, then you have high coupling and time is wasted. Some work has to be moved between teams, or the interface between the teams has to be made more explicit, in order to reduce this interdependency.

In Infrastructure, you want the virtual resources associated with one application to be managed within the same repository/area to offer locality and ease of change for the team.

Single Responsibility Principle

While dangerous to over-apply within software development (you get more coupling than cohesion if you are too zealous), this principle is generally useful within architecture and infrastructure.

Originally meaning that one class / method should only do one thing – an extrapolation of the UNIX principles – it can more generally be said to mean that on a given layer of abstraction, a team, infrastructure pipe, app, program, class […] should have one responsibility. This usually means a couple of things happen within it, but they conceptually belong together. They have the same reason to change.
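As a rough C# illustration of the class-level version – the domain and names are hypothetical – pricing rules and persistence never share a reason to change:

```csharp
using System;
using System.Collections.Generic;

// One reason to change: the pricing rules themselves.
public class QuotePricer
{
    public decimal Price(decimal sumInsured, decimal rate) => sumInsured * rate;
}

// A different reason to change: how and where quotes are stored.
public class QuoteRepository
{
    private readonly Dictionary<Guid, decimal> _store = new();

    public Guid Save(decimal premium)
    {
        var id = Guid.NewGuid();
        _store[id] = premium;
        return id;
    }
}
```

The same reasoning applies one layer up: a team, a pipeline or a deployable should own things that change together, and nothing else.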

What – if any – pitfalls exist ?

The major weakness of most abstractions is when they fall apart, when they leak. Not having access to a physical computer is fine, as long as the deployment pipeline is working, as long as the observability is wired up correctly, but when it falls down you still need to be able to see console output, you need to understand how networking works, to some extent, and you need to understand what obscure operating system errors mean. Basically, when things go really wrong you need to have already learned to run that app on that operating system before, so you recognise the error messages and have some troubleshooting steps memorised.

So although we try to save our colleagues from the cognitive load of having to know everything we were forced to learn over the decades, to spare them the heartache, they still need to know. All of it. So yes, the challenge with the proliferation of layers of abstraction is to pick the correct ones, and to try and keep the total bundle of layers as lean as possible, because otherwise someone will want to simplify or clarify these abstractions by adding another layer on top, and the cycle begins again.

Technical debt – the truth

Rationale

Within software engineering we often talk about Technical Debt. The term was coined by elder programmer and agilist Ward Cunningham and likens trade-offs made when designing and developing software to credit you can take on, but that has to be repaid eventually. The comparison further correctly implies that you have to service your credit every time you make new changes to the code, and that the interest compounds over time. Some companies literally go bankrupt over unmanageable technical debt – because the cost of development goes up as the speed of delivery plummets, and whatever use the software was intended to have is eventually completely covered by a competitor as yet unburdened by technical debt.

But what do you mean technical debt?

The decision to take a shortcut can concern a couple of things – usually the number of features, or the time spent per feature. It could mean postponing required features to meet a deadline/milestone, despite it taking longer to circle back and do the work later in the process. If there is no cost to this scheduling change, it’s just good planning. For it to be defined as technical debt there has to be a cost associated with the rescheduling.

It is also possible to sacrifice quality to meet a deadline. “Let’s not apply test driven development because it takes longer, we can write tests after the feature code instead”. That would mean that instead of iteratively writing a failing test first, followed by the feature code that makes that test pass, we get into a state of flow and churn out code as we solve the problem at varying levels of abstraction, retrofitting tests as we deem necessary. Feels fast, but the tests get big, incomplete, unwieldy and brittle compared to the plentiful, small, specific tests – all-encompassing in aggregate – that TDD brings. A debt you are taking on to be paid later.
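For contrast, the loop being skipped looks roughly like this – a minimal xUnit sketch with made-up names: write the small failing test first, then only the code that makes it pass, then refactor.

```csharp
using Xunit;

// Step 1: a small, specific failing test that states one behaviour.
public class DiscountTests
{
    [Fact]
    public void Repeat_customers_get_five_percent_off()
    {
        var discount = new Discount();
        Assert.Equal(95m, discount.Apply(100m, isRepeatCustomer: true));
    }
}

// Step 2: the least code that makes it pass; refactor afterwards.
public class Discount
{
    public decimal Apply(decimal price, bool isRepeatCustomer) =>
        isRepeatCustomer ? price * 0.95m : price;
}
```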

A third kind of technical debt – which I suspect is the most common – is also the one that fits the comparison to financial debt the least. A common way to cut corners is to not continuously maintain your code as it evolves over time. It’s more akin to the cost of not looking after your house as it is attacked by weather and nature, more dereliction than anything else really.

Let’s say your business had a physical product it would sell back when a certain piece of software was written. Now the product sold is essentially a digital license of some kind, but in the source code you still have inventory, shipping et cetera that has been modified to handle selling a digital product in a way that kind of works, but every time you introduce a new type of digital product you have to write further hacks to make it appear like a physical product as far as the system knows.

The correct way to deal with this would have been to make a more fundamental change the first time digital products were introduced. Maybe copy the physical process at first and cut things out that don’t make sense whilst you determine how digital products work, gradually refactoring the code as you learn.

Interest

What does compound interest mean in the context of technical debt? Let’s say you have created a piece of software, and your initial tech debt in this story is that you are thin on unit tests but have tried to compensate by making more elaborate integration tests. Then the time comes to add an integration: a JSON payload needs to be posted to a third-party service over HTTP with bespoke authentication behaviour.

If you had applied TDD, you would most likely have a fairly solid abstraction over the REST payload, so that an integration test could be simple and small.

But in our hypothetical you have less than ideal test coverage, so you need to write a fairly elaborate integration test that needs to verify parts of the surrounding feature along with the integration itself to truly know the integration works.

Like with some credit cards, you have two options on your hypothetical tech debt statement: either build the elaborate new integration test at a significant cost – a day? three? – or avert your eyes, choose the second, smaller amount, and increase your tech debt principal by not writing an automated test at all, vowing to test this area of code by hand every time you make a change. The technical debt equivalent of a payday loan.
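The “solid abstraction” mentioned above might look something like this sketch – hypothetical names throughout, and System.Net.Http.Json assumed – where the bespoke HTTP and auth details live behind one small interface, so the expensive integration test only has to cover the adapter and everything else can use a fake:

```csharp
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

public sealed record PartnerPayload(string Reference, decimal Amount);

// The seam: the rest of the codebase depends only on this.
public interface IPartnerNotifier
{
    Task NotifyAsync(PartnerPayload payload, CancellationToken ct = default);
}

// The only class the elaborate integration test needs to exercise.
public sealed class HttpPartnerNotifier : IPartnerNotifier
{
    private readonly HttpClient _client;
    public HttpPartnerNotifier(HttpClient client) => _client = client;

    public async Task NotifyAsync(PartnerPayload payload, CancellationToken ct = default)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, "notifications")
        {
            Content = JsonContent.Create(payload)
        };
        request.Headers.Add("X-Partner-Signature", Sign(payload)); // the bespoke auth bit
        var response = await _client.SendAsync(request, ct);
        response.EnsureSuccessStatusCode();
    }

    private static string Sign(PartnerPayload payload) => payload.Reference; // placeholder
}
```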

Critique

So what’s wrong with this perfect description of engineering trade offs? We addressed above how a common type of debt doesn’t fit the debt model very neatly, which is one issue, but I think the bigger problem is – to the business we just sound like cowboy builders.

Would you accept that a builder under-specified a steel beam for an extension you are having built? “It’s cheaper and although it is not up to code, it’ll still take the weight of the side of the house and your kids and a few of their friends. Don’t worry about it.“ No, right? Or an electrician getting creative with the earthing of the power shower as it’s Friday afternoon, and he had promised to be done by now. Heck no, yes?

The difference of course is that within programming there is no equivalent of a GasSafe registry, no NICEIC et cetera. There are no safety regulations for how you write code, yet.

This means some people will offer harmful ways of cutting corners to people that don’t have the context to know the true cost of the technical debt involved.

We will complain that product owners are unwilling to spend budget on necessary technical work, so as to blame product rather than take some responsibility. The business expects us to flag up if there are problems. Refactoring as we go, upgrading third party dependencies as we go should not be something the business has to care about. Just add it to the tickets, cost of doing business.

Sure there are big singular incidents such as a form of authentication being decommissioned or a framework being sunset that will require big coordinated change involving product, but usually those changes aren’t that hard to sell to the business. It is unpleasant but the business can understand this type of work being necessary.

The stuff that is hard to sell is the bunched-up refactorings you should have done along the way, but didn’t – and now you want to do them because it’s starting to hurt. Tech debt amortisation is very hard to sell, because things are not totally broken now, so why do we have to eat the cost of this massive ticket when everything works and is making money? Are you sure you aren’t just trying to gold-plate something out of vanity? The budget is finite and product has other things on their mind to deal with. Leave it for now, we’ll come back to it (when it’s already fallen over).

The business expects you to write code that is reliable, performant and maintainable. Even if you warn them you are offering to cut corners at the expense of future speed of execution, a non-developer may have no idea of the scale of the implications of what you are offering.

If they spent a big chunk out of their budget one year – the equivalent of a new house in a good neighbourhood – so that a bunch of people could build a piece of software with the hope that this brand new widget in a website or new line-of-business app will bring increased profits over the coming years, they don’t want to hear roughly the same group of people refer to it as “legacy code” already at the end of the following financial year.

Alternative

Think of your practices as regulations that you simply cannot violate. Stop offering solutions that involve sacrificing quality! Please even.

We are told that making an elaborate big-design-upfront is waterfall and bad – but how about some-design-upfront? Just enough thinking ahead to decide where to extend existing functionality and where to instead put a fork in the road and begin a more separate flow in the code, that you then develop iteratively.

If you have to make the bottom line more appealing to the stakeholders for them to dare invest in making new product through you and not through dubious shadow-IT, try and figure out a way to start smaller and deliver value sooner rather than tricking yourself into accepting work that you cannot possibly deliver safely and responsibly.

Size matters

When I started out, I had no idea about lean, agile or TDD. I did read a lot about Object Oriented Programming. I even thought I was practicing it. My first decade of programming was not what would pass for professional grade by modern standards, but even my poor standard back then was – and still would be – not the worst. Sadly.

Eventually I read some pamphlets about TDD, tried some katas, read books about lean software development and The Phoenix Project, and was subjected to Fred George’s Object Bootcamp with my co-workers a couple of employers ago.

Then I understood. All of it. Object oriented programming, TDD, lean. All of it. Because of how steeped in my old bad habits I am it still takes conscious effort to practice all these things, and I fail to enforce it sometimes, but at least at this point I know what good looks like. If I had stronger discipline I could refactor towards it as often as I like.

Like, if you make small enough, focused classes, object orientation makes sense and you find yourself binning and replacing classes that are no longer fit for purpose rather than rewriting them. Open-Closed principle. There are all of a sudden more value objects and composable domain classes that cleanly describe domain behaviour, and you get away from the cascading changes you thought were inherent in OO. But is it easy? No. It requires constant effort to maintain.
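A value object in the sense meant here, as a small C# sketch over a hypothetical domain:

```csharp
using System;

// Small, immutable, self-validating and composable: the class can be binned
// and replaced wholesale if the domain concept changes.
public sealed record Money(decimal Amount, string Currency)
{
    public Money Add(Money other) =>
        other.Currency == Currency
            ? this with { Amount = Amount + other.Amount }
            : throw new InvalidOperationException("Cannot add different currencies");
}

public sealed record Premium(Money Net, Money Tax)
{
    public Money Gross => Net.Add(Tax);
}
```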

The most important thing is size. Make the smallest possible change to make the test go green. People – me, you – we? – overcomplicate things and fail to comprehend just how small small should be, and in the refactor step we rarely refactor enough and leave classes too big. I find this really hard: the commit-by-commit design step where you are supposed to refactor code into proper shape, reduce classes down to their most composable form, their bare essence. On the other hand, “It’s your only job, Rachel!”, like Jimmy Carr says on Eight out of Ten Cats Does Countdown when she can’t solve a numbers game right away. We are paid not only to sit in front of our computers and type but also to think. If we do a bit more of the thinking and a little bit less of the typing, nobody is going to get upset.

Allen Holub talks about user stories on Twitter, reminding us that a story isn’t a neologism for “requirement” and that a user story literally means that, a software user telling us a story about a domain problem they are having. Not “as a user I want to authenticate so that I can have access to the system” but like actual meat on the bone. “As an underwriter I want to bind a quote as I have agreed terms with the broker”. Requirements for authentication will come as part of some story, but start stupid simple with a .htaccess file or similar. For some subsystems that requirement will never come, or can be trivially covered with infrastructure, and you just saved yourself a bunch of maintenance. Code is not an asset, it’s a liability.

Allen Holub also uses TDD when designing systems, as in he uses Java and JUnit to TDD integrations between microservices, like a less-trendy Jupyter notebook proving his overall system design, even ending up with some code that could describe integration points. Start smaller. No, even smaller than that.

Then we have the No Estimates crowd. It seems insane to people used to the widespread non-story user stories with large blocks of detailed requirements that necessitate complex up-front engineering. How could teams take on these big stories without estimating how long they will take to build – what if we spend three months and don’t even achieve what we want?!

That’s right. But – do you remember Agile? Yea, see the point was to build the smallest possible implementation that can prove viability. If your stories are sized consistently and correctly – i.e. smaller than you think possible – the consistent implementation work will give you the foresight you need to coordinate and plan as necessary.

We can then add features incrementally as users examine the existing product and realise the potential and get ideas based on their experience as domain experts for what they could do to meet their customers’ needs, or what experiments could be run to determine customer desires. In small increments still.

Even if you insist on continuing to create estimates, you will find that making stories smaller improves delivery quality and adherence to timescales. As soon as stories get too big, they are harder to reason about and you may even forget acceptance criteria you thought you had clear in your head. You will never regret making stories smaller.

SD&D 2022

I had the chance to attend the Software Design & Development conference at the Barbican Centre this week. It was my first conference since the plague so it was a new experience. I will here coalesce some of the main points that I gathered. I may go into specifics in a couple of topics that stood out to me, but this is mostly so that I have some notes to aid my memory.

Overall

The conference is very well organised, you can tell it’s not their first rodeo. From a practicality point of view the Barbican is equally easy/cumbersome to get to from all directions, which is about as good as you can get in London where usually you will end up favouring proximity to a subset of main line termini, thus making the location cumbersome to get to for at least half the population (since there are airports in every direction out from the city). There were a number of timeslots where you truly had FOMO for choosing one track over another, which is a design goal of a program committee – so, well done SD&D! There was some unfortunate setting in the AV equipment which interfered with shortcuts in Visual Studio in interesting ways, but surely that can be addressed somehow.

Kevlin Henney

The first law of developer conferences states Always Catch a Talk by Kevlin Henney if Available. It doesn’t teach you anything about a specific new thing, but it puts the entire universe into context, and always inspires you to go ahead and look up papers from the olden days that describe modern phenomena.

C# features

My biggest bafflement came not from the talks that showcased new C# features, such as the latest iteration of pattern matching and the like – I usually get introduced to those when they show up as options in ReSharper – but from the old abomination that is default implementations on interfaces in C# 8. This talk by Jeremy Clark was an eye opener. It is so jank you wonder how it could ever be released into production. My guess is that the compatibility argument from MAUI was the big reason, and it makes sense, but basically my instinct to stay away from it was right – and I suspect you will see weird bugs because of it at some point in the future.
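To show the jank in miniature, this is roughly the shape of the feature – names are made up, but the behaviour is the C# 8 one: the default member is only reachable through the interface type, which is where the surprises start.

```csharp
using System;

public interface ILogger
{
    void Log(string message);

    // C# 8 default implementation: implementers compile without providing it.
    void LogError(string message) => Log($"ERROR: {message}");
}

public class ConsoleLogger : ILogger
{
    public void Log(string message) => Console.WriteLine(message);
}

public static class Demo
{
    public static void Run()
    {
        ILogger viaInterface = new ConsoleLogger();
        viaInterface.LogError("fine: the default implementation runs");

        var concrete = new ConsoleLogger();
        // concrete.LogError("does not compile: the member only exists on the interface");
    }
}
```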

I also caught the C# Channels talk. I seem to recall the gestation of Channels; if I remember correctly it was publicly brought into being through discussions on David Fowler’s Twitter account. Anyway, it is a highly civilised way of communicating between two async tasks with back pressure. Intuitive to use and safe. Like a concurrent queue but a lot nicer.
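For the curious, a minimal sketch of the shape of it using System.Threading.Channels: a bounded channel is what provides the back pressure, because the writer awaits when the buffer is full.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelDemo
{
    public static async Task Run()
    {
        // Bounded capacity gives back pressure: WriteAsync waits when the buffer is full.
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity: 8)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

        var producer = Task.Run(async () =>
        {
            for (var i = 0; i < 100; i++)
                await channel.Writer.WriteAsync(i);
            channel.Writer.Complete(); // lets the consumer's loop finish
        });

        var consumer = Task.Run(async () =>
        {
            await foreach (var item in channel.Reader.ReadAllAsync())
                Console.WriteLine(item);
        });

        await Task.WhenAll(producer, consumer);
    }
}
```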

Micro services

Allen Holub had a number of talks on microservices and I am sure you should see more of them if you can; due to the abundance of other brilliant talks I only caught his test-driven architecture talk, which I will mention below. Other than him, there were talks by Juval Löwy and Neal Ford that covered architecture and microservices. I caught Software architecture foundations: identifying characteristics by Neal Ford and Sander Hoogendoorn’s talk on migrating to microservices in small steps, both worth watching.

Security

There were a couple of security talks focusing on automating security, as well as the updated OWASP Top 10 list of threats, plus a couple of talks by Scott Brady about – I am guessing – IdentityServer. The dizzying array of choices meant I didn’t go to the IdentityServer talks, but they are definitely on my to-watch list if there is ever video from this.

The security automation ones though – Continuous Security by Kim van Wilgen and Add Security into your Agile Process by Cecilia Wirén – took you through what you need to really improve the security posture of your development process. Kim van Wilgen went more in-depth on tooling whilst Cecilia Wirén covered more of the whole process and what to consider where. The only sustainable way forward is to automate things like dependency vetting/tracking and as much static code analysis as you can, to bring these concerns as far left as you can, meaning early in the process. Rewriting a function even before you commit it is easy. Having an automated fuzzing tool discover you have a buffer overrun vulnerability when you thought you were done is worse from a cost-of-remediation point of view, but of course letting a bad vulnerability out into the wild is an order of magnitude worse. So whilst catching issues in the dev cycle is preferable, heavier-handed automation that is too time-consuming to run on every commit is still worth running periodically, as it’s better than the alternative.

Tests drive everything

How to stop testing and break your codebase by Clare Sudbery was an amazing talk that really hit home. It was like an experiment of “what if I just skip test-first for a bit and see what happens?”, brought on by time crunch, tiredness and a sense that it would be a safe trade-off because “I know what I’m doing and – in her case – I have acceptance tests”. Like I have noticed in various side projects, when you let go of the discipline the drawbacks come at you hard and fast – almost cartoonishly so. It was one of the most relatable talks I have ever attended.

Allen Holub’s DbC (Design by Coding): applying TDD principles to architecture was fascinating. It started out being a bit “old man yells at cloud” about how real agile is index cards on a board, not Jira, and although yes, index cards or post-its on a physical board are preferable to an electronic board, the electronic board prevents me from having to commute four days out of five, so – no. The points raised about authoring a million detailed tickets ahead of time being a waste, though: hard agree, and the rest of the diatribe against big design up-front I was all aboard with and probably to some extent already doing. The interesting bit was to come though.
He presented a hypothetical technical problem that needed architecting. Instead of drawing a diagram or writing specs, he whipped out an editor with his favourite version of JUnit and wrote some Java code through TDD – all in one file – that implemented tests and classes symbolising microservices and their endpoints, as well as the interactions between them. Lightweight, easy for developers to read, pleasant tooling and at least as useful as a diagram. In both cases you start writing the code from scratch, but you have a design document that makes sense. I’m not 100% sold on the concept but it’s worth considering.
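Translated onto this blog’s home turf – C# and xUnit rather than Java and JUnit, with made-up service names – the flavour of the exercise is roughly this: classes stand in for services, methods for endpoints, and the test doubles as the design document.

```csharp
using System;
using System.Collections.Generic;
using Xunit;

public class OrderFlowDesignTest
{
    [Fact]
    public void Placing_an_order_reserves_stock_and_requests_payment()
    {
        var inventory = new InventoryService();
        var payments = new PaymentService();
        var orders = new OrderService(inventory, payments);

        var orderId = orders.Place(sku: "SKU-1", quantity: 2);

        Assert.Equal(2, inventory.Reserved("SKU-1"));
        Assert.Contains(orderId, payments.RequestedOrders);
    }
}

// Stand-ins for what would eventually be separate deployables.
public class InventoryService
{
    private readonly Dictionary<string, int> _reserved = new();
    public void Reserve(string sku, int quantity) =>
        _reserved[sku] = _reserved.GetValueOrDefault(sku) + quantity;
    public int Reserved(string sku) => _reserved.GetValueOrDefault(sku);
}

public class PaymentService
{
    public List<Guid> RequestedOrders { get; } = new();
    public void RequestPayment(Guid orderId) => RequestedOrders.Add(orderId);
}

public class OrderService
{
    private readonly InventoryService _inventory;
    private readonly PaymentService _payments;

    public OrderService(InventoryService inventory, PaymentService payments) =>
        (_inventory, _payments) = (inventory, payments);

    public Guid Place(string sku, int quantity)
    {
        _inventory.Reserve(sku, quantity);   // "endpoint": inventory reservation
        var orderId = Guid.NewGuid();
        _payments.RequestPayment(orderId);   // "endpoint": payment request
        return orderId;
    }
}
```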

Various highlights

Tuesday morning’s keynote was about quantum computing, and reluctantly I must concede that it probably is the future, and the community seems to be looking for converts, but I just can’t go from quantum entanglement to taking data from a web page and shoving it into a database, so I guess I have to wait those three years before it hits the mainstream and becomes digestible for the likes of me. If you are into cryptography as in encryption, not the various ponzi schemes, you should probably get into it now.

I caught a Kate Gregory talk – on naming – having only been a fan of her YouTube talks on C++ vs C before; definitely worth seeing. The strategy she proposed of “just mash the keyboard if you can’t think of a name and go back to it later” was also brought forward elsewhere at the conference, the point being: don’t get stuck trying to come up with a name; start writing the code, and as you start talking about what the thing you just wrote is and what it is responsible for, a great name will eventually become evident and then you use that. The effort of coming up with a good name is worth it, and with refactoring tools it’s worth just moving on in the moment rather than making a bad decision. Once you have finished the feature and moved on, the caveats and differences between the name and the actual implementation will fade into obscurity, and when you come back to the same code in three weeks you will have forgotten all about it and can be misled by bad names as easily as if you hadn’t written the code yourself.

Conclusion

I really enjoyed this conference. Again, with a past on a program committee, I really admired the work they put in to cause so much anxiety when picking talks. Obviously the QE2 conference centre in Westminster is newer, so NDC London benefits from that, and if you want to see Troy Hunt and the asp.net core guys Fowler and Edwards you would be better off over at NDC, but SD&D got all the core things right and had a wider array of breakout sessions, and infrastructure like the food was a lot less chaotic at SD&D than it usually is at NDC. BuildStuff and Øredev I only attended as a visitor to the city, so I didn’t have to commute, which of course is nice.

If you like the environment you’ll be pleased that there was no shilling or abundance of obscure t-shirts handed out that would have drained natural resources, but I suspect that SD&D would have enjoyed more sponsorships and an expo floor, which allegedly they have had before. Since this conference was actually SD&D 2020 postponed several times over the course of two years, it is possible that SD&D 2023 will have pre-pandemic levels of shilling. All I know is I enjoy free t-shirts to clothe the child. Regardless, I strongly recommend attending this conference.

Transformers

What are the genuinely difficult aspects of transforming your software function?

It seems everybody intuitively understands what brings speed and short time to market, and how that in turn automatically allows for better innovation. People also seem to get that in the current stale market, with fast enough delivery you could even forego smarts and just brute-force innovation, launching new concepts and tweaks until profits go up and then declaring a win as if you knew what you were doing the whole time. Secretly people also know that although you could rinse and repeat the naïve approach until retirement, you could optionally exert minimal effort and measure a bit better so that you know what you are doing and can focus your efforts.

So why isn’t everyone moving on this?

When you get a bunch of people in the same organisation you want to achieve some economies of scale and solve common problems once rather than once per team.

This means you delegate some functions into separate teams. Undoing this, or at least mitigating this, is difficult politically. Some people – with some cause – fear for their jobs when reorgs happen.

Sudden unexpected cost runaway is the biggest recurring nightmare of middle managers. Controls are therefore in place to prevent developer cloud spend from ballooning.

Taken together however, this means teams are prevented from innovating independently as they cannot construct the virtual infrastructure as needed because of cost not being authorised, and they cannot play with new pieces of virtual infrastructure because they haven’t been approved by the central tech authority yet.

The Power of Sample Code

What is wrong with OOP?

In the culture wars between “Object Oriented Programming” and Functional Programming, you will find proponents of OOP arguing that we are doing fine – why should we change? – and proponents of FP listing a litany of inherent problems with what we are doing today and pointing to the ways FP solves them. Having once attended an Object Bootcamp with Fred George, I believe the two main schools of thought are both wrong. All the problems listed by the FP peeps are correct, but they are not inherent in OOP; actually OOP addresses a few of them, but we as an industry are not doing OOP.

I may feel Fred George is the Messiah, but he is not alone in his views. Greg Young has similar concerns.

Inheritance is not the Big Deal

I am old enough to remember Borland C++ ads from the 90s. They focused a lot on inheritance, and reuse through inheritance became the USP of object oriented languages.

As soon as you have written some code though, you realise inheritance is the worst, as it creates undue coupling, making changes very hard to implement.

When Borland made those ads about how the Porsche Turbo inherited from the Carrera but implemented a big fat rear wing, they had begun their foray into C++ because it offered a way to handle the substantial boilerplate involved in writing a Windows program. It was relatively straightforward to implement the basics and create a usable abstraction on top of the raw Windows API that made the developer experience much more pleasant.

As visual designers became a thing, they wanted a way to map properties to code, so that UI components (those things implemented as objects, as mentioned above) could be manipulated by a developer in design mode. “Property” setters – basically syntactic sugar disguising normal functions – allowed the visual designer to read settings from the object and replace them with whatever the developer typed in. With this work, Borland and Microsoft were catching up with Interface Builder from NeXT Computer (the same thing that lives on today in Apple's macOS/iOS SDKs); NeXT had bolted a different type system on top of C and called it Objective-C, but they had a world-leading visual designer at the time. Anyway, I think they were in a hurry and didn't think things through.
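For clarity, a property really is just compiler sugar over a pair of methods and a hidden field – a minimal C# sketch, class names made up for illustration:

    public class Car
    {
        // Auto-property: the compiler generates a hidden backing field plus
        // get_TopSpeed()/set_TopSpeed() methods behind the scenes.
        public int TopSpeed { get; set; }
    }

    // Roughly what you would otherwise write by hand:
    public class CarLowered
    {
        private int _topSpeed;
        public int GetTopSpeed() => _topSpeed;
        public void SetTopSpeed(int value) => _topSpeed = value;
    }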

Approaches to deal with big programs

In a large codebase, the big problem is achieving low coupling but high cohesion. This means you want all the code that belongs together to live together, but you don't want to have to make changes in seemingly unrelated code to modify a piece of functionality.

In large programs of old, you could call any subroutine from anywhere else, and many resources were shared, meaning that between the time you set a value in a variable and the time you read from it, some piece of code in between could have modified the value – and you had no automatic way of knowing where that access was made or how to prevent it.

In FP, we use modules for scoping, meaning you group functions into modules to aid readability, but the key concept, the Big Idea, is immutability. After a value is created it exists globally, but since values are read-only once created, the drawbacks of global state go away. There is no way to change something that somebody else relies on. You can transform it into a new thing that you need, but the original value hangs around until it is no longer needed. It is harder to accidentally break other code with the changes you are making.
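To put it in C# terms, records give a small taste of this – a minimal sketch, assuming a modern .NET top-level program:

    var original = new Order("A-1001", 100m);
    var discounted = original with { Total = 90m };   // a brand new value; original is untouched

    Console.WriteLine(original.Total);    // 100 – anyone still holding a reference is unaffected
    Console.WriteLine(discounted.Total);  // 90

    public record Order(string Id, decimal Total);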

The Big Idea in Object oriented development is Encapsulation. You put the data with the code and manipulate abstractions. This means that if you get your abstractions right, you can change or replace these abstractions without needing to make sweeping changes in the codebase.
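A minimal sketch of what I mean – the domain is made up, but the point is that callers only see behaviour, so the internal representation can change without sweeping changes elsewhere:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Wallet
    {
        // Internal representation – free to change to a single decimal, a ledger table, whatever.
        private readonly List<decimal> _transactions = new();

        public void Deposit(decimal amount)
        {
            if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount));
            _transactions.Add(amount);
        }

        public bool TryWithdraw(decimal amount)
        {
            if (amount <= 0 || amount > Balance()) return false;
            _transactions.Add(-amount);
            return true;
        }

        // Nobody outside can reach in and edit the balance directly.
        private decimal Balance() => _transactions.Sum();
    }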

The original concept of object orientation relied on independent small sub-programs that communicated by message passing, implicitly imagining an “in tray” of messages that the object could process at its own pace, sending a response when the work was completed. However, objects in C++, Java and C# were implemented as special dynamically allocated structs to which you make function calls, i.e. they became decidedly more synchronous than they were in Smalltalk or Simula. You would recognise Erlang processes and actors as looking more like OG objects. You can also see that what made objects useful was that they shared properties we today associate with the term microservices, just on a smaller scale.
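If you want to feel what the original idea was like in modern .NET, System.Threading.Channels gets you close to that “in tray” – a rough sketch, not a claim that this is how Smalltalk worked:

    using System;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    var inTray = Channel.CreateUnbounded<string>();

    // The "object": an independent little program draining its in tray at its own pace.
    var worker = Task.Run(async () =>
    {
        await foreach (var message in inTray.Reader.ReadAllAsync())
            Console.WriteLine($"Processing: {message}");
    });

    // Other code communicates by sending messages, not by poking at the object's innards.
    await inTray.Writer.WriteAsync("Hello");
    await inTray.Writer.WriteAsync("World");
    inTray.Writer.Complete();
    await worker;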

So what’s the problem, and what’s up with the title of this blog post?

Java, and C# arguably even more so, took the Big Idea and tossed it out the window. Property Get/Property Set, introduced to support novelties like graphical designers and visual components, are a clear violation of encapsulation. Why are we letting objects access data that lives in other objects? The need to do that is a huge red flag that your model is incorrect. Both the bible – i.e. Refactoring, by Fowler – and the actual Bible condemn this: feature envy.
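To make it concrete – the example below is made up, but it shows what the property-bag style invites versus keeping the behaviour with the data:

    using System;

    // Feature envy: the caller reaches into another object's data and does its thinking for it.
    public class Invoice
    {
        public decimal Amount { get; set; }
        public bool IsPaid { get; set; }
    }
    // ...somewhere far away: if (invoice.Amount > 0 && !invoice.IsPaid) { invoice.IsPaid = true; }

    // Encapsulated: the invoice is told what happened and protects its own invariants.
    public class BetterInvoice
    {
        private readonly decimal _amount;
        private bool _isPaid;

        public BetterInvoice(decimal amount) => _amount = amount;

        public void RegisterPayment(decimal payment)
        {
            if (_isPaid || payment < _amount)
                throw new InvalidOperationException("Not an acceptable payment");
            _isPaid = true;
        }
    }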

But why did these properties survive the nineties and live on into modern day? Why have they made things worse with auto properties?

Sample code

When you learn a new language, or to code in general, the main threshold is getting to the point where you write idiomatic code in that language. I.e. you use familiar phrases: you indent the code in a certain way, you name things according to a certain standard, and you use familiar ways to do things like open a database connection or make an HTTP request, etc., that a seasoned programmer would recognise. Unfortunately – in C# at least – these antipatterns are canon at this point, so writing properly encapsulated code might cause a casual reviewer to ask WTF and be sceptical.

What is canon comes from the publicly available body of work that a beginner can reasonably access. Meaning, effectively Microsoft sets the bar when they announce features, document them and create samples.

There are some issues here. If you look at a large piece of sample code, you may notice how difficult it is to identify the key concept being demoed when logging and error handling bulk up the code in a distracting way, so clearly brevity must be allowed to remain a priority.

At the edges, where the code starts interacting with network and storage, this type of organisation isn't inherently despicable, so a blanket ban is perhaps not the way forward either.

How do we make it clear to new OO devs that, when they fill the empty Models folder their project template creates for them, they would be better off thinking proper OO?

By that I mean making classes that are extremely small, using value objects, preferring private fields and avoiding properties, et cetera. My suspicion is that any attempt at conveying this programming style through the medium of sample code in templates or documentation is doomed. The bulk of code necessary not only to prove the concept but to in fact make it part of the vernacular would require a large number of people making quite a lot of good code public, so that new learners can assimilate the knowledge.
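For instance, the kind of value object I have in mind – the example is made up, but the shape is the point:

    using System;

    // Small, immutable, validates itself on creation, compares by value, no setters to fiddle with.
    public sealed class EmailAddress
    {
        private readonly string _value;

        public EmailAddress(string value)
        {
            if (string.IsNullOrWhiteSpace(value) || !value.Contains('@'))
                throw new ArgumentException("Not an e-mail address", nameof(value));
            _value = value.Trim();
        }

        public override string ToString() => _value;
        public override bool Equals(object? obj) => obj is EmailAddress other && other._value == _value;
        public override int GetHashCode() => _value.GetHashCode();
    }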

I think good OO code is scarce. Getting the abstractions right is just too hard, you will have compromises in various places, and all the tools tempt you with ways to stray from the narrow path of righteousness, but with modern refactoring tools you should be able to address some of the issues and continually strive to make the code better.

Incidentally, with properly sized objects you can unit test without cheating (using internal helper methods, or by using mocks), so there is scope to brighten up the tests as well.
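Something like this – plain state in, behaviour out, no mocks (xUnit assumed, testing the hypothetical EmailAddress value object sketched above):

    using System;
    using Xunit;

    public class EmailAddressTests
    {
        [Fact]
        public void Rejects_nonsense()
        {
            Assert.Throws<ArgumentException>(() => { _ = new EmailAddress("not an address"); });
        }

        [Fact]
        public void Trims_whitespace()
        {
            Assert.Equal("a@example.com", new EmailAddress(" a@example.com ").ToString());
        }
    }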

Automation and security

There has been a recent spate of sophisticated attacks on software delivery mechanisms, where cyber criminals have had massive success breaching one organisation to gain automatic access to hundreds of thousands of other organisations through the update mechanism the breached organisation provides.

Must consider security at design time

I think it needs reiterating that security needs to be built in by default, from the beginning. I haven't gone back to check properly, but I know I deleted an old blog post because it had some dubious security practice in it. My new policy is that I would rather omit some part of a process than show a dodgy sample. There are so many blog posts you can find by searching for “login form asp.net” that don't even hash passwords. And rather than pointing beginners to the built-in password hashing algorithms available in .NET, and the two lines of code you have to write, they leave some beginners thinking “it's all right, just this once” and breed the basic idea that security is optional – something you test for afterwards if you are building something “important”, not something you think about all the time.

The thing is, we developers have tools that help us do complicated things – like breaking bits of code out from other bits of code automatically, or renaming specific constructs with a certain name, including in surrounding comments, without also incorrectly renaming unrelated constructs that happen to share the name.

It turns out cyber criminals, too, have plenty of automation that lets them spend very little effort breaking into companies and exploiting that access in a number of different ways.

There is maybe no “why”

This has a couple of implications. First off, attackers are probably not looking for you per se. You may be a nobody, but you will still be exposed to automated attacks that test your network for known vulnerabilities and apply automated suites of exploits to see what happens. This means that even if you don't do anything that could conceivably have value to an attacker, you will still be probed.

The second thing is, to prevent data loss you need to make every step the attacker has to take a hardship. Don’t advertise what software versions your public facing servers are running, don’t let service accounts have access to things beyond what they need, do divide networks into segments so that – for example – one machine with ransomware cannot directly infect your entire network.

Defend in depth

Change any business processes that require people to open e-mail attachments as part of their job. Offer services that help people do their job in a more convenient way that is also more secure. You cannot berate people for attempting to do their job. I mean, you can but it is not helpful.

Move backups off site and offline, of course, for many reasons. But do remember that having to recover a massive storage system from backup can still be an extinction-level event for a business, even if you do have a working, reliable off-site backup solution. If you lose a large SAN you may be offline for days; people will not be able to work, and you may need to bring sites offline while storage recovers. When you procure a sophisticated storage solution, do not forget to design a recovery strategy ahead of time for how to rebuild a massive spinning-rust storage array from absolute zero while new data is continuously generated. It is a non-trivial design challenge that probably needs tailoring to how your business operates. The point is, avoiding the situation where you actually need to restore your entire storage from tapes is always best.

Next level

Despite the intro, I have so far only mentioned things that any company needs to think about. There are of course organisations that are actually targeted. Financial institutions, large e-retailers or software supply chain companies run a greater risk of being manually targeted by evildoers.

Updates

Designing a secure process for delivering software updates is not trivial. I am not in any position to give direct advice, beyond suggesting that if you intend to do this, you consider from the beginning how to track vulnerabilities, how to effectively pull versions that have been flagged as actively harmful, and how to support your users if they have deployed something dodgy. If that day comes, you need to at least be able to help your users. It will still be awful, but if you treat your users right, you might still make it.

Humans

Your people will be exploited. Every company that has an army of customer service representatives will need to make a trade-off between customer convenience and security. Attacks on customer service reps are very common. If you have high-value clients, people will use you to get to your clients' money. There is nothing to say here, other than that obviously you will be working with the relevant authorities and regulatory bodies, as well as fine-tuning your authentication process so that you ask for confirmation details that are not readily available to an attacker.

Insiders

I don't have any numbers on this, so I am unsure how big a problem it is, but it is mentioned often in security circles. Basically, humans can be exploited in a different way: employees can be coerced through intimidation, blackmail or bribery to act maliciously on behalf of an attacker. My suspicion is that this is less common than employers think, and that in cases where an employee was stressed or distracted and fell for a phishing e-mail, the employer concluded “that is too obvious a phish, this guy must have been in on it”.

It makes me think of that one time when a systemic failure on multiple levels meant that a cleaner accidentally started a commuter train that ran from the depot the length of the commuter railway Saltsjöbanan – at maximum speed – eventually crashing through the buffers and into a building at the terminus. In addition to her injuries, she suffered the headlines “train stolen and crashed” until the investigation revealed the shocking institutional failings that had made this accident possible. I can’t remember all of them but there were things from the practices in how cleaners accessed the trains, how safety controls were disabled as a matter of course, how trains were stabled, the fact that points were left set so that a runaway train would actually leave the depot. A shambles. Yet the first reaction from the employer was to blame the cleaner.

Anyway, to return to the matter at hand – yes, although I cannot speculate on the prevalence, it is a risk. Presumably, if you hire right and look after your people, you can get them to come to you if they have messed up and gotten themselves into a compromised situation where they are being blackmailed, or if somebody is leaning on them. Breeding a strong culture of fear is counterproductive here – let people believe that you will help them rather than fire them and litigate, as long as they voluntarily come forward. If you are working in a regulated industry, things are complicated further by law enforcement in various jurisdictions.

Passwords – for wannabe techies

EDIT: As you will notice a lot of my links point to the works of security researcher Troy Hunt, and I should point out that I have no affiliation with him. I just happen to find articles that make sense and they happen to be written by him or refer to his work. Anyway here is another one, addressing the spirit of this blog post, with horrifying examples.

If you google for “ASP.NET login form” or similar, you will get some hits with really atrocious examples of how to NOT handle peoples’ credentials. Even if you are a beginner writing an app that nobody will use, you should never do user login in a shitty way. You will end up embarrassing yourself and crucially your users, who most likely are friends and family when you are at the beginner stage, when – inevitably – your database gets stolen.

How do you do it properly then, you ask?

Ideally – don’t. Let other people worry about getting hacked. Plenty of large corporations are willing to take the risk.

Either way though, get a cert and start running HTTPS only. HTTP is fast and nice for sites that allow anonymous access, such as blogs, but as soon as you are accepting sensitive data such as passwords you need to go with HTTPS.
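In ASP.NET Core terms, the minimal version of that advice looks something like this – a sketch of a default-template app, not a complete hardening guide:

    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    app.UseHsts();               // tell browsers to insist on HTTPS for future visits
    app.UseHttpsRedirection();   // bounce any plain HTTP request over to HTTPS

    app.MapGet("/", () => "Hello over TLS");
    app.Run();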

If you are in the Microsoft sphere, just create a new web project in Visual Studio, register yourself as a developer with Google, Facebook, Microsoft or similar, create apps there, and configure those app credentials in the Visual Studio app – and do make sure you store those app secrets outside of source control – and run that app. After minimal tweaking you should have something working where you can authenticate in your app using those providers. After that you can, with varying amounts of effort, back-port this authentication to whichever website you were trying to add authentication to.
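A rough sketch of the shape of it, with Google as the example provider – assumes the Microsoft.AspNetCore.Authentication.Google package, the configuration keys are conventions I am using for illustration, and the actual secret values live in user-secrets locally and in environment variables or a key vault in production, never in the repository:

    var builder = WebApplication.CreateBuilder(args);

    builder.Services
        .AddAuthentication()
        .AddGoogle(options =>
        {
            // Populated locally via: dotnet user-secrets set "Authentication:Google:ClientId" "..."
            options.ClientId = builder.Configuration["Authentication:Google:ClientId"];
            options.ClientSecret = builder.Configuration["Authentication:Google:ClientSecret"];
        });

    var app = builder.Build();
    app.UseAuthentication();
    app.MapGet("/", () => "Hello");
    app.Run();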

But I insist on risking spreading my users’ PII when I get hacked

OK, fair enough, on your head be it.

Separate your auth storage from your app data storage, at least by database connection user rights. When they hack your website using SQL injection, they shouldn't be able to get at hashed passwords. That just isn't acceptable. So, yes, don't do Windows Authentication on your dev box; define SQL Database Users and SQL Server Logins, and make scripts that ensure they are created if they don't exist. Again, remember to keep secrets away from the stuff you commit to version control. Use alternative means to store secrets for production – Azure will help you with these settings in the admin interface, for instance.

Tighten the storage a bit

Store the hashed password and the salt. Set your storage permissions such that the password hash cannot be retrieved from the database at all. The database login used to access the storage should not have any SELECT rights, only EXECUTE on stored procedures. That way you write one stored procedure that retrieves the salt for a login, so that the application can calculate a hash for the password supplied by the user trying to log in, and another stored procedure that takes the username and that hash and compares it internally to the one stored in the database, returning to the caller whether or not the attempt was successful – without ever exposing the stored hash.
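From the application side that can look roughly like this – the stored procedure names are invented for illustration, and the hashing helper is the PBKDF2 sketch further down:

    using System.Data;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public class LoginStore
    {
        private readonly string _connectionString;
        public LoginStore(string connectionString) => _connectionString = connectionString;

        public async Task<bool> VerifyLoginAsync(string username, string password)
        {
            await using var connection = new SqlConnection(_connectionString);
            await connection.OpenAsync();

            // 1. Fetch only the salt for this user (hypothetical procedure name).
            byte[] salt;
            await using (var getSalt = new SqlCommand("dbo.GetSaltForUser", connection) { CommandType = CommandType.StoredProcedure })
            {
                getSalt.Parameters.AddWithValue("@Username", username);
                salt = (byte[])await getSalt.ExecuteScalarAsync();
            }

            // 2. Hash the supplied password and let the database do the comparison –
            //    the stored hash never leaves the database.
            var hash = PasswordHashing.Hash(password, salt);

            await using var verify = new SqlCommand("dbo.VerifyPasswordHash", connection) { CommandType = CommandType.StoredProcedure };
            verify.Parameters.AddWithValue("@Username", username);
            verify.Parameters.AddWithValue("@Hash", hash);
            return (bool)await verify.ExecuteScalarAsync();
        }
    }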

Note that if you tighten SELECT rights but don't separate storage between auth and app, every single Entity Framework sample will crash and burn, as EF requires special configuration to use stored procedures rather than direct CRUD.

Go for a stupidly complex hashing algorithm

Also – you are not going for speed when you are looking for password hashes. MD5 and SHA1 are really nice for checksumming files, but they suck for passwords. You are looking for very slow and complex algorithms.

.NET comes with PBKDF2 built in, via KeyDerivation.Pbkdf2 in the Microsoft.AspNetCore.Cryptography.KeyDerivation package, but the best and most popular password hashing algorithm is bcrypt.
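The built-in route looks something like this – the sizes and iteration count below are illustrative, not a recommendation:

    using System.Security.Cryptography;
    using Microsoft.AspNetCore.Cryptography.KeyDerivation;

    public static class PasswordHashing
    {
        private const int SaltSize = 16;        // bytes
        private const int HashSize = 32;        // bytes
        private const int Iterations = 100_000; // tune to your hardware and era

        public static byte[] NewSalt() => RandomNumberGenerator.GetBytes(SaltSize);

        public static byte[] Hash(string password, byte[] salt) =>
            KeyDerivation.Pbkdf2(password, salt, KeyDerivationPrf.HMACSHA256, Iterations, HashSize);
    }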

The hashes generated are of fixed size, so just define your storage to be big enough to take the output of the algorithm; then there is no need to limit the size of the password beyond the request size limits the web server already imposes for DoS protection – and for a password those limits are ludicrously high, so you don't even have to mention them to the user. Also, don't mess with copy/paste in the password box: you want to be password-manager friendly.

Try and hack yourself

Use Zed Attack Proxy or similar to try and break into your site. It will tell you some of the things you need to change to protect against XSS and CSRF. The problem isn't that your little site is going to be the target of the NSA or the Chinese government; what you need to worry about is the tons of automated scripts that prod and poke into every site everywhere and collect vulnerabilities with zero effort on the part of the attacker. If you can avoid being vulnerable to those most basic attacks, you can at least have some self-respect.

In conclusion

These are just the basics, and I am mostly writing this to discourage people from writing login forms as some kind of beginner exercise. Password reuse is still rampant, so if you make a dodgy login form you are most likely going to collect some userid/password combinations people really use at other sites, and those sites may very well be far more important than yours. Not treating that information seriously is extremely unprofessional and bad karma. Please do not build user authentication yourself unless you intend to make a serious effort to protect people's data. Instead, use OAuth solutions and let people authenticate using Google, Facebook, Twitter, Microsoft or whichever auth provider you prefer. That way you will never see any passwords and won't have anything that can be stolen, which is a much easier life to lead.