I’m going to rehash some learnings that I have made over the last decade and a bit of doing cloud in one way or another. I have recently read thought leaders report similar things – only better written and backed by more experience of course – which made me think “oh, I’m not totally crazy then” and to set about writing this down.
Basically, it is my medium tempered take on the whole cloud thing, in terms of getting on it.
Why cloud is cool
In the bad old days, if you had an idea for some software, you had to start by buying a server. I mean it was never on the scale of “I have an idea for a consumer product, let’s build a factory”, but still it was definitely a barrier to getting started.
The impetus for building out cloud was that Amazon needed compute for their little bookshop website (remember!?) and thought it was prohibitively expensive to buy high-end servers, so they decided to buy a metric faecal ton of low-end computers instead, and use software to provision this aggregated computing horsepower, and basically letting people choose between a bunch of virtual machine sized depending on the oompgh a certain department needed for their application.
This was of course extremely complicated, the software bit, but once they were done, they had accidentally created cloud compute and could make more money selling cloud compute than selling books. The ability of using relatively simple APIs to create and provision VMs allowed startups to acquire really pathetic servers for nearly nothing, which was amazing for trialling ideas and fed the software boom we have seen over the last decades. Other services were built on top, and competitors came around.
Why cloud is cool is the tight APIs that lets you create and destroy infrastructure in an automated fashion. Yes, you can autoscale, but also the rapid prototyping potential is really beneficial and arguably even more significant.
What to worry about
Security
Yes, of course. Public cloud, you can tell from the name. It is not the same as having your servers at home, at least psychologically. On the other hand, assuming you are secure because your servers are inside your own building is false as well, you are still on the internet. There is no getting around this, you already have network people and security people, and with all cloud providers there are ways of securing your network that they will be familiar with how to operate sensibly. I e your network and security people will know what to do.
Cost
Yes, it’s a big one. Not all cloud things are free or near free. Basically, running a VM 24/7 and block storage (as in a scalable pretend hard drive mapped to a VM) are usually the most expensive things you can do in a public cloud. Sadly, a VM tucked away in a virtual private network with elastic storage mapped to it seem like such an easy migration path if you are currently running your apps in VM You will need to migrate your apps over to cloudier solutions such as AWS Fargate/Lambda or Azure App Services to reduce cost eventually. For your in-house LOB apps you can in most cases (but not all) trivially replace file system storage with cloud native blob storage such as AWS S3 or Azure BlobStorage for your files storage needs, but it does require code changes, even if they are small. As the cloud bill start to come in, it seems a good way to spend developer resources as the returns in terms of cost savings can be quite significant. Be wary of giving developers the ability to create resources at will, as the odd developer accidentally leaving a VM running will quickly accumulate. There are ways of dealing with this kind of stuff, but do consider it.
What not to worry about
Multi cloud
There are many tools that provide abstractions over cloud APIs, and many tools that promise that they can offer you independence and warn of vendor lock-in. That is for most people just a waste.
You will need to choose one provider for your app stack. You can still have Google Apps for email and use AWS for cloud, or use DNSimple for DNS, Office 365 for email and AWS for apps, those are mostly orthogonal concerns. You will suffer outages. You will not – without incurring unfathomable cost – be able to load balance across cloud providers. If you really are that uptime sensitive, it would be cheaper for you to have georedundant datacentres and give the cloud thing a miss.
The problem with attempting to stay cloud agnostic is that you can only use the lowest common denominator of the tools you have available rather than throwing yourself in feet first into all the opportunities that exist with a given cloud provider.
Worst case, if the CEO gets angry enough at something and wants to switch just to make a point, it still will not be completely impossible to rewrite code in the seams. For instance, if you change your code to use AWS S3, it would be relatively trivial to change the code that calls S3 to use Azure BlobStorage in a pinch. No need to go choose a platform for it. Just like with ORMs and database providers (“with NHibernate it’s so easy, you can switch to Oracle much more easily”) people very rarely switch cloud providers. There would have to be a very compelling economic argument anyway.
Why go through with it?
Rapid prototyping
You should test in production anyway, but if you insist on creating test environments, being able to copy/paste your prod environment exactly and test your changes is only possible in an environment where you aren’t poking at real metal. It would be ludicrous to buy overbuilt on-prem hardware “because sometimes I like to spawn up a few extra copies of prod”. The powers that be would be livid at the massive capital expense that would go underutilised most of the time. With cloud however you spawn, test and destroy in minutes. Merely a blip on the radar in terms of cost. To mitigate the risk of developers leaving stray instances around you can just use governance like you do anywhere else in a workplace, but ideally the concept of ephemeral instances should lend itself to clean up nicely.
Modern software development
Bringing the organisation to a place where it has autonomous engineering teams that can bring feature from idea to production without hand-offs is the key driver for organisational performance. Moving to the cloud is going to make that happen. You could achieve this with on prem as well, but it would probably mean buying more hardware than you will really use more than in short bursts. If that trade off is worthwhile to you, then who am I to deny you your wish, but for most people cloud is the way to go.
What to do?
You probably need to get some help with this. Everybody in your organisation is already doing things. Taking on a cloud migration is going to be a massive effort for everybody, and you are still probably going to need an external experienced consultancy to help you. There are many out there that offer to architect a cloud migration for you. Not everyone is a charlatan, but, given that the selection process for these types of gigs is “who did the CxO meet at a conference/play golf with/…” I think the most important take-away is that the individual firm probably doesn’t really matter. It also probably doesn’t really matter which cloud provider you go with. Let a bunch of ops and devs benchmark the tools and APIs of various providers and come back with their feedback. There are probably going to be some budgets that can be negotiated between your consultancy (who most likely also has a VAR agreement with some cloud providers) , the provider and yourself that will determine some kind of benefits for one over another, but that’s still only speculation at this point.
Eventually one provider will be declared the winner and work will start. It doesn’t matter, really, which one was chosen. Even If the engineers say there really is a show-stopper, do investigate, but most problems can probably be avoided through some development. If you are running some weird VM somewhere that needs specific hypervisor features or some curious networking then of course you will have a challenge. Not necessarily an impossible one, though. This is not going to be done in an afternoon anyway. There is time to make changes to code, and there will be a need to do so in order to fully leverage the public cloud as mentioned above. Obviously non-blockers can be deferred, but unlike traditional tech debt there will be direct cost implications of course.