Category Archives: CI

.NET C# CI/CD in Docker

Works on my machine-as-a-service

When building software in the modern workplace, you want to automatically test and statically analyse your code before pushing it to production. This means that rather than tens of test environments and an army of manual testers, you have a bunch of automation that runs as close as possible to the moment the code is written. Tests are run, code coverage is calculated, test results are published to the build server user interface (so that in the event that – heaven forbid – tests are broken, the developer gets as much detail as possible to resolve the problem) and static analysis is performed on the built piece of software to make sure we have not introduced any known problematic code ourselves, and that the dependencies we pull in are free from known vulnerabilities.

The classic Dockerfile the tooling adds when you create an ASP.NET Core Web API project with Docker support features a multi-stage build layout, where an initial stage includes the full .NET SDK and is where the code is built and published. The next stage is based on the lightweight .NET Core runtime; the output directory from the build stage is copied there and the entrypoint is configured so that the website starts when you run the finished docker image.

Ever tried multi?

Multistage builds were a huge deal when they were introduced. You get one docker image that only contains the things you need; any source code is safely binned off in other layers that – sure – are cached, but don’t exist outside the local docker host on the build agent. If you then push the finished image to a repository, none of the source will come along. In the before times you had to solve this with multiple Dockerfiles, which is quite undesirable. You want high cohesion but low coupling, and fiddling with multiple Dockerfiles when doing things like upgrading versions does not give you a premium experience and invites errors to an unnecessary degree.

Where is the evidence?

Now, when you go to Azure DevOps, GitHub Actions or CircleCI to find out what went wrong with your build, the test results are available because the test runner has produced output in a format that the build service understands. If your test runner is not forthcoming with that information, all you will know is “computer says no” and you will have to trawl through console output – if that – and that is not the way to improve your day.

So – what – what do we need? Well we need the formatted test output. Luckily dotnet test will give it to us if we ask it nicely.
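Outside Docker, asking nicely looks roughly like this (the paths are just an example, matching the layout used in the Dockerfile further down):

dotnet test test/classlib.tests/classlib.tests.csproj --results-directory /testresults --logger "trx;LogFileName=test_results.xml"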

The only problem is that those files will stay on the image that we are binning – you know, multistage builds and all that – since we don’t want them showing up in the finished, supposedly slim, article.

Old world Docker

When a docker image is built, every relevant change will create a new layer, and eventually a final image is created and published that is an amalgamation of all constituent layers. In the olden days, the legacy builder would cache all of the intermediate layers and print a hash in the output so that you could refer back to intermediate layers should you so choose.

This seems like the perfect way of forensically finding the test result files we need. Let’s add a LABEL so that we can find the correct layer after the fact, copy the test data output and push it to the build server.

FROM mcr.microsoft.com/dotnet/aspnet:7.0-bullseye-slim AS base
WORKDIR /app
FROM mcr.microsoft.com/dotnet/sdk:7.0-bullseye-slim AS build
WORKDIR /
COPY ["src/webapp/webapp.csproj", "/src/webapp/"]
COPY ["src/classlib/classlib.csproj", "/src/classlib/"]
COPY ["test/classlib.tests/classlib.tests.csproj", "/test/classlib.tests/"]
# restore for all projects
RUN dotnet restore src/webapp/webapp.csproj
RUN dotnet restore src/classlib/classlib.csproj
RUN dotnet restore test/classlib.tests/classlib.tests.csproj
COPY . .
# test
# install the report generator tool
RUN dotnet tool install dotnet-reportgenerator-globaltool --version 5.1.20 --tool-path /tools
RUN dotnet test --results-directory /testresults --logger "trx;LogFileName=test_results.xml" /p:CollectCoverage=true /p:CoverletOutputFormat=cobertura /p:CoverletOutput=/testresults/coverage/ /test/classlib.tests/classlib.tests.csproj
LABEL test=true
# generate html reports using report generator tool
RUN /tools/reportgenerator "-reports:/testresults/coverage/coverage.cobertura.xml" "-targetdir:/testresults/coverage/reports" "-reporttypes:HTMLInline;HTMLChart"
RUN ls -la /testresults/coverage/reports
 
ARG BUILD_TYPE="Release" 
RUN dotnet publish src/webapp/webapp.csproj -c $BUILD_TYPE -o /app/publish
# Package the published code as a zip file, perhaps? Push it to a SAST?
# Bottom line is, anything you want to extract forensically from this build
# process is done in the build layer.
FROM base AS final
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "webapp.dll"]

The way you would leverage this test output is by fishing the temporary layer out of the cache and creating a container from it, from which you can do plain file operations.

# docker images --filter "label=test=true"
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
<none>       <none>    0d90f1a9ad32   40 minutes ago   3.16GB
# export id=$(docker images --filter "label=test=true" -q | head -1)
# docker create --name testcontainer $id
# docker cp testcontainer:/testresults ./testresults
# docker rm testcontainer

All our problems are solved. Wrap this in a script and you’re done. I did, I mean they did, I stole this from another blog.

Unfortunately keeping an endless archive of temporary, orphaned layers became a performance and storage bottleneck for docker, so – sadly – the Modern Era began with some optimisations that rendered this method impossible.

The Modern Era of BuildKit

Since intermediate layers are mostly useless, just letting them fall by the wayside and focusing on the actual output was much more efficient, according to the powers that be. The use of multistage Dockerfiles to additionally produce test data output was not recommended or recognised as a valid use case.

So what to do? Well – there is a newer command called docker buildx bake that lets you run docker build for multiple docker images at once, or – most importantly – build multiple targets from the same Dockerfile.

This means you can run one build all the way through to produce the final lightweight image, and also have a second run that saves the intermediary image full of test results. Obviously the docker cache will make sure nothing is actually built twice; the second run just picks the layer out of the cache and makes it accessible.

The Correct way of using bake is to write a bake file in HCL format:

group "default" {
  targets = [ "webapp", "webapp-test" ]
}
target "webapp" {
  output = [ "type=docker" ]
  dockerfile = "src/webapp/Dockerfile"
}
target "webapp-test" {
  output = [ "type=image" ]
  dockerfile = "src/webapp/Dockerfile"
  target = "build"
} 

If you run docker buildx bake -f docker-bake.hcl against this file, you will be able to fish out the intermediary test image using the method described above.
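Wrapped in a script, the whole thing might look something like this (a sketch that assumes the default docker driver, so the baked images land in the local image store; the label filter works because the build stage still carries LABEL test=true):

docker buildx bake -f docker-bake.hcl
# fish out the image built from the "build" stage and copy the test results off it
export id=$(docker images --filter "label=test=true" -q | head -1)
docker create --name testcontainer $id
docker cp testcontainer:/testresults ./testresults
docker rm testcontainer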

Conclusion

So – using this mechanism you get a minimal number of Dockerfiles, you get all the build gubbins happening inside docker (giving you freedom from whatever limitations plague your build agent), yet the bloated mess that is the build process is automagically discarded and forgotten as you march on into your bright future with a lightweight finished image.

Sprint 0 for old school .NET devs

When you start work on a code base, either from scratch or as you approach it as a new dev, there are a few things you should ensure are in place before you get going. Like most other things, these are easier and more natural on more mature platforms than they tend to be on .NET, where the toy app tends to be king, but it is doable. I will list some technologies I use here, and I may mention some competing ones for completeness. I can’t share most of the code because reasons, but if there is interest, I can put together samples showing specific techniques among the ones I have used.

Introduction

The checklist of what you need to put in place if it doesn’t exist is the following:

  • Version control
  • Build server
  • Automated tests run as part of the build process
  • Automated packaging
  • Automated deployment
  • Automated monitoring
  • Automated setup of development environment

I know most of that reads as “a house must have a roof” but there were, and still are,  places where people go “We’re only a couple of guys, we can just leave the source on the NAS, it’s backed up”, so I’m just being clear here.

The last point may seem like overkill, but especially if you use the abomination that is 3rd party components, it is crucial to save time and more easily onboard new guys.

Version control

I’m going to shock you here and not require that you use Git. You might want to, because it’s becoming a skill everybody has, and GitHub is a beautiful place to keep your code, but if your workflow is centralised anyway, you can legitimately stay with Subversion as long as you take care of the basics, as in backing up the repository properly. Recent versions have less horrible diffing, so it is not that bad anymore. The most crucial aspect is that you do need a version control system with a good, powerful command line interface.

Build server

I’m going to suggest TeamCity here, because I know it well, but it has some real drawbacks. Travis CI integrates nicely with GitHub and is a modern CI system, and Jenkins has long been popular. The point is – make sure your build configuration is the same on the development machine as it is on the build server, and make sure the build configuration is version controlled with the source; ideally you should be able to run all of it from the command line, as sketched below. This way you know you are not about to break the build before you push your changes, and you know you can recreate a previous version of the code without faff, because a contemporary build configuration can be found in source control alongside the source.
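A rough sketch of what that command line could look like for an old school .NET solution (the solution and project names are made up, and I am assuming NUnit; substitute your own test runner):

nuget restore OurProduct.sln
msbuild OurProduct.sln /p:Configuration=Release /verbosity:minimal
nunit3-console.exe test\OurProduct.Tests\bin\Release\OurProduct.Tests.dll --result=TestResult.xml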

Automated tests run as part of the build process

The obvious bit here is to run your favourite command-line test runner on your build output to make sure all your tests pass on the build server.

You can always write PowerShell scripts for simple but high-value integration tests; it doesn’t all have to be Selenium / FitNesse, even though those are quite nice once you are over the initial hurdle, but yes, there is a cost.

Automated deployment

Many ways to deploy exist. Octopus Deploy is popular to use with TeamCity, but you should look at chef and puppet as well. They tend to be hostile to Windows, but it is now possible to use them, and they make sense: it is as if they were specifically designed for the purpose of deploying and maintaining software infrastructure.

Years ago I looked very briefly at Puppet, but I have come to work with chef recently and it does what it is supposed to do. I have no factual reason to rate chef higher than puppet other than that I now know it.

For chef you can look at Test Kitchen to test your deployment scripts in transient VMs. It can use Azure VMs, VMware Workstation or Vagrant / VirtualBox, and it shortens the feedback cycle considerably.
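For reference, the basic kitchen workflow is short enough to memorise:

kitchen create     # spin up the transient VM(s) defined in .kitchen.yml
kitchen converge   # run your chef recipes against them
kitchen verify     # run the test suites
kitchen destroy    # bin the VMs again
kitchen test       # or run the whole cycle, clean to clean, in one command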

Automated monitoring

You can use Pingdom, Monitis or Nagios, or just run a few curl/wget calls in a scheduled task and send an email if they don’t return the expected information. Either way, you need to be able to know when things aren’t working. Use the smallest possible thing you can get away with if your budget is constrained, but do use something.
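The smallest possible thing really can be small. Something along these lines in a scheduled task will do (the URL and email address are placeholders, and this is the unix flavour; a PowerShell equivalent is easy enough to write):

#!/bin/sh
# poor man's monitoring: alert if the site stops responding with the expected content
if ! curl -fsS https://www.example.com/health | grep -q "OK"; then
    echo "Site check failed at $(date)" | mail -s "ALERT: www.example.com is unhappy" ops@example.com
fi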

Automated setup of the development environment

This may be sensitive, as developers tend to be particular about where their code lives and how their machine is organised, but having developers able to just run a script and end up with all their development tools set up and ready to debug is well worth it.

Some things are going to be difficult; installing Redgate SQL Source Control, for instance, doesn’t seem to be scriptable. Other than that you can (see the sketch after the list):

  • Install Visual Studio
  • Install plugins
  • Use chocolatey to install source control management
  • Get the sources locally
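A rough sketch of such a setup script, using chocolatey (the package names and repository URL are illustrative rather than gospel):

:: assumes chocolatey itself is already installed
choco install git -y
choco install visualstudio2015community -y
choco install resharper -y
git clone https://example.com/scm/ourproduct.git C:\dev\ourproduct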

In some cases, as your system grows and you break bits out into separate microservices, you will need this scripting to make sure you can set up the entire system to be debugged locally. Ideally you would, as an additional means of verifying your deployment method, use chef-zero or the corresponding technology from Puppet to apply your normal deployment templates when configuring the system on the development machines as well. This is another situation where clever IDE tooling actively makes things difficult for you, but any work you put in to automate here will pay huge dividends.

But what does this all mean in practice?

In short, if you are going to keep working on Windows exclusively, at least learn PowerShell. It is not that horrible. The stance among Windows users has long been anti-scripting, and with the state of CMD.exe it has been for good reason. PowerShell, though, has a lot of features that make sense even though the syntax can be confusing at first. Learning Ruby may be more viable from a cross-platform perspective, but PowerShell is evidently more suited to Windows, and a lot of useful cmdlets for enabling Windows features et cetera are only available in PowerShell.

Salvation through a bad resource

A colleague and I have been struggling for some time with deploying an IIS website correctly using chef. As you may be aware, chef is used to describe the desired state of the configuration of a system through the use of resources, which know how to bring parts of the system into the desired state – if they, for some reason, should not be – during a process called convergence.

Chef has a daemon (or Service, as it is called in the civilised world) that continually ensures that the system is configured in accordance with the desired state. If the desired state changes, the system is brought into line automagically.

As usual, what works nicely and neatly in Unix-like operating systems requires volumes of eloquent code literature, or pulp fiction rather, to implement on Windows, because things are different here.

IIS websites are configured with a file called web.config. When this file changes, the website restarts (the application domain is recycled, to be specific). Since the Chef Windows service runs chef-client at regular intervals, it is imperative that chef-client doesn’t falsely assume that the configuration needs to be overwritten every time it runs, as that would be quite disruptive to any would-be users of the application. Now, the automatic restart behaviour can be disabled, but that is not the way things should have to be.

A common approach on Windows is to disable the chef service and to just run the chef client manually when you know you want to deploy things, but that just isn’t right either and it takes a lot away from the basic features and the profound magic of chef. Anyway, this means we can’t keep tricking chef into believing that things have changed when they really haven’t, because that is disruptive and bad.

So like I mentioned earlier – IIS websites are configured with a file called web.config. Since everybody who ever encountered an IIS website is aware of that, there is no chance that an evildoer won’t know to look for the connection strings to the database in that very file. To mitigate this well-knownness, or at least force the evildoer to first leverage a privilege escalation vulnerability, there is a built-in feature that allows an administrator to encrypt the file so that the lowly peon account that the website is executing as doesn’t have the right to read it. For obvious reasons this encryption is tied to the local machine, so you can’t just copy the file to a different machine where you happen to be admin and decrypt it there. This does, however, mean that you have to first template the file to a temporary location and then check whether the output of the chef template, the latest and greatest of website configuration, is actually any different from what was there before.

It took us ages to figure out that what we need to do is to write our web.config template exactly as it looks once it has been decrypted by Windows, then start our proceedings by decrypting the production web.config into a temporary location. We then set up the chef template resource to try and overwrite the temporary file with new values, and if there has been a change, use notifications to trigger a ruby_block that normally doesn’t execute, but when triggered by the template resource both encrypts the updated config and copies it across to prod.
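Stripped of the chef wrapping, the underlying moving parts look roughly like this (a sketch; the paths and the choice of the connectionStrings section are assumptions, and in the real recipe the template, notification and encryption steps are chef resources rather than a batch file):

rem 1. copy the live, encrypted web.config to a temporary location and decrypt it in place
set REGIIS=%windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe
xcopy /y C:\inetpub\webapp\web.config C:\temp\webapp\
%REGIIS% -pdf "connectionStrings" C:\temp\webapp

rem 2. the chef template resource renders over C:\temp\webapp\web.config and only notifies if something changed

rem 3. only when notified: re-encrypt the updated file and copy it across to the live site
%REGIIS% -pef "connectionStrings" C:\temp\webapp
xcopy /y C:\temp\webapp\web.config C:\inetpub\webapp\

rem 4. delete the clear-text temporary file (the step that must not be reported to chef as a change, more on that below)
del /q C:\temp\webapp\web.config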

Result!

But wait… The temporary file has to be deleted. It has highly sensitive information (I would like to flatter myself) and shouldn’t even have made it to disk in its clear-text form, and now it’s still there waiting to be read by the evildoer.

Using a ruby block resource or a file resource to delete the temporary file causes chef to record this as a change, and change that isn’t a change is bad. Or at least misleading in this case.

Enter colleague number two: “just make a bad resource that doesn’t use converge_by”.

Of course! We write a resource that takes a path and deletes it using pure ruby code, but it “forgets” to tell chef that a file was deleted, so chef will update the configuration when it should but will gladly report 0 resources updated at the end of a run where nothing has changed. Beautiful!

DONE. Week-end. I’m off.

 

Tech stuff for non-technical people

This is not a deeply technical post. Even less technical than my other ones. I write this inspired by a conversation on Twitter where I got asked about a few fairly advanced pieces of technical software by a clever but not-that-deeply technical person. It turned out he needed access to a website to do his human messaging thing and what he was given was some technical mumbo jumbo about git and gerrit. Questions arose. Of course. I tried to answer some of the questions on Twitter but the 140 char limit was only one of the impediments I faced in trying to provide good advice. So, I’m writing this so I can link to it in the future if I need to answer similar questions.

I myself am an ageing programmer, so I know a bunch of stuff. Some that is true, some that was true and some that ought to be true. I am trying to not lie but there may be inaccuracies because I am lazy and because reality changes while blog posts remain the same because, again – lazy.

What is Git?

Short answer: A distributed version control system.

Longer answer: When you write code you fairly quickly get fed up with keeping track of files with code in them. The whole cool_code.latest24.txt naming scheme to keep old stuff around while you make changes quickly goes out the window, and you realise there should be a better way. There is. With a version control system, another piece of software keeps track of all the changes you have made to the bunch of files that make up your website/app/program/set of programs, and lets you compare versions, find the change that introduced a nasty bug, etc. With Git you have all that information on your local computer and can work very quickly, and if you need to store your work somewhere else, like on a server or just with a friend who has a copy of the same bunch of files (repository), only the changes you have made get sent across to that remote repository.

There are competing systems to git that do almost the same thing (mercurial is also distributed) and then a bunch of systems that instead are centralised and only let you see one version at a time, where all the comparison/search/etc. stuff happens across the network. Subversion (SVN) and CVS are common ones. You also have TFS, Perforce and a few others.

What is a build server?

Short answer: a piece of software that watches over a repository and tries to build the source code and run automated tests when source code is modified.

Longer answer:

The most popular way for a developer to dismiss a user looking for help is to say “That works on my machine”. Legitimately, it is not always evident that you have forgotten to add a code file to your version control system. This causes problems for other people trying to use your changes as the software is incomplete and probably not working as a result of the missing files. Having an automated way of checking that the repository is always in a state where you can get the latest version of the code in development, build it and expect it to work is very helpful. It may seem obvious to you, but it was not a given until fairly recently. Any decent group of developers will make sure that the build server at least has a way of deploying the product, but ideally deployment should happen automatically. This does require that the developers are diligent with their automatic tests and that the deployment automation is well designed so that rare problems can be quickly rolled back.

What is Gerrit?

Short answer: I’ll let Wikipedia answer this one as I have managed to avoid using it myself despite a few close calls. It seems to be a code review and issue tracking tool that can be used to manage and approve changes to software and integrate with the build server.

Long answer:

Before accepting changes to source code, maintainers of large systems will look at submitted changes first and see if they are correct and in keeping with the programming style that the rest of the system uses. Despite code being written “for the computer”, maybe the most important aspect of source code quality and style is that other contributors understand the code and can find their way around. To keep track of feature suggestions, bug reports and the submitted bits of code that pertain to them, software systems exist that can help out here too.

I don’t know, but I would think I could get most of that functionality with a paid GitHub subscription. Yes, gerrit is free, but unless you are technical and want to pet a server, you will have to hire somebody to set up and maintain the (virtual) machine where you are running gerrit.

Where do I put my website?

Short answer: I don’t know, the possibilities are endless today. Don’t pay a lot of money.

Longer answer: Amazon decided, a long time ago, to sell a lot of books. They figured that their IT infrastructure had to be easy to extend, as they were counting on having to buy a ton of servers every month, so their IT stuff had to just work and automatically make use of the new computers without a bunch of installing, configuration and the like. They also didn’t like that the most heavy duty servers cost an arm and a leg, and wanted to combine a stack of cheaper computers and pool those resources to get the same horsepower at a lower cost. They built a bunch of very clever programs that provided a way to create virtual servers on demand that in actual fact were running on these cheap “commodity hardware” machines, and they created a way to buy disk space as if it were electricity: you paid per stored megabyte rather than having to buy an entire hard drive, etc. This turned out to be very popular and they started selling their surplus capacity to end users, and all of a sudden you could rent computing capacity rather than making huge capital investments up front. The cloud era was here.

Sadly, there were already a bunch of companies out there that owned a bunch of servers, the big iron expensive ones even, and sold that computing capacity to end users: your neighbourhood ISP (Internet Service Provider) or web hotel.

Predictably, the ISPs aren’t doing too well. They have niched themselves into providing “control panels” that make server administration slightly less complicated, but the actual web hosting aspect is near free with cloud providers, so for almost all your website needs, look no further than Amazon or Microsoft Azure, to name just two. Many of these providers can hook into your source control system and request a fresh copy of your website as soon as your build server says the tests have run and everything is OK. Kids today take that stuff for granted, but I still think it’s magical.

Don’t forget this important caveat: Your friendly neighbourhood ISP may occasionally answer the phone. If you want to talk to a human, that is a lot easier than to try and get a MS or Amazon person on the phone that can actually help you.

Azure Websites with automatic deployment

Automatic deployment when you commit and push changes to your GitHub/Bitbucket repo is truly awesome. Like running water or electricity, it is so convenient that you incorporate it into the fabric of your life and eventually only really notice it when it’s broken. The amount of reduced headache and increased productivity is amazing.

Overview

However, bringing continuous deployment to legacy web projects and solutions can be a bit challenging until it all works. The moving parts involved favour convention over configuration, so in most cases you don’t really need to do anything, but then there are cases when you all of a sudden notice that you need to do something to make things roll smoothly again.

Windows Azure Websites uses a few tricks to make the Deploy from Source Repository feature work. It puts a git repository in your allotted folder on the IIS server where your website resides, which you can easily see via FTP. You access this repo directly if you do a straight git deployment, where an “azure” remote is added to your local git repo; otherwise it is used by WAWS to pull your changes from TFS, Codeplex, GitHub or Bitbucket, and it is from there the deployment happens. Allegedly, even when you deploy from Dropbox, your changes are first applied to the git repo in your directory in Azure and then the deployment begins from there.
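For reference, the straight git deployment dance looks roughly like this (the site name and user are placeholders):

git remote add azure https://deployuser@mysite.scm.azurewebsites.net/mysite.git
git push azure master
# Kudu picks up the push, builds the project and deploys the output into the site folder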

The deployment system is open sourced at https://github.com/projectkudu/ and you will find further documentation there.

Gotchas

Multiple deployable projects in one repo

Let’s say you have a solution to deploy that has multiple projects that are potentially deployable. Kudu will not be able to guess which one to deploy and the deployment will fail with an error. What to do?

The solution is a deployment configuration file, a .deployment file. Yes, Windows won’t let you right-click and create a file called “.deployment”, nor rename an existing file to that name, so you need to save it under that name directly from Visual Studio, or just type the following in cmd.exe (copy con: means copy from the console, as in what you type, so type the config file contents, finish with Ctrl+Z, which means End of File, and press Enter again):

C:\Users\H4xx0rD00d\teh_c0dez\awesomerepo>copy con: .deployment
[config]
project = theActualFolderWhereTheProjectThatShouldBeDeployedResides
^Z
        1 file(s) copied.
C:\Users\H4xx0rD00d\teh_c0dez\awesomerepo>

After you add, commit and push that file, Azure will know which project to deploy. Further documentation on the .deployment file is available in the Kudu project wiki.

Compilation fails with missing references

If you are using NuGet to manage your dependencies, you can choose either of two routes: NuGet Package Restore, where Kudu will download the relevant libraries for you, or just committing your entire packages folder to your Git repository so that all your reference libraries are available for compilation on Azure.

The first option makes your repository more lightweight, but on the other hand you are hoping that whoever manages the NuGet packages on which you depend isn’t cheeky, like Microsoft has been in the past, and withdraws the packages from under you, or that the maintainer doesn’t miss a breaking change, thus violating SemVer versioning and causing Kudu to supply your app with an incompatible library.

With the second option you get a bulkier repo, but you are in control of new versions of dependent libraries, so you can find out that they truly work with your system before you deploy. It works almost the same if you manually maintain your dependencies in the form of a Lib folder containing the DLLs that you need. Just ensure your Lib folder has been committed to your Git repo and Kudu will be able to find them.
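If you commit your packages and your .gitignore excludes the packages folder (a common default in Visual Studio gitignore templates), you can force them in roughly like so:

git add -f packages/
git commit -m "Commit NuGet packages so Kudu can compile without package restore"
git push origin master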