A few months ago, I wrote about a side project I intended to work on. The idea was to extend the community version of Traefik by implementing an ACME cert store backed by Azure Table Storage. I made some pretty good progress on the feature, but have finally decided not to continue.
The intermediate progress can be found here. If anyone wishes to continue this work, please have at it!
More details below.
Well, there are so many projects that I'd like to work on. For a while, this one was at the top of my list, so I worked on it.
However, a few weeks back, my Kubernetes cluster had a major outage. Traefik stopped working all of a sudden and all my sites were unreachable. I tried for several hours to get it working, but just couldn't figure it out. Traefik worked fine against the Let's Encrypt staging endpoint, but failed as soon as I switched to production.
I finally got tired of messing with Traefik and decided to look at alternatives again. I found a good nginx+CertManager tutorial, and this time (I had tried before) the setup went smoothly and I got all my sites up and running without Traefik. Once I was completely off Traefik, I lost the motivation to keep working on it.
I got pretty far. In terms of what's currently in the branch, here's what got done:
I was able to build and run this test version of Traefik in my cluster. The code is able to create certificates, save them in Azure storage, and reload them when the service restarts. The code does sometimes throw errors that I haven't quite figured out, but that just needs a little more testing and debugging.
The biggest feature that is not implemented yet is support for concurrency. Azure Table Storage has a great story for concurrency: ETags prevent multiple writers from overwriting each other's changes, and that model should work perfectly for multiple instances of Traefik.
When a certificate needs to be created or renewed, Traefik's behavior needs to be changed such that:
There is also likely some work needed to let all instances know when a new certificate is created. This is necessary so that all instances of Traefik refresh their stores and use the latest certificates. However, this could be as simple as refreshing certs from Table Storage periodically. I don't think anything more complex is needed.
The final piece of the puzzle is testing. I did not write any unit tests and I don't know how much work is involved there. The code will of course also need some good old-fashioned exercise in different test environments to weed out all the bugs.
I believe the code is quite clean and readable. I tried to follow good coding standards and all that. The code is also lightweight in that it doesn't pull in any new dependencies.
So overall, if you want to work on this feature, I think my work could be a really good base to start from. I would also be willing to offer assistance or pair up with someone to finish this project. I'm just not willing to continue to work on it by myself at the moment.
Although this didn't result in a finished product, I have no regrets over the many hours I spent working on it. It was a fantastic opportunity to build some Go language skills, and there are always useful lessons to learn when working in other people's code.
That said, I also think there's value in realizing when it's time to stop working on something. While the idea of persevering to the very end is highly romanticized, I don't think it's always the best thing to do. I would have loved to complete the project, but there are several other side projects, some only in my head and some partially started, that I would like to work on. Besides the remaining feature development and testing, I'm sure there would be a lot of work involved in getting this code approved and rolled into the official release of Traefik.
So all in all, I would say I got 80% of the benefit for 20% of the work needed to complete this project.