Cert-Manager is the standard way to manage Let's Encrypt certificates in Kubernetes. A few months ago, Cert-Manager stopped working for me due to a DNS bug in Kubernetes. Although it wasn't the fault of Cert-Manager, no solution was provided for several months and I had to learn how to manually renew my certificates to keep my sites up.
There were several things I found annoying about CertManager even before this bug presented itself:
And since I had just learned how to manually create certificates in Kubernetes, I decided to build my own Let's Encrypt certificate manager!
Note: I've been using KCert in my own cluster for a few months, but the code is definitely not production ready. Feel free to try it out, but expect bugs, outdated docs and breaking changes until I officially release it.
There were several goals/requirements I wanted to achieve with this project:
I decided to build KCert with a web interface from the start. In fact, KCert does not currently do anything in the background besides running the HTTP service. Although I will be adding it at some point, there is no automatic renewal of certificates.
This has made developing KCert and debugging quite simple. I can run it locally and click through every page of the UI to make sure things are working as expected. When the code gets more stable, I plan to add a background job for automatic renewals. But getting the manual scenario working first will make that a fairly simple step.
Kubernetes applications traditionally get their configurations from environment variables. KCert has many such configurations. However, I also opted to add a configuration page in the UI for some of the options. This allows the user to make some configuration changes without restarting the service. My decision process around deciding which options would be controlled in the UI was:
I decided to design the certificate manager assuming there is only instance running. Kubernetes easily allows you to deploy multiple parallel instances, but does a certificate manager really need that type of scalability? I would argue no.
Let's Encrypt certificates are valid for 90 days and only need to be renewed once every 60 days. This means you have a 30 day window to renew a certificate before it expires. So does it really matter if your certificate manager is down for a couple of minutes? How many domains/certs would be too much for a single instance to handle?
Limiting the deployment to one instance makes debugging the service much easier. There's only one pod and you can easily connect to it and watch the logs.
I've been running this tool in my own cluster since December. I love the simplicity and have really enjoyed figuring out how to do all this without the complexity of CertManager. There are however several things that I'd still like to do:
I'm still experimenting with different ideas, the code and design is likely to change a lot. For this reason, I don't recommend that anyone try to use it unless they'd like to contribute to the code. I do however expect officially release this tool at some point. Maybe in a few more months?