Kubernetes Engine (OKE)

Kubernetes (abbreviated k8s) is a container orchestration system originally developed by Google. It provides an interface for running software that abstracts away the fact that the software runs on multiple machines. It also lets you do this completely declaratively: instead of giving it a list of actions to perform (like a computer program), you describe the end state you want (e.g. "there should always be 3 copies of this software running") and Kubernetes works out how to get there. We think these properties make it a good level of abstraction to build an application platform on top of.
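
For example, a minimal Deployment manifest (a sketch with placeholder names, not one of our real workloads) is how you tell Kubernetes "always keep 3 copies of this running"; the scheduler then decides where those copies actually go:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app                  # placeholder name
    spec:
      replicas: 3                        # "there should always be 3 copies of this software running"
      selector:
        matchLabels:
          app: example-app
      template:
        metadata:
          labels:
            app: example-app
        spec:
          containers:
            - name: example-app
              image: docker.io/library/nginx:1.25   # any container image works here
              ports:
                - containerPort: 80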

Our current Kubernetes deployment was originally envisioned by ~njha and ~fydai, who were frustrated with the state of the existing Kubernetes cluster (which was on a single machine, and had no benefit over running a container runtime like Podman directly). Over time ~etw and ~oliverni joined the project (and ~fydai graduated). A number of others also briefly helped along the way! Finally, in late 2022 - early 2023, we had something we were pretty happy with.

Tech Stack

  • Container Runtime (CRI): CRI-O
      ◦ Well-supported and stable for pretty much everything that runs on Kubernetes.
      ◦ Its version is tied to the Kubernetes version, so we're actually incentivised to keep it up to date.
      ◦ containerd has weird bugs in the default installation, so much so that other people still (as of Kubernetes 1.21) run dockershim in production. (Update 2022: containerd + Rook is broken on NixOS.)
  • Container Networking (CNI): Cilium
      ◦ This is responsible for the underlying transport between all containers, and between containers and the internet.
      ◦ Cilium is the most advanced CNI plugin available as of now (eBPF, HTTP filtering, etc.).
  • Cluster DNS: CoreDNS
      ◦ This provides DNS that understands Kubernetes, allowing you to do things like GET http://pod-a.namespace from other pods.
  • LoadBalancer: MetalLB
      ◦ A LoadBalancer allows Kubernetes Services to bind directly to public IPs (see the sketch after this list).
      ◦ See the IP Address Allocation sheet for the IPs we have allocated for cluster use.
      ◦ MetalLB is one of two bare-metal LoadBalancer solutions (the other being kube-vip, which we use for the HA control plane), and it has worked without any hassle so far.
  • Ingress: Cilium
      ◦ An Ingress allows assigning domains to Services, and routes traffic sent to those domains to the appropriate pods (see the example after this list).
      ◦ We originally considered Traefik, but Traefik is very difficult to debug. After breaking Traefik for the third time, we gave up and went with Contour. Then we took so long to build up the cluster that Cilium launched an Ingress controller. It was very broken, so ~njha made some PRs and now it works okay.
  • TLS Certificates: cert-manager
      ◦ Just an ACME client for Let's Encrypt.
      ◦ It obtains and automatically renews the TLS certificates that the Ingress needs (the Ingress example after this list shows how the two fit together).
  • Storage: Ceph
      ◦ Rook runs a Ceph cluster inside Kubernetes. Right now it has OSDs on jaws and lockdown.
      ◦ Rook also provides PersistentVolumeClaims to Kubernetes, so Pods can store things (see the sketch after this list).
      ◦ There are also plans to expose an S3-compatible API for consumers such as Argo Workflows (to store outputs of CI runs) or Mastodon (to store images).
  • Secrets: Vault, vault-secrets-operator
      ◦ Vault stores secrets (hopefully securely) in a key-value store.
      ◦ vault-secrets-operator listens for VaultSecret objects, which sync the data inside Vault into regular Kubernetes Secrets (see the example after this list). The contents of these secrets can then be used inside ConfigMaps, or templated into files to be used inside pods.
  • CI: TODO
  • CD: Argo CD
      ◦ This is a tool that keeps the state of the cluster in sync with the current state of git (see the example after this list).
      ◦ Everything should be deployed via Argo CD; see the Usage section for more information.
  • Backups: TODO
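
As a rough sketch of how MetalLB fits in (the pool name and address range are placeholders; the real ranges live in the IP Address Allocation sheet), the cluster defines an address pool and any Service with type: LoadBalancer then gets an IP from it. Depending on how the pool is announced (L2 vs BGP), a matching advertisement object is also needed:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: public-pool                  # placeholder name
      namespace: metallb-system
    spec:
      addresses:
        - 192.0.2.10-192.0.2.20          # placeholder range; see the IP Address Allocation sheet
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: example-app
    spec:
      type: LoadBalancer                 # MetalLB binds this Service to a public IP from the pool
      selector:
        app: example-app
      ports:
        - port: 80
          targetPort: 80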
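
A hedged example of an Ingress fronted by a cert-manager certificate (the hostname, issuer name, and Service name are placeholders, not our real configuration). The cert-manager.io/cluster-issuer annotation asks cert-manager to fetch a Let's Encrypt certificate into the Secret named under tls, and Cilium routes traffic for the host to the backing Service:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-app
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt   # placeholder ClusterIssuer name
    spec:
      ingressClassName: cilium
      rules:
        - host: example-app.ocf.berkeley.edu          # placeholder hostname
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-app
                    port:
                      number: 80
      tls:
        - hosts:
            - example-app.ocf.berkeley.edu
          secretName: example-app-tls                  # cert-manager stores the certificate here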
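
Storage is consumed through PersistentVolumeClaims backed by a Rook/Ceph StorageClass. The class name below is an assumption based on common Rook defaults, not necessarily what our cluster calls it:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-app-data
    spec:
      accessModes:
        - ReadWriteOnce                  # a single node mounts this volume read-write
      resources:
        requests:
          storage: 10Gi
      storageClassName: rook-ceph-block  # assumed name; check the cluster's StorageClasses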
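
A sketch of the secrets flow, assuming the ricoberger vault-secrets-operator CRD layout (the apiVersion, Vault path, and names here are illustrative, not copied from our configuration). The operator reads the key-value pairs at the given Vault path and materialises them as an ordinary Kubernetes Secret of the same name, which pods can then mount or read as environment variables:

    apiVersion: ricoberger.de/v1alpha1   # assumed operator; verify against the cluster
    kind: VaultSecret
    metadata:
      name: example-app
    spec:
      path: kvv2/example-app             # placeholder Vault KV path
      type: Opaque                       # type of the resulting Kubernetes Secret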
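
As an example of how a deployment is described to Argo CD (the repository URL and paths are placeholders), each app is an Application object pointing at a directory in git; Argo CD then keeps the cluster matching whatever is committed there:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: example-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/kubernetes-manifests.git   # placeholder repository
        targetRevision: HEAD
        path: apps/example-app                                         # placeholder path to manifests
      destination:
        server: https://kubernetes.default.svc
        namespace: example-app
      syncPolicy:
        automated:
          prune: true      # delete resources that were removed from git
          selfHeal: true   # revert manual changes made directly in the cluster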

Old Kubernetes

There was an old deployment of Kubernetes before the current one. It suffered from a number of problems that led us to scrap it and start over rather than try to fix it.

  • Deployed through Puppet w/o good upgrade story
  • Deployed on a single physical machine, with 3 VMs as control-plane nodes
  • Secret management via hostPath puppet shares (requires puppet-trigger on all workers to update secrets)
  • Git-ops via Jenkins and ocf-kubernetes-deploy (Shopify krane)
  • No support for Service type = LoadBalancer (based on the erroneous belief that MetalLB requires additional network configuration)
  • Older, less performant choice of container networking
  • Storage via NFS on a single server instead of distributed via Ceph
  • No centralized metrics/logs, no support for otel tracing, nothing is instrumented
  • Using Docker as the container runtime
  • Control plane not highly available (all traffic goes to a single control plane node)