fix: Enable rolling updates for the Caddy deployment in multi-node Kubernetes

When a Pod associated with a Deployment is updated (for example, due to a change to its ConfigMap, or an updated image reference), Kubernetes creates a new ReplicaSet to spin up a Pod with the new configuration, and once that Pod is up, it tears down the old one.

In the case of the Caddy Deployment, this is complicated by the fact that it uses a PersistentVolumeClaim (PVC) whose corresponding volume uses the ReadWriteOnce (RWO) access mode. This means that the volume can only be used by multiple Pods if all those Pods run on the same Kubernetes worker node.

In order to enable rolling updates for the Caddy Deployment, we need to ensure that its replacement Pod is scheduled on the same node as the original Pod. Thus, add a pod affinity rule that enforces exactly that behavior.

Reference: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

The other Tutor services that use volumes (MySQL, Redis, Elasticsearch and MongoDB) do not need this fix, since they all use the "Recreate" deployment strategy: their old Pods are torn down before the replacements are created. Caddy does not need that strategy, and a pod affinity rule is less disruptive to the learner experience, because the old Caddy Pod keeps serving requests until its replacement is up.
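For readers less familiar with the two settings mentioned above, they are declared roughly as follows. This is an illustrative sketch, not an excerpt from Tutor's templates; the storage size is hypothetical.

```yaml
# Excerpt of a PersistentVolumeClaim spec (storage size is hypothetical)
spec:
  accessModes:
    - ReadWriteOnce   # mountable read/write by a single node at a time
  resources:
    requests:
      storage: 1Gi
---
# Excerpt of a Deployment spec using the Recreate strategy
spec:
  strategy:
    type: Recreate    # terminate all old Pods before creating new ones
```

ReadWriteOnce restricts the volume to one node, not to one Pod, which is why co-locating the old and new Caddy Pods is enough to let both mount it during a rolling update.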
2025-02-15 07:01:39 +00:00 · 2022-05-10 14:38:39 +02:00 · 2022-05-10 14:38:39 +02:00 · 78424776b6
commit 78424776b6
parent 549922f0b9
2 changed files with 17 additions and 0 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,7 @@ Every user-facing change should have an entry in this changelog. Please respect
 ## Unreleased

 - [Fix] Fix 500 error during studio login. (by @regisb)
+- [Fix] Fix updates for the Caddy deployment in multi-node Kubernetes clusters (#660). Previously, Caddy configuration updates might fail if the Kubernetes cluster had more than one worker node. (by @fghaas)

 ## v13.2.2 (2022-05-06)

--- a/tutor/templates/k8s/deployments.yml
+++ b/tutor/templates/k8s/deployments.yml
@@ -14,6 +14,22 @@ spec:
      labels:
        app.kubernetes.io/name: caddy
    spec:
+      {%- if ENABLE_WEB_PROXY %}
+      # This Deployment uses a ReadWriteOnce persistent volume claim.
+      # In order to enable rolling updates (i.e. use a deployment
+      # strategy other than Recreate), we must schedule the new Pod on
+      # the same node as the original Pod.
+      affinity:
+        podAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            - labelSelector:
+                matchExpressions:
+                - key: app.kubernetes.io/name
+                  operator: In
+                  values:
+                    - caddy
+              topologyKey: "kubernetes.io/hostname"
+      {%- endif %}
      containers:
        - name: caddy
          image: {{ DOCKER_IMAGE_CADDY }}
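The rolling-update sequence that the affinity rule makes possible can also be spelled out explicitly. The sketch below is not part of this commit and assumes a single-replica Deployment; with `replicas: 1`, these values are what the default `RollingUpdate` settings (25% max surge, rounded up; 25% max unavailable, rounded down) resolve to, so Kubernetes starts the replacement Pod before terminating the old one.

```yaml
# Sketch only: explicit rolling-update settings for a single-replica
# Deployment. Not part of this commit; shown to illustrate the sequence.
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow one extra Pod while the update is in progress
      maxUnavailable: 0  # keep the old Pod until the new one is ready
```

Combined with the `podAffinity` rule above, the surge Pod has to land on the node that already runs the old Caddy Pod, where it can mount the same RWO volume.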