Disable kubernetes-etcd anchor cleanup in gates

Interesting gate failure:

* kubernetes-etcd chart is installed
* kubernetes-etcd-anchor pod creates a new kubernetes-etcd manifest
* kubernetes-etcd pod restarts
* an etcd leader election happens, triggering a tiller failure
* tiller tries to purge/delete the chart
* the kubernetes-etcd-anchor can't terminate, because the preStop gets
  stuck in a loop trying to talk to etcd via the service endpoint, and
  the termination grace period is 3600s

This change just takes the approach of disabling the cleanup for the
kubernetes etcd anchor pod.

An alternative fix is to change the grace period to something shorter.
However, at this point, the haproxy anchor and kube-apiserver anchor
pods have done their jobs, so kube-apiserver is talking to etcd via
haproxy, and haproxy only knows about the kubernetes-etcd pod, not the
auxiliary etcd pods. It is likely that the kubernetes-etcd anchor would
restart and spin up a new kubernetes etcd pod in time, but it may
occasionally fail.

Change-Id: Ifa71394b2f87e227a6c4ad1b4c80900cec6f5684
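
For illustration only, the stuck termination described in the commit message can be sketched as a pod spec fragment. The fields `terminationGracePeriodSeconds` and `lifecycle.preStop` are standard Kubernetes, and the 3600s value comes from the message above. The container name, image, environment variable, and hook command are assumptions made for this sketch, not the chart's actual template.

```yaml
# Hypothetical fragment; not taken from the kubernetes-etcd chart.
spec:
  # Upper bound on how long the kubelet waits for the preStop hook and
  # container shutdown before force-killing the pod (value from the
  # commit message).
  terminationGracePeriodSeconds: 3600
  containers:
    - name: anchor                  # assumed container name
      image: example/anchor:latest  # placeholder image
      lifecycle:
        preStop:
          exec:
            # Assumed stand-in for the cleanup logic: it retries until
            # etcd answers on the service endpoint, so it does not
            # return while etcd is unreachable.
            # ANCHOR_ETCD_ENDPOINT is a hypothetical variable.
            command:
              - /bin/sh
              - -c
              - |
                until etcdctl --endpoints "$ANCHOR_ETCD_ENDPOINT" member list; do
                  sleep 5
                done
```

With a hook shaped like this, a delete issued while etcd is unavailable could leave the pod in Terminating for up to the full grace period, which matches the behaviour the message reports. Setting `enable_cleanup: false`, as this change does, avoids that cleanup path instead of shortening the grace period.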
2021-02-13 06:45:59 +00:00 · a57158d0e9
parent 77c762463b
commit a57158d0e9
2 changed files with 2 additions and 0 deletions
--- a/examples/containerd/armada-resources.yaml
+++ b/examples/containerd/armada-resources.yaml
@@ -857,6 +857,7 @@ data:
  values:
    anchor:
      etcdctl_endpoint: kubernetes-etcd.kube-system.svc.cluster.local
+      enable_cleanup: false
    labels:
      anchor:
        node_selector_key: kubernetes-etcd
--- a/examples/gate/armada-resources.yaml
+++ b/examples/gate/armada-resources.yaml
@@ -863,6 +863,7 @@ data:
  values:
    anchor:
      etcdctl_endpoint: kubernetes-etcd.kube-system.svc.cluster.local
+      enable_cleanup: false
    labels:
      anchor:
        node_selector_key: kubernetes-etcd