Disable kubernetes-etcd anchor cleanup in gates

Interesting gate failure:
* kubernetes-etcd chart is installed
* kubernetes-etcd-anchor pod creates a new kubernetes-etcd manifest
* kubernetes-etcd pod restarts
* an etcd leader election happens, triggering a tiller failure
* tiller tries to purge/delete the chart
* the kubernetes-etcd-anchor pod can't terminate, because its preStop
hook gets stuck in a loop trying to talk to etcd via the service
endpoint, and the termination grace period is 3600s
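
For illustration, the interaction between the preStop hook and the grace
period can be sketched with a generic pod spec. This is a minimal sketch,
not the chart's actual template: the pod name, image, and script path are
hypothetical placeholders; only `terminationGracePeriodSeconds` and
`lifecycle.preStop` are standard Kubernetes fields, and 3600 is the value
cited above.

```yaml
# Illustrative fragment only; not the real kubernetes-etcd-anchor template.
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-etcd-anchor-example      # hypothetical name
  namespace: kube-system
spec:
  # Upper bound on how long the kubelet waits (preStop hook included)
  # before force-killing the containers.
  terminationGracePeriodSeconds: 3600
  containers:
    - name: anchor
      image: example.invalid/anchor:latest  # placeholder image
      lifecycle:
        preStop:
          exec:
            # Hypothetical cleanup script. If it loops while waiting for
            # etcd to answer, the pod stays in Terminating until the
            # grace period above expires.
            command: ["/tmp/bin/pre_stop"]
```

With a hook that never returns, a delete of this pod could take up to an
hour to complete, which is longer than a gate run is likely to tolerate.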

This change simply disables the cleanup for the kubernetes-etcd anchor
pod.

An alternative fix is to change the grace period to something shorter.
However, at this point, the haproxy anchor and kube-apiserver anchor
pods have done their jobs, so kube-apiserver is talking to etcd via
haproxy, and haproxy only knows about the kubernetes-etcd pod, not the
auxiliary etcd pods. It is likely that the kubernetes-etcd anchor would
restart and spin up a new kubernetes-etcd pod in time, but it may
occasionally fail.

Change-Id: Ifa71394b2f87e227a6c4ad1b4c80900cec6f5684
Phil Sphicas 2021-02-13 06:45:59 +00:00
parent 77c762463b
commit a57158d0e9
2 changed files with 2 additions and 0 deletions

@@ -857,6 +857,7 @@ data:
   values:
     anchor:
       etcdctl_endpoint: kubernetes-etcd.kube-system.svc.cluster.local
+      enable_cleanup: false
     labels:
       anchor:
         node_selector_key: kubernetes-etcd

@@ -863,6 +863,7 @@ data:
   values:
     anchor:
       etcdctl_endpoint: kubernetes-etcd.kube-system.svc.cluster.local
+      enable_cleanup: false
     labels:
       anchor:
         node_selector_key: kubernetes-etcd