The existing liveness and readiness probes for kube-proxy need
adjustment. The current implementation is exec-based, which can be a
resource concern, and is tightly coupled to iptables, which makes it
incompatible with ipvs.
This change removes the exec-based liveness and readiness probes from
the kube-proxy daemonset and replaces them with HTTP probes of the
healthz endpoint, following the direction that Kubernetes seems to be
taking.[0][1]
The values.yaml interface to enable and disable the probes and set various
parameters is also modified to use the helm-toolkit standard snippet.[2]
Notably, the settings previously configurable under livenessProbe.config
are now under pod.probes.proxy.proxy.liveness.params.
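
As a rough illustration of the new layout, the sketch below writes a
values override file that sets liveness parameters under
pod.probes.proxy.proxy.liveness.params. The helper name, the numeric
values, and the `enabled` key are assumptions (the latter follows the
usual helm-toolkit convention); the keys under `params` are assumed to
be standard Kubernetes probe fields passed through by the snippet.

```shell
# Hypothetical helper: write a values override for the kube-proxy
# liveness probe to the path given as $1.
write_probe_overrides() {
  cat > "$1" <<'EOF'
pod:
  probes:
    proxy:
      proxy:
        liveness:
          enabled: true        # assumed helm-toolkit on/off switch
          params:              # illustrative values, not chart defaults
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
EOF
}
```

The resulting file could then be passed to helm with `--values`.
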
0: https://github.com/kubernetes/kubernetes/issues/81630
1: https://github.com/kubernetes/kubernetes/pull/75323
2: https://opendev.org/openstack/openstack-helm-infra/src/branch/master/helm-toolkit/templates/snippets/_kubernetes_probes.tpl
Change-Id: I99ccbc2270a1f8a204417aa410868d04788dc60f
"wc -l foo" prints two columns (the line count and the file name),
which causes subtle breakage that shows up as sporadic, cryptic errors.
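
For illustration, one way to avoid the second column is to feed the
file on standard input so that only the count is printed. The helper
name `count_lines` is hypothetical, and this is a sketch rather than
necessarily the exact fix applied here.

```shell
# Hypothetical helper: print only the number of lines in the file $1.
count_lines() {
  # Reading from stdin keeps wc from printing the file name; tr strips
  # the padding that some wc implementations add around the number.
  wc -l < "$1" | tr -d '[:space:]'
}
```

The result can then be compared numerically, e.g.
`[ "$(count_lines foo)" -gt 0 ]`.
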
Change-Id: I1f708ed011a48a2fbca6af8f4d021005d2296bfd
With this update, the list of services without endpoints detected on
the host must be static (unchanged) to cause a failure.
This avoids race conditions in large deployments, where new services
are added over several minutes and would otherwise trigger probe
failures.
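
The idea can be sketched as follows, assuming the probe remembers the
list seen on the previous run in a state file. The function name, its
arguments, and the state file are hypothetical and only illustrate the
"must be static" rule.

```shell
# Hypothetical helper.
#   $1: newline-separated list of services without endpoints (may be empty)
#   $2: path of a state file holding the list from the previous run
# Returns 1 (failure) only when the list is non-empty and identical to
# the one recorded by the previous run; otherwise returns 0.
check_static() {
  current="$1"
  state_file="$2"
  previous=""
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi
  # Record the current list for the next run.
  printf '%s\n' "$current" > "$state_file"
  if [ -n "$current" ] && [ "$current" = "$previous" ]; then
    return 1
  fi
  return 0
}
```

A list that is still changing between runs therefore never fails the
probe; only a list that stays the same across two consecutive runs does.
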
Change-Id: Ie65c8613cb85bfdf61d41099540d3499ea1de817
This updates the liveness probe to fail when there are iptables rules
from kube-proxy that don't appear in existing endpoints.
Change-Id: I376be24566809a653417acfb84cac8f1c4e1a36e
In Kubernetes 1.10, the proxy can sometimes get stuck believing that
some services do not have any endpoints. This seems to be triggered by
network instability. The proxy does not seem to recover on its own,
but restarting the pod fixes the issue.
This change adds a naive means of detecting and recovering from this
(`iptables-save | grep 'has no endpoints'` in the liveness probe) that
may occasionally have false positives. As such, the liveness probe is
configured very conservatively to avoid triggering CrashLoopBackOff in
the event of a false positive.
Finally, there is a whitelist feature to help avoid false positives for
services that are known to legitimately have empty endpoints during the
course of normal operation (e.g. Patroni might manage such an endpoint
list).
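
A minimal sketch of the check with a whitelist, assuming the whitelist
is a file with one service name per line. The function name and the
whitelist file format are hypothetical; only the grep pattern comes
from the description above.

```shell
# Hypothetical helper: read iptables-save output on stdin and print the
# 'has no endpoints' rules that are not covered by the whitelist.
#   $1: whitelist file, one fixed string per line, no blank lines
find_stuck_services() {
  whitelist_file="$1"
  # grep exits non-zero when nothing matches; that is not an error here.
  matches=$(grep 'has no endpoints' || true)
  # Only filter when there is something to filter and the whitelist is
  # non-empty.
  if [ -n "$matches" ] && [ -s "$whitelist_file" ]; then
    matches=$(printf '%s\n' "$matches" | grep -v -F -f "$whitelist_file" || true)
  fi
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
  fi
}
```

A probe could then treat any output as a failure, for example
`[ -z "$(iptables-save | find_stuck_services /tmp/whitelist)" ]`, where
the whitelist path is again only an example.
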
Change-Id: I29a770fab70b1fb79db59ef5408f40b2af1c01f9