dcac36c8cf

This adds a sleep to avoid a tight restart loop for etcd when running in bootstrap mode (e.g. to spin up etcd for calico). This doesn't seem to have manifested before, but I saw it while troubleshooting an environment yesterday, and I'm surprised it hasn't been seen before.

The issue manifests as repeated teardown and replacement of the bootstrapping `<svc>-etcd-<hostname>` pod put in place by the anchor. The log messages in the etcd container of the pod will say that etcd is terminating because it got SIGTERM, and a large number of pause containers will be left behind and visible in `docker ps -a`.

The constant pod replacement was racing with how quickly kubernetes would see the healthy (non-anchor) etcd pod allowing the anchor to be able to reach etcd over the kubernetes service to check its health. A successful health check by the anchor ends the bootstrapping phase, exiting the race.

I'm confident there's a better approach to clean this section of code up; however, the concern with this PS is to address the problematic tight loop, allowing a more rigorous improvement to come later.

Change-Id: I0e3181194cfcd376967672b47a5e126103b4dfe4

- charts
- doc/source
- etc/promenade
- examples
- promenade
- tests
- tools
- .dockerignore
- .gitignore
- .gitreview
- .zuul.yaml
- Dockerfile
- LICENSE
- Makefile
- README.md
- entrypoint.sh
- requirements-direct.txt
- requirements-frozen.txt
- requirements.txt
- setup.py
- test-requirements.txt
- tox.ini
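The tight-loop fix described in the commit message above can be illustrated with a minimal sketch. This is not the actual Promenade anchor code; the function name `bootstrap_until_healthy`, its parameters, and the default delay are all hypothetical, chosen only to show the idea of pausing between pod replacements so that Kubernetes has time to register the healthy etcd pod before the anchor tears the bootstrap pod down again.

```python
import time
from typing import Callable


def bootstrap_until_healthy(
    is_healthy: Callable[[], bool],
    replace_bootstrap_pod: Callable[[], None],
    retry_delay_s: float = 5.0,   # hypothetical default, not taken from Promenade
    max_attempts: int = 60,       # hypothetical upper bound for the sketch
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Replace the bootstrap pod until the health check passes.

    Returns the number of pod replacements that were performed.
    Raises TimeoutError if the health check never succeeds.
    """
    replacements = 0
    for _ in range(max_attempts):
        # A successful health check ends the bootstrapping phase.
        if is_healthy():
            return replacements

        # Not healthy yet: put the bootstrap pod in place (again).
        replace_bootstrap_pod()
        replacements += 1

        # The key change: wait before the next check instead of looping
        # immediately, so the new pod is not torn down before the cluster
        # has had a chance to see it as healthy.
        sleep(retry_delay_s)

    raise TimeoutError(
        f"etcd did not become healthy after {max_attempts} attempts"
    )
```

Both callbacks are injected so the loop can be exercised without a cluster; in a real anchor they would wrap whatever health probe and pod manifest handling the deployment uses. Without the `sleep` call the loop would replace the pod as fast as it can run, which is the behaviour the commit message reports.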
README.md
Promenade
Promenade is a tool for bootstrapping a resilient Kubernetes cluster and managing its life-cycle via Helm charts.
Documentation can be found here.
Roadmap
The detailed Roadmap can be viewed on the OpenStack StoryBoard.
- Cluster bootstrapping
  - Initial Genesis process results in a single node Kubernetes cluster with Under-cloud components deployed using Armada.
  - Joining sufficient master nodes results in a resilient Kubernetes cluster.
  - Destroy Genesis node after bootstrapping and re-provision as a normal node to ensure consistency.
- Life-cycle management
  - Decommissioning of nodes.
  - Updating Kubernetes version.
Getting Started
To get started, see the getting started guide.
Configuration is documented here.
Bugs
Bugs are tracked in OpenStack StoryBoard.