A declarative framework for resilient Kubernetes deployment.
Mark Burnett dcac36c8cf Fix: Avoid etcd bootstrap race
This adds a sleep to avoid a tight restart loop for etcd when running in
bootstrap mode (e.g. to spin up etcd for calico).

I'm not aware of this having manifested before, but I hit it while
troubleshooting an environment yesterday, and I'm surprised it hasn't
surfaced sooner.

The issue manifests as repeated teardown and replacement of the
bootstrapping <svc>-etcd-<hostname> pod put in place by the anchor.  The
log messages in the etcd container of the pod will say that etcd is
terminating because it got SIGTERM, and a large number of pause
containers will be left behind and visible in `docker ps -a`.  The
constant pod replacement was racing against how quickly Kubernetes
noticed the healthy (non-anchor) etcd pod, which is what lets the
anchor reach etcd over the Kubernetes service to check its health.  A
successful health check by the anchor ends the bootstrapping phase and
exits the race.
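
The general shape of the fix can be sketched as a replace-and-check loop
that pauses between iterations.  This is only an illustration of the
technique, not the anchor's actual code: the function name, the
`is_healthy` and `replace_pod` callables, and the default delay and
attempt limit are all hypothetical.

```python
import time
from typing import Callable


def bootstrap_until_healthy(
    is_healthy: Callable[[], bool],
    replace_pod: Callable[[], None],
    delay_seconds: float = 15.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Replace the bootstrap pod until the health check passes.

    Returns the number of replacements performed.  Every name and default
    value here is an assumption made for illustration.
    """
    for attempt in range(max_attempts):
        if is_healthy():
            # A successful health check ends the bootstrapping phase.
            return attempt
        replace_pod()
        # Pause before checking again.  Without this delay the loop tears
        # the pod down and recreates it as fast as it can, so the pod
        # never lives long enough to be seen as healthy.
        sleep(delay_seconds)
    raise TimeoutError(f"etcd not healthy after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the delay easy to stub out in tests;
a caller would normally rely on the `time.sleep` default.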

I'm confident there's a better approach to clean this section of code
up; however, the concern with this PS is to address the problematic
tight loop, allowing a more rigorous improvement to come later.

Change-Id: I0e3181194cfcd376967672b47a5e126103b4dfe4
2018-09-07 07:52:44 -05:00
charts Fix: Avoid etcd bootstrap race 2018-09-07 07:52:44 -05:00
doc/source Merge "Adding node-labels api" 2018-08-09 21:31:56 +00:00
etc/promenade Minor testing-related cleanup 2018-01-02 10:14:10 -06:00
examples Update tiller version to 2.10.0 2018-08-30 15:54:07 -05:00
promenade Merge "Handle non-true defaults" 2018-08-28 16:05:08 +00:00
tests Merge "Adding node-labels api" 2018-08-09 21:31:56 +00:00
tools Update tiller version to 2.10.0 2018-08-30 15:54:07 -05:00
.dockerignore Remove tests from images 2018-08-02 15:37:18 -05:00
.gitignore Adding node-labels api 2018-08-09 23:58:59 +05:30
.gitreview Update .gitreview for openstack infra 2018-05-17 19:25:48 +01:00
.zuul.yaml Consolidate pep8/bandit zuul gating 2018-08-21 12:57:02 -05:00
Dockerfile Update Dockerfile to allow override of FROM variable 2018-07-24 21:11:35 +00:00
LICENSE Initial commit 2017-02-14 11:13:39 -08:00
Makefile Update tiller version to 2.10.0 2018-08-30 15:54:07 -05:00
README.md Update the README.md File 2018-07-30 15:52:23 -05:00
entrypoint.sh [Fix] Allow larger headers in API requests 2018-03-01 09:30:39 -06:00
requirements-direct.txt (fix) Update deckhand dependency 2018-07-17 13:57:02 -05:00
requirements-frozen.txt Consolidate pep8/bandit zuul gating 2018-08-21 12:57:02 -05:00
requirements.txt Avoid directly installing non-frozen dependencies 2017-10-20 10:54:10 -05:00
setup.py Speed up image build 2018-04-25 12:00:06 -05:00
test-requirements.txt Consolidate pep8/bandit zuul gating 2018-08-21 12:57:02 -05:00
tox.ini Add venv tox environment 2018-08-24 21:33:40 +02:00

README.md

Promenade

Promenade is a tool for bootstrapping a resilient Kubernetes cluster and managing its life-cycle via Helm charts.

Documentation can be found here.

Roadmap

The detailed Roadmap can be viewed on the OpenStack StoryBoard.

  • Cluster bootstrapping
    • The initial Genesis process results in a single-node Kubernetes cluster with Under-cloud components deployed using Armada.
    • Joining sufficient master nodes results in a resilient Kubernetes cluster.
    • Destroy the Genesis node after bootstrapping and re-provision it as a normal node to ensure consistency.
  • Life-cycle management
    • Decommissioning of nodes.
    • Updating Kubernetes version.

Getting Started

To get started, see the getting started guide.

Configuration is documented here.

Bugs

Bugs are tracked in OpenStack StoryBoard.