Merge "Doc updates for install and troubleshooting"

Bryan Strassner 2018-01-24 17:40:15 -05:00 committed by Gerrit Code Review
commit 184f1e2ea8
4 changed files with 127 additions and 10 deletions


@@ -1,21 +1,37 @@
Getting Started
===============
Note: This document gives a general understanding of how Promenade can be
exercised in a development environment or for general learning. For full UCP
deployment procedures, refer to `Treasuremap <https://github.com/att-comdev/treasuremap>`_.
Basic Deployment
----------------
This approach is quick to get started, but generates the scripts used for
joining up-front rather than generating them in the API as needed.
Setup
^^^^^
Setup Build Machine
^^^^^^^^^^^^^^^^^^^
On the machine you will use to generate deployment files, install Docker:
.. code-block:: console
sudo apt -y install docker.io
This can be the same machine you intend to be the Genesis host, or it may be
a separate build machine.
Generate Build Files
^^^^^^^^^^^^^^^^^^^^
To create the certificates and scripts needed to perform a basic deployment,
you can use the following helper script:
you can use the following helper script on your build machine:
.. code-block:: bash
.. code-block:: console
./tools/simple-deployment.sh examples/basic build
sudo ./tools/simple-deployment.sh examples/basic build
This will copy the configuration provided in the ``examples/basic`` directory
into the ``build`` directory. Then, it will generate self-signed certificates
@@ -23,18 +39,31 @@ for all the needed components in Deckhand-compatible format. Finally, it will
render the provided configuration into directly-usable ``genesis.sh`` and
``join-<NODE>.sh`` scripts.
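Before copying anything to a node, it can help to confirm that the build step produced the scripts named above. The following is a minimal sketch and not part of Promenade: the function name ``check_build_dir`` is hypothetical, and it assumes only that ``genesis.sh`` and the ``join-<NODE>.sh`` scripts sit at the top level of the build directory.

```shell
# Hypothetical helper (not shipped with Promenade): confirm that a build
# directory contains genesis.sh and report any join scripts found there.
check_build_dir() {
    local build_dir="${1:-build}"
    local script

    if [ ! -f "${build_dir}/genesis.sh" ]; then
        echo "missing: ${build_dir}/genesis.sh" >&2
        return 1
    fi
    echo "found: genesis.sh"

    for script in "${build_dir}"/join-*.sh; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "${script}" ]; then
            echo "found: $(basename "${script}")"
        fi
    done
    return 0
}
```

Calling ``check_build_dir build`` after the helper script finishes should list ``genesis.sh`` plus one line per generated join script.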
Genesis Host Provision
^^^^^^^^^^^^^^^^^^^^^^
Install Ubuntu 16.04 on the machine intended to be the genesis host. Ensure
the host has outbound internet access and DNS resolution.
Ensure that the hostname matches the hostname specified in the Genesis.yaml
file used to build the above configurations.
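The hostname requirement above can be checked before running anything. This is a sketch under stated assumptions: both function names are hypothetical, and the Genesis document is assumed to store the name on a plain ``hostname: <name>`` line. Adjust the pattern if your file is laid out differently.

```shell
# Hypothetical helper: print the first "hostname:" value found in a YAML file.
# Assumes the value sits on a plain "hostname: <name>" line, unquoted.
genesis_hostname() {
    awk '$1 == "hostname:" { print $2; exit }' "$1"
}

# Compare that value with the name this machine reports. The second argument
# overrides the detected hostname, which is useful for checking remotely.
check_genesis_hostname() {
    local expected actual
    expected="$(genesis_hostname "$1")"
    actual="${2:-$(hostname)}"

    if [ "${expected}" = "${actual}" ]; then
        echo "ok: hostname ${actual} matches $1"
    else
        echo "mismatch: $1 says '${expected}', machine says '${actual}'" >&2
        return 1
    fi
}
```

For example, ``check_genesis_hostname build/Genesis.yaml`` run on the genesis host returns non-zero when the two names differ. The path shown is an assumption about where the copied configuration lives.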
Execution
^^^^^^^^^
Perform the following steps to execute the deployment:
1. Copy the ``genesis.sh`` script to the genesis node and run it.
1. Copy the ``genesis.sh`` script to the genesis node and run it with sudo. In the
event of runtime errors, refer to :doc:`troubleshooting/genesis`.
2. Validate the genesis node by running ``validate-genesis.sh`` on it.
3. Join master nodes by copying their respective ``join-<NODE>.sh`` scripts to
3. Nodes for which ``join-<NODE>.sh`` scripts have been generated should be
provisioned at this point and must have network connectivity to the
genesis node. (This could be a manual Ubuntu provision or a
Drydock-initiated PXE boot in the case of a full-fledged UCP deployment.)
4. Join master nodes by copying their respective ``join-<NODE>.sh`` scripts to
them and running them.
4. Validate the master nodes by copying and running their respective
5. Validate the master nodes by copying and running their respective
``validate-<NODE>.sh`` scripts on each of them.
5. Re-provision the Genesis node
6. Re-provision the Genesis node
a) Run the ``/usr/local/bin/promenade-teardown`` script on the Genesis node:
b) Delete the node from the cluster by running ``kubectl delete node <GENESIS>`` on one of the other nodes.
@@ -42,7 +71,7 @@ Perform the following steps to execute the deployment:
d) Join the genesis node as a normal node using its ``join-<GENESIS>.sh`` script.
e) Validate the node using ``validate-<GENESIS>.sh``.
6. Join and validate all remaining nodes using the ``join-<NODE>.sh`` and
7. Join and validate all remaining nodes using the ``join-<NODE>.sh`` and
``validate-<NODE>.sh`` scripts described above.
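The join and validate steps above repeat the same pattern for every node, so it can be useful to print the commands before running them by hand. The sketch below is a dry run only: it echoes commands and executes nothing. The function name ``plan_joins`` is hypothetical, and the use of ``scp``/``ssh`` by node name, the ``sudo`` prefix, and the location of the scripts in one build directory are all assumptions.

```shell
# Hypothetical dry run: print (never execute) the copy, join, and validate
# commands for each node name given after the build directory.
# Assumes nodes are reachable over SSH by name and scripts share one directory.
plan_joins() {
    local build_dir="$1"
    local node
    shift

    for node in "$@"; do
        echo "scp ${build_dir}/join-${node}.sh ${build_dir}/validate-${node}.sh ${node}:"
        echo "ssh ${node} sudo ./join-${node}.sh"
        echo "ssh ${node} sudo ./validate-${node}.sh"
    done
}
```

Running ``plan_joins build n1 n2`` prints three lines per node, in the order the steps should be performed.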


@@ -33,4 +33,5 @@ Promenade Configuration Guide
design
getting-started
configuration/index
troubleshooting/index
api


@@ -0,0 +1,78 @@
Genesis Troubleshooting
=======================
genesis.sh
----------
Kubernetes services failures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before the Armada manifests are applied, the ``genesis.sh`` script brings basic
Kubernetes services online by starting Docker containers for these services.
One of the first services to be brought up is the Kubernetes API. If it fails to
come up, you may see the following error repeated by the ``genesis.sh`` script:
.. code-block:: console
.The connection to the server apiserver.kubernetes.promenade:6443 was
refused - did you specify the right host or port?
Check that the hostname in your Genesis.yaml matches the hostname of the
machine you are trying to install onto. If they do not match, change one to
match the other. If you change Genesis.yaml, then re-generate the Promenade
payloads.
If the hostnames match, check the container logs under ``/var/log/pods`` to find
the reason for the provisioning failure. (``kubectl logs`` is not available
while the API container is not running.)
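When the API is down, the quickest way into ``/var/log/pods`` is usually to look at whichever log files changed most recently. The following is a minimal sketch: the function name ``recent_pod_logs`` is hypothetical, and it assumes the log files end in ``.log``. Reading them on a real host will typically require root.

```shell
# Hypothetical helper: list the most recently modified *.log files under a
# pod log directory (default /var/log/pods), newest first.
recent_pod_logs() {
    local log_dir="${1:-/var/log/pods}"
    local count="${2:-5}"

    # Match regular files and symlinks, since log entries may be either.
    find "${log_dir}" \( -type f -o -type l \) -name '*.log' \
        -exec ls -t {} + 2>/dev/null | head -n "${count}"
}
```

For example, ``sudo bash -c "$(declare -f recent_pod_logs); recent_pod_logs"`` prints up to five paths, which can then be inspected with ``tail``.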
Armada failures
^^^^^^^^^^^^^^^
When executing genesis.sh, you may encounter failures from Armada in the
provisioning of other containers. For example:
.. code-block:: console
CRITICAL armada [-] Unhandled error: armada.exceptions.tiller_exceptions.ReleaseException: Failed to Install release: barbican
Use ``kubectl logs`` on the failed pod to determine the reason for the failure,
as in the following command:
.. code-block:: console
sudo kubectl logs barbican-api-5b8bccdf8f-x7sld --namespace=ucp
Other errors may point to configuration errors. For example:
.. code-block:: console
CRITICAL armada [-] Unhandled error: armada.exceptions.source_exceptions.GitLocationException: master is not a valid git repository.
In this case, the git branch name was inadvertently substituted for the git URL
in one of the chart definitions in ``bootstrap-armada.yaml``.
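A mix-up like this one can be caught with a rough text scan before running ``genesis.sh``. The sketch below is a heuristic, not an Armada feature: the function name ``suspicious_locations`` is hypothetical, and it assumes chart sources are written as ``location: <value>`` lines. It flags any value that contains no ``/``, since a git URL or a local path normally has one and a bare branch name such as ``master`` does not.

```shell
# Hypothetical check: print "location:" lines whose value contains no "/".
# Output format is <file>:<line number>: <value>.
suspicious_locations() {
    awk '$1 == "location:" && $2 !~ /\// {
        print FILENAME ":" FNR ": " $2
    }' "$@"
}
```

Running ``suspicious_locations bootstrap-armada.yaml`` prints nothing when every location looks like a URL or path. A hit is only a hint to inspect that line, not proof of an error.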
Post-run failures
^^^^^^^^^^^^^^^^^
At its conclusion, the genesis script outputs the list of containers
provisioned and their status, as reported by Kubernetes. Some containers may
not be in a ``Running`` state. For example:
.. code-block:: console
ucp promenade-api-6696769cd-qwpzf 0/1 ImagePullBackOff 0 10h
For general failures, ``kubectl logs`` may be used as in the previous section.
In this case, it was necessary to run ``kubectl describe`` on the pod to get the
details of the image pull failure. E.g.:
.. code-block:: console
kubectl describe pod promenade-api-7dc54d47c-qw27m --namespace=ucp
In this particular incident, the problem was a missing certificate on the
bare-metal node, which caused the image download to fail. Installing the
certificate, restarting the Docker service, and then waiting for the container
to retry resolved the issue.
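To find every pod that needs this kind of follow-up, the pod listing can be filtered by its status column. This is a minimal sketch: the function name ``not_running_pods`` is hypothetical, and it assumes the column order printed by ``kubectl get pods --all-namespaces --no-headers`` (namespace, name, ready, status, restarts, age), which matches the sample line shown earlier.

```shell
# Hypothetical filter: read pod listing lines on stdin and print a
# "kubectl describe" command for each pod whose STATUS column (field 4)
# is neither Running nor Completed.
not_running_pods() {
    awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" {
        printf "kubectl describe pod %s --namespace=%s\n", $2, $1
    }'
}
```

Typical use would be ``sudo kubectl get pods --all-namespaces --no-headers | not_running_pods``, after which each printed command can be run to inspect the events for that pod.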


@@ -0,0 +1,9 @@
Troubleshooting
===============
.. toctree::
:maxdepth: 2
:caption: Troubleshooting
genesis