drydock/drydock_provisioner/orchestrator
Scott Hussey ae87cd1714 Update image and chart mgmt
NOTE: This has become a monolithic commit to get gate
      settings/scripts in place for CI

- Add Makefile with UCP standard entrypoints
- Move Dockerfile into images/drydock per UCP standards
- Add values.yaml entries for uWSGI threads and workers
- Add environment variables to chart Deployment manifest
  for uWSGI thread and workers
- Add threads and workers specification to uWSGI commandline
  in entrypoint
- Test that the Drydock API is responding
- Test that the Drydock API rejects noauth requests
- Fix Makefile utility script to work behind a proxy

Correct task success voting

Some tasks were incorrectly considered partial_success even when
no failure occurred.

- Network configuration erroneously marked messages as errors
- Update result propagation logic to only use the latest retry

The deploy_nodes task ended as incomplete due to a missing
subtask assignment.

Also added a node check step to prepare_nodes so that nodes that
are already under provisioner control (MaaS) are not IPMI-rebooted.
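The latest-retry voting rule described above can be sketched as follows; the function name, attempt fields, and status strings are hypothetical, not Drydock's actual schema:

```python
def propagate_results(subtask_attempts):
    """Collapse a list of subtask attempts into one parent result.

    Only the highest-numbered retry of each subtask votes, so an early
    failure that a later retry recovered from no longer demotes the
    parent task to partial_success.
    """
    latest = {}
    for attempt in subtask_attempts:
        key = attempt["subtask_id"]
        if key not in latest or attempt["retry"] > latest[key]["retry"]:
            latest[key] = attempt

    statuses = {a["status"] for a in latest.values()}
    if statuses == {"success"}:
        return "success"
    if "success" in statuses:
        return "partial_success"
    return "failure"
```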

Tangential changes:
- added config item for the leadership claim interval
- added some debug logging to bootaction_report task
- fix tasks list API endpoint to generate valid JSON

Improve task concurrency

When tasks are started with a scope of multiple nodes,
split the main task so each node is managed independently
to de-link the progression of nodes.

- Split the prepare_nodes task
- Begin reducing cyclomatic complexity to allow for
  better unit testing
- Improved tox testing to include coverage by default
- Include postgresql integration tests in coverage
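The per-node split can be illustrated with a minimal fan-out helper; this is a sketch of the idea, not Drydock's actual task machinery, and the names are invented:

```python
import concurrent.futures

def run_per_node(task_fn, nodes, max_workers=4):
    """Split a multi-node task into one subtask per node so a slow or
    failed node does not block progression of the others."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task_fn, n): n for n in nodes}
        for fut in concurrent.futures.as_completed(futures):
            node = futures[fut]
            try:
                results[node] = ("success", fut.result())
            except Exception as exc:
                # One node failing is recorded, not propagated to siblings.
                results[node] = ("failure", str(exc))
    return results
```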

Closes #73

Change-Id: I600c2a4db74dd42e809bc3ee499fb945ebdf31f6
2017-12-15 15:33:14 -06:00
actions/          Update image and chart mgmt (2017-12-15)
validations/      Rational Boot Storage: Drydock validator (2017-12-14)
__init__.py       Refactor orchestrator (2017-10-26)
orchestrator.py   Update image and chart mgmt (2017-12-15)
readme.md         DRYD-50 Drydock support of NIC bonding (2017-09-21)
util.py           Rational Boot Storage: Drydock validator (2017-12-14)

readme.md

Orchestrator

The orchestrator is the core of Drydock and manages the ordering of driver actions to implement the main Drydock actions. Each of these actions will be started by the external cLCP orchestrator with different parameters to control the scope of the action.

The orchestrator should persist the state of each task so that, on failure, the task can be retried and only the steps still needed will be executed.
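A minimal sketch of this persist-and-resume pattern, assuming a simple JSON checkpoint file rather than Drydock's actual state store:

```python
import json
import os

def run_task(steps, state_path):
    """Run named steps in order, persisting completion after each one,
    so a retried task skips steps that already succeeded.

    steps is a list of (name, callable) pairs; state_path is a JSON
    file recording the names of completed steps.
    """
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = set(json.load(f))
    for name, fn in steps:
        if name in done:
            continue  # already succeeded on a previous attempt
        fn()
        done.add(name)
        with open(state_path, "w") as f:
            json.dump(sorted(done), f)
```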

Drydock Tasks

The bullet points listed below are not exhaustive and will change as we move through testing.

ValidateDesign

Load design data from the statemgmt persistent store and validate that the current state of design data represents a valid site design. No claim is made that the design data is compatible with the physical state of the site.

Validations

  • Networking
    • No static IP assignments are duplicated
    • No static IP assignments are outside of the network they are targeted for
    • All IP assignments are within declared ranges on the network
    • No network is allowed on multiple network links
    • Network MTU is equal or less than NetworkLink MTU
    • MTU values are sane
    • NetworkLink bond mode is compatible with other bond options
    • NetworkLink with more than one allowed network supports trunking
  • Storage
    • Boot drive is above minimum size
    • Root drive is above minimum size
    • No physical device specifies a target VG and a partition list
    • No partition specifies a target VG and a filesystem
    • All defined VGs have at least one defined PV (partition or physical device)
    • Partition and LV sizing is sane
      • Percentages don't sum to above 100%
      • If percentages sum to 100%, no other partitions or LVs are defined
  • Node
    • Root filesystem is defined on a partition or LV
    • Networks assigned to each node's interface are within the set of the attached link's allowed_networks
    • Inter
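Two of the networking rules above (no duplicate static IPs, and all assignments inside their target network) can be sketched with the standard `ipaddress` module; the function name and input shape are hypothetical:

```python
import ipaddress

def validate_static_ips(network_cidr, assignments):
    """Check that no static IP is duplicated and that every static IP
    falls inside its target network. assignments is a list of
    (node_name, ip_string) pairs; returns a list of error strings."""
    errors = []
    net = ipaddress.ip_network(network_cidr)
    seen = {}
    for node, ip in assignments:
        addr = ipaddress.ip_address(ip)
        if addr not in net:
            errors.append(f"{node}: {ip} is outside network {network_cidr}")
        if ip in seen:
            errors.append(f"{node}: {ip} duplicates assignment on {seen[ip]}")
        else:
            seen[ip] = node
    return errors
```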

VerifySite

Verify site-wide resources are in a useful state

  • Driver downstream resources are reachable (e.g. MaaS)
  • OS images needed for bootstrapping are available
  • Promenade or other next-step services are up and available
  • Verify credentials are available
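The reachability checks above can be approximated with a plain TCP connect probe; this is a simplified sketch (names invented), not how the Drydock drivers actually verify their downstream services:

```python
import socket

def verify_endpoints(endpoints, timeout=5):
    """Probe each downstream dependency (e.g. the MaaS API) with a TCP
    connect. endpoints is a list of (name, host, port) triples; returns
    the names of endpoints that could not be reached."""
    unreachable = []
    for name, host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection established; endpoint is reachable
        except OSError:
            unreachable.append(name)
    return unreachable
```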

PrepareSite

Begin preparing site-wide resources for bootstrapping. This action will lock site design data for changes.

  • Configure bootstrapper with site network configs
  • Shuffle images so they are correctly configured for bootstrapping

VerifyNode

Verification of per-node configurations within the context of the current node status

  • Status: Present
    • Basic hardware verification as available via OOB driver
      • BIOS firmware
      • PCI layout
      • Drives
      • Hardware alarms
    • IPMI connectivity
  • Status: Prepared
    • Full hardware manifest
    • Possibly network connectivity
    • Firmware versions

PrepareNode

Prepare a node for bootstrapping

  • Configure network port for PXE
  • Configure a node for PXE boot
  • Power-cycle the node
  • Setup commissioning configuration
    • Hardware drivers
    • Hardware configuration (e.g. RAID)
  • Configure node networking
  • Configure node storage
  • Interrogate node
    • lshw output
    • lldp output
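The step sequence above can be sketched as a per-node flow; the driver method names here are hypothetical, and the skip of the power-cycle for nodes already under provisioner control follows the prepare_nodes change described in the commit message:

```python
def prepare_node(node, oob, provisioner):
    """Sequence the PrepareNode steps for one node. A node already
    under provisioner control (e.g. enlisted in MaaS) is not
    PXE-configured or power-cycled again."""
    provisioner.configure_pxe_port(node)
    if not provisioner.is_known(node):
        oob.set_pxe_boot(node)      # next boot from network
        oob.power_cycle(node)       # IPMI reboot into commissioning
    provisioner.commission(node)    # hardware drivers, RAID, etc.
    provisioner.configure_network(node)
    provisioner.configure_storage(node)
    return provisioner.interrogate(node)  # lshw / lldp output
```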

DeployNode

Begin bootstrapping the node and monitor success

  • Initialize the Introspection service for the node
  • Bootstrap the node (i.e. write the persistent OS install)
  • Ensure network port is returned to production configuration
  • Reboot node from local disk
  • Monitor platform bootstrapping
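The monitoring step can be sketched as a bounded polling loop; the status strings and callable interface are assumptions for illustration:

```python
import time

def monitor_bootstrap(poll_status, timeout=1800, interval=30):
    """Poll a node's deployment status until it reaches a terminal
    state or the timeout expires. poll_status is a callable returning
    e.g. 'deploying', 'deployed', or 'failed'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = poll_status()
        if status in ("deployed", "failed"):
            return status
        time.sleep(interval)
    return "timeout"
```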

DestroyNode

Destroy current node configuration and rebootstrap from scratch

Integration with Drivers

Based on the requested task and the current known state of a node, the orchestrator will call the enabled downstream drivers with one or more tasks. Each call will provide the driver with the desired state (the applied model) and the current known state (the build model).
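A minimal sketch of that contract, with illustrative names (DriverTask, applied_model, build_model, and supported_actions are not Drydock's actual identifiers):

```python
from dataclasses import dataclass, field

@dataclass
class DriverTask:
    """Payload the orchestrator hands a downstream driver: the desired
    state (applied model) plus the current known state (build model)."""
    action: str
    applied_model: dict
    build_model: dict = field(default_factory=dict)

def dispatch(drivers, task):
    """Route a task to every enabled driver that claims its action."""
    return [d.execute(task)
            for d in drivers
            if task.action in d.supported_actions]
```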