author     Bryan Strassner <bryan.strassner@gmail.com>    2018-08-01 13:16:47 -0500
committer  Bryan Strassner <bryan.strassner@gmail.com>    2018-08-01 13:19:28 -0500
commit     bfbfd56c8127f0b167ff69e49426f137b01a73b0 (patch)
tree       182138d22721c1caad5e3751cef077c30156f02b
parent     6e0a18e7fa572caebf3385e30b11768ca873022d (diff)
Move specs: airship-in-a-bottle to airship-specs
Moves the two blueprint/spec documents that existed in airship-in-a-bottle to
the airship-specs. The implemented spec was not reformatted to the spec
template. The other spec (in approved folder) was minimally updated to the
spec template.

Change-Id: I7468579e2fa3077ee1144e5294eba97d8e4ced05
Notes (review):
    Code-Review+2: Felipe Monteiro <felipe.monteiro@att.com>
    Code-Review+2: Scott Hussey <sthussey@att.com>
    Workflow+1: Scott Hussey <sthussey@att.com>
    Verified+2: Zuul
    Submitted-by: Zuul
    Submitted-at: Thu, 02 Aug 2018 17:21:25 +0000
    Reviewed-on: https://review.openstack.org/587945
    Project: openstack/airship-specs
    Branch: refs/heads/master
-rw-r--r--  specs/approved/workflow_node-teardown.rst            620
-rw-r--r--  specs/implemented/deployment-grouping-baremetal.rst  569
2 files changed, 1189 insertions, 0 deletions
diff --git a/specs/approved/workflow_node-teardown.rst b/specs/approved/workflow_node-teardown.rst
new file mode 100644
index 0000000..21f1779
--- /dev/null
+++ b/specs/approved/workflow_node-teardown.rst
@@ -0,0 +1,620 @@
1..
2 Copyright 2018 AT&T Intellectual Property.
3 All Rights Reserved.
4
5 Licensed under the Apache License, Version 2.0 (the "License"); you may
6 not use this file except in compliance with the License. You may obtain
7 a copy of the License at
8
9 http://www.apache.org/licenses/LICENSE-2.0
10
11 Unless required by applicable law or agreed to in writing, software
12 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
13 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
14 License for the specific language governing permissions and limitations
15 under the License.
16
17.. index::
18 single: Teardown node
19 single: workflow;redeploy_server
20 single: Drydock
21 single: Promenade
22 single: Shipyard
23
24
25.. _node-teardown:
26
27=====================
28Airship Node Teardown
29=====================
30
31Shipyard is the entry point for Airship actions, including redeploying a
32server. The first part of redeploying a server is the graceful teardown of the
33software running on the server; specifically Kubernetes and etcd are of
34critical concern. It is the duty of Shipyard to orchestrate the teardown of the
35server, followed by steps to deploy the desired new configuration. This design
36covers only the first portion - node teardown.
37
38
39Links
40=====
41
42None
43
44Problem description
45===================
46
47When redeploying a physical host (server) using the Airship Platform,
48it is necessary to trigger a sequence of steps to prevent undesired behaviors
49when the server is redeployed. This blueprint intends to document the
50interaction that must occur between Airship components to teardown a server.
51
52Impacted components
53===================
54
55Drydock
56Promenade
57Shipyard
58
59Proposed change
60===============
61
62Shipyard node teardown Process
63~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
64#. (Existing) Shipyard receives request to redeploy_server, specifying a target
65 server.
66#. (Existing) Shipyard performs preflight, design reference lookup, and
67 validation steps.
68#. (New) Shipyard invokes Promenade to decommission a node.
69#. (New) Shipyard invokes Drydock to destroy the node - setting a node
70 filter to restrict to a single server.
71#. (New) Shipyard invokes Promenade to remove the node from the Kubernetes
72 cluster.
73
74Assumption:
75node_id is the hostname of the server, and is also the identifier that both
76Drydock and Promenade use to identify the appropriate parts - hosts and k8s
77nodes. This convention is set by the join script produced by Promenade.
78
79Drydock Destroy Node
80--------------------
81The API/interface for destroy node already exists. The implementation within
82Drydock needs to be developed. This interface will need to accept both the
83specified node_id and the design_id to retrieve from Deckhand.
84
85Using the provided node_id (hardware node), and the design_id, Drydock will
86reset the hardware to a re-provisionable state.
87
88By default, all local storage should be wiped (per datacenter policy for
89wiping before re-use).
90
91An option to allow for only the OS disk to be wiped should be supported, such
92that other local storage is left intact, and could be remounted without data
93loss. e.g.: --preserve-local-storage
94
95The target node should be shut down.
96
97The target node should be removed from the provisioner (e.g. MaaS)
98
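As an illustration only, a minimal sketch of how Shipyard might submit such a
destroy-node request, assuming Drydock's existing asynchronous task pattern;
the endpoint path, payload keys, node-filter shape, option name, and service
URL are placeholder assumptions rather than the final interface:

.. code:: python

    # Hypothetical sketch: submit a Drydock task to destroy a single node.
    import requests

    DRYDOCK_URL = "http://drydock-api:9000/api/v1.0"  # placeholder

    def destroy_node(node_id, design_ref, token, preserve_local_storage=False):
        payload = {
            "action": "destroy_nodes",
            "design_ref": design_ref,
            # Restrict the task to the single target server.
            "node_filter": {"node_names": [node_id]},
            # Hypothetical option mirroring --preserve-local-storage above.
            "options": {"preserve_local_storage": preserve_local_storage},
        }
        resp = requests.post(
            "{}/tasks".format(DRYDOCK_URL),
            json=payload,
            headers={"X-Auth-Token": token},
            timeout=30,
        )
        resp.raise_for_status()
        # Task creation is asynchronous; the caller polls the returned task
        # for status, as with other Drydock task types.
        return resp.json()
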
99Responses
100~~~~~~~~~
101The responses from this functionality should follow the pattern set by prepare
102nodes, and other Drydock functionality. The Drydock status responses used for
103all async invocations will be utilized for this functionality.
104
105Promenade Decommission Node
106---------------------------
107Performs steps that will result in the specified node being cleanly
108disassociated from Kubernetes, and ready for the server to be destroyed.
109Users of the decommission node API should be aware of the long timeout values
110that may occur while waiting for Promenade to complete the appropriate steps.
111At this time, Promenade is a stateless service and doesn't use any database
112storage. As such, requests to Promenade are synchronous.
113
114.. code:: json
115
116 POST /nodes/{node_id}/decommission
117
118 {
119 rel : "design",
120 href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
121 type: "application/x-yaml"
122 }
123
124The design reference body is the design indicated when the
125redeploy_server action is invoked through Shipyard.
126
127Query Parameters:
128
129- drain-node-timeout: A whole number timeout in seconds to be used for the
130 drain node step (default: none). In the case of no value being provided,
131 the drain node step will use its default.
132- drain-node-grace-period: A whole number in seconds indicating the
133 grace-period that will be provided to the drain node step. (default: none).
134 If no value is specified, the drain node step will use its default.
135- clear-labels-timeout: A whole number timeout in seconds to be used for the
136 clear labels step. (default: none). If no value is specified, clear labels
137 will use its own default.
138- remove-etcd-timeout: A whole number timeout in seconds to be used for the
139 remove etcd from nodes step. (default: none). If no value is specified,
140 remove-etcd will use its own default.
141- etcd-ready-timeout: A whole number in seconds indicating how long the
142 decommission node request should allow for etcd clusters to become stable
143 (default: 600).
144
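For illustration, a sketch of how Shipyard might call this endpoint with the
design reference body and query parameters above; the Promenade URL, token
handling, and the generous client-side timeout are placeholder assumptions:

.. code:: python

    import requests

    PROMENADE_URL = "http://promenade-api:9000/api/v1.0"  # placeholder

    def decommission_node(node_id, deckhand_url, revision_id, token,
                          drain_node_timeout=None, etcd_ready_timeout=600):
        design_ref = {
            "rel": "design",
            "href": "deckhand+https://{}/revisions/{}/rendered-documents".format(
                deckhand_url, revision_id),
            "type": "application/x-yaml",
        }
        # The remaining query parameters follow the same pattern.
        params = {"etcd-ready-timeout": etcd_ready_timeout}
        if drain_node_timeout is not None:
            params["drain-node-timeout"] = drain_node_timeout
        # Synchronous call; allow a long client-side timeout since the drain
        # and etcd steps can run for a long time.
        resp = requests.post(
            "{}/nodes/{}/decommission".format(PROMENADE_URL, node_id),
            json=design_ref,
            params=params,
            headers={"X-Auth-Token": token},
            timeout=7200,
        )
        resp.raise_for_status()
        return resp.json()
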
145Process
146~~~~~~~
147Acting upon the node specified by the invocation and the design reference
148details:
149
150#. Drain the Kubernetes node.
151#. Clear the Kubernetes labels on the node.
152#. Remove etcd nodes from their clusters (if impacted).
153 - if the node being decommissioned contains etcd nodes, Promenade will
154 attempt to gracefully have those nodes leave the etcd cluster.
155#. Ensure that etcd cluster(s) are in a stable state.
156 - Polls for status every 30 seconds up to the etcd-ready-timeout, or until
157 the cluster meets the defined minimum functionality for the site.
158 - A new document: promenade/EtcdClusters/v1 that will specify details about
159 the etcd clusters deployed in the site, including: identifiers,
160 credentials, and thresholds for minimum functionality.
161 - This process should ignore the node being torn down from any calculation
162 of health
163#. Shutdown the kubelet.
164 - If this is not possible because the node is in a state of disarray such
165 that it cannot schedule the daemonset to run, this step may fail, but
166 should not hold up the process, as the Drydock dismantling of the node
167 will shut the kubelet down.
168
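A compact sketch of this sequence is shown below; ``client`` is a hypothetical
handle whose methods stand in for the individual operations specified in the
sections that follow:

.. code:: python

    def decommission(client, node_id, design_ref, opts):
        client.drain_node(node_id,
                          timeout=opts.get("drain-node-timeout"),
                          grace_period=opts.get("drain-node-grace-period"))
        client.clear_labels(node_id, timeout=opts.get("clear-labels-timeout"))
        client.remove_etcd(node_id, design_ref,
                           timeout=opts.get("remove-etcd-timeout"))
        client.wait_for_etcd_stability(design_ref, exclude_node=node_id,
                                       timeout=opts.get("etcd-ready-timeout", 600))
        try:
            client.shutdown_kubelet(node_id)
        except Exception:
            # A kubelet shutdown failure should not block the teardown; the
            # Drydock destroy step will power the node off regardless.
            pass
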
169Responses
170~~~~~~~~~
171All responses will be in the form of the Airship Status response.
172
173- Success: Code: 200, reason: Success
174
175 Indicates that all steps are successful.
176
177- Failure: Code: 404, reason: NotFound
178
179 Indicates that the target node is not discoverable by Promenade.
180
181- Failure: Code: 500, reason: DisassociateStepFailure
182
183 The details section should detail the successes and failures further. Any
184 4xx series errors from the individual steps would manifest as a 500 here.
185
186Promenade Drain Node
187--------------------
188Drain the Kubernetes node for the target node. This will ensure that this node
189is no longer the target of any pod scheduling, and evicts or deletes the
190running pods. In the case of nodes running DaemonSet-managed pods, or pods
191that would prevent a drain from occurring, Promenade may be required to provide
192the `ignore-daemonsets` option or `force` option to attempt to drain the node
193as fully as possible.
194
195By default, the drain node will utilize a grace period for pods of 1800
196seconds and a total timeout of 3600 seconds (1 hour). Clients of this
197functionality should be prepared for a long timeout.
198
199.. code:: json
200
201 POST /nodes/{node_id}/drain
202
203Query Parameters:
204
205- timeout: a whole number in seconds (default = 3600). This value is the total
206 timeout for the kubectl drain command.
207- grace-period: a whole number in seconds (default = 1800). This value is the
208 grace period used by kubectl drain. Grace period must be less than timeout.
209
210.. note::
211
212 This POST has no message body
213
214Example command being used for drain (reference only)
215`kubectl drain --force --timeout 3600s --grace-period 1800 --ignore-daemonsets --delete-local-data n1`
216https://git.openstack.org/cgit/openstack/airship-promenade/tree/promenade/templates/roles/common/usr/local/bin/promenade-teardown
217
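A sketch of how the drain might be performed, assuming Promenade shells out to
``kubectl`` as in the reference command above; the real implementation may
differ:

.. code:: python

    import subprocess

    def drain_node(node_name, timeout=3600, grace_period=1800):
        # Grace period must be less than the overall timeout; the API maps
        # this to the 400 BadRequest response described below.
        if grace_period >= timeout:
            raise ValueError("grace-period must be less than timeout")
        cmd = [
            "kubectl", "drain", node_name,
            "--force",
            "--ignore-daemonsets",
            "--delete-local-data",
            "--grace-period", str(grace_period),
            "--timeout", "{}s".format(timeout),
        ]
        # A non-zero exit raises CalledProcessError, which the API layer
        # would translate into the 500 DrainNodeError status response.
        subprocess.run(cmd, check=True, timeout=timeout + 60)
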
218Responses
219~~~~~~~~~
220All responses will be in the form of the Airship Status response.
221
222- Success: Code: 200, reason: Success
223
224 Indicates that the drain node has successfully concluded, and that no pods
225 are currently running
226
227- Failure: Status response, code: 400, reason: BadRequest
228
229 A request was made with parameters that cannot work - e.g. grace-period is
230 set to a value larger than the timeout value.
231
232- Failure: Status response, code: 404, reason: NotFound
233
234 The specified node is not discoverable by Promenade
235
236- Failure: Status response, code: 500, reason: DrainNodeError
237
238 There was a processing exception raised while trying to drain a node. The
239 details section should indicate the underlying cause if it can be
240 determined.
241
242Promenade Clear Labels
243----------------------
244Removes the labels that have been added to the target kubernetes node.
245
246.. code:: json
247
248 POST /nodes/{node_id}/clear-labels
249
250Query Parameters:
251
252- timeout: A whole number in seconds allowed for the pods to settle/move
253 following removal of labels. (Default = 1800)
254
255.. note::
256
257 This POST has no message body
258
259Responses
260~~~~~~~~~
261All responses will be in the form of the UCP Status response.
262
263- Success: Code: 200, reason: Success
264
265 All labels have been removed from the specified Kubernetes node.
266
267- Failure: Code: 404, reason: NotFound
268
269 The specified node is not discoverable by Promenade
270
271- Failure: Code: 500, reason: ClearLabelsError
272
273 There was a failure to clear labels that prevented completion. The details
274 section should provide more information about the cause of this failure.
275
276Promenade Remove etcd Node
277--------------------------
278Checks if the node specified contains any etcd nodes. If so, this API will
279trigger that etcd node to leave the associated etcd cluster::
280
281 POST /nodes/{node_id}/remove-etcd
282
283 {
284 rel : "design",
285 href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
286 type: "application/x-yaml"
287 }
288
289Query Parameters:
290
291- timeout: A whole number in seconds allowed for the removal of etcd nodes
292 from the target node. (Default = 1800)
293
294Responses
295~~~~~~~~~
296All responses will be in the form of the UCP Status response.
297
298- Success: Code: 200, reason: Success
299
300 All etcd nodes have been removed from the specified node.
301
302- Failure: Code: 404, reason: NotFound
303
304 The specified node is not discoverable by Promenade
305
306- Failure: Code: 500, reason: RemoveEtcdError
307
308 There was a failure to remove etcd from the target node that prevented
309 completion within the specified timeout, or that etcd prevented removal of
310 the node because it would result in the cluster being broken. The details
311 section should provide more information about the cause of this failure.
312
313
314Promenade Check etcd
315--------------------
316Retrieves the current interpreted state of etcd.
317
318GET /etcd-cluster-health-statuses?design_ref={the design ref}
319
320The design_ref parameter is required for appropriate operation, and is in
321the same format as used for the join-scripts API.
322
323Query Parameters:
324
325- design_ref: (Required) the design reference to be used to discover etcd
326 instances.
327
328Responses
329~~~~~~~~~
330All responses will be in the form of the UCP Status response.
331
332- Success: Code: 200, reason: Success
333
334 The status of each etcd in the site will be returned in the details section.
335 Valid values for status are: Healthy, Unhealthy
336
337https://github.com/att-comdev/ucp-integration/blob/master/docs/source/api-conventions.rst#status-responses
338
339.. code:: json
340
341 { "...": "... standard status response ...",
342 "details": {
343 "errorCount": {{n}},
344 "messageList": [
345 { "message": "Healthy",
346 "error": false,
347 "kind": "HealthMessage",
348 "name": "{{the name of the etcd service}}"
349 },
350 { "message": "Unhealthy",
351 "error": false,
352 "kind": "HealthMessage",
353 "name": "{{the name of the etcd service}}"
354 },
355 { "message": "Unable to access Etcd",
356 "error": true,
357 "kind": "HealthMessage",
358 "name": "{{the name of the etcd service}}"
359 }
360 ]
361 }
362 ...
363 }
364
365- Failure: Code: 400, reason: MissingDesignRef
366
367 Returned if the design_ref parameter is not specified
368
369- Failure: Code: 404, reason: NotFound
370
371 Returned if the specified etcd could not be located
372
373- Failure: Code: 500, reason: EtcdNotAccessible
374
375 Returned if the specified etcd responded with an invalid health response
376 (Not just simply unhealthy - that's a 200).
377
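As an illustration, a sketch of a caller-side stability check over the response
shown above; the allowed-unhealthy threshold is a stand-in for the minimum
functionality defined in the promenade/EtcdClusters/v1 document:

.. code:: python

    def etcd_is_stable(status_response, max_unhealthy=0):
        """Return True when no more than max_unhealthy members report problems."""
        messages = status_response.get("details", {}).get("messageList", [])
        unhealthy = [m for m in messages
                     if m.get("error") or m.get("message") != "Healthy"]
        return len(unhealthy) <= max_unhealthy

The decommission flow would poll such a check roughly every 30 seconds until it
returns True or the etcd-ready-timeout expires.
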
378
379Promenade Shutdown Kubelet
380--------------------------
381Shuts down the kubelet on the specified node. This is accomplished by Promenade
382setting the label `promenade-decomission: enabled` on the node, which will
383trigger a newly-developed daemonset to run something like:
384`systemctl disable kubelet && systemctl stop kubelet`.
385This daemonset will effectively sit dormant until nodes have the appropriate
386label added, and then perform the kubelet teardown.
387
388.. code:: json
389
390 POST /nodes/{node_id}/shutdown-kubelet
391
392.. note::
393
394 This POST has no message body
395
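For illustration, a sketch of applying the trigger label, assuming the
Kubernetes Python client is used; the daemonset that reacts to the label is
developed separately:

.. code:: python

    # Hypothetical sketch: apply the label that triggers the teardown
    # daemonset. Label key/value follow the spec text above.
    from kubernetes import client, config

    def shutdown_kubelet(node_name):
        config.load_incluster_config()  # assumes Promenade runs in-cluster
        v1 = client.CoreV1Api()
        body = {"metadata": {"labels": {"promenade-decomission": "enabled"}}}
        v1.patch_node(node_name, body)
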
396Responses
397~~~~~~~~~
398All responses will be in the form of the UCP Status response.
399
400- Success: Code: 200, reason: Success
401
402 The kubelet has been successfully shut down.
403
404- Failure: Code: 404, reason: NotFound
405
406 The specified node is not discoverable by Promenade
407
408- Failure: Code: 500, reason: ShutdownKubeletError
409
410 The specified node's kubelet fails to shut down. The details section of the
411 status response should contain reasonable information about the source of
412 this failure.
413
414Promenade Delete Node from Cluster
415----------------------------------
416Updates the Kubernetes cluster, removing the specified node. Promenade should
417check that the node is drained/cordoned and has no labels other than
418`promenade-decomission: enabled`. If either of these checks fails, the API
419should respond with a 409 Conflict response.
420
421.. code:: json
422
423 POST /nodes/{node_id}/remove-from-cluster
424
425.. note::
426
427 This POST has no message body
428
429Responses
430~~~~~~~~~
431All responses will be in the form of the UCP Status response.
432
433- Success: Code: 200, reason: Success
434
435 The specified node has been removed from the Kubernetes cluster.
436
437- Failure: Code: 404, reason: NotFound
438
439 The specified node is not discoverable by Promenade
440
441- Failure: Code: 409, reason: Conflict
442
443 The specified node cannot be deleted due to checks that the node is
444 drained/cordoned and has no labels (other than possibly
445 `promenade-decomission: enabled`).
446
447- Failure: Code: 500, reason: DeleteNodeError
448
449 The specified node cannot be removed from the cluster due to an error from
450 Kubernetes. The details section of the status response should contain more
451 information about the failure.
452
453
454Shipyard Tag Releases
455---------------------
456Shipyard will need to mark Deckhand revisions with tags when there are
457successful deploy_site or update_site actions to be able to determine the last
458known good design. This is related to issue 16 for Shipyard, which addresses
459the same need.
460
461.. note::
462
463 Repeated from https://github.com/att-comdev/shipyard/issues/16
464
465 When multiple configdocs commits have been done since the last deployment,
466 there is no ready means to determine what's being done to the site. Shipyard
467 should reject deploy site or update site requests that have had multiple
468 commits since the last site true-up action. An option to override this guard
469 should be allowed for the actions in the form of a parameter to the action.
470
471 The configdocs API should provide a way to see what's been changed since the
472 last site true-up, not just the last commit of configdocs. This might be
473 accommodated by new deckhand tags like the 'commit' tag, but for
474 'site true-up' or similar applied by the deploy and update site commands.
475
476The design for issue 16 includes the bare-minimum marking of Deckhand
477revisions. This design is as follows:
478
479Scenario
480~~~~~~~~
481Multiple commits occur between site actions (deploy_site, update_site) - those
482actions that attempt to bring a site into compliance with a site design.
483When this occurs, the current system of being able to only see what has changed
484between the committed and the buffer versions (configdocs diff) is insufficient
485to be able to investigate what has changed since the last successful (or
486unsuccessful) site action.
487To accommodate this, Shipyard needs several enhancements.
488
489Enhancements
490~~~~~~~~~~~~
491
492#. Deckhand revision tags for site actions
493
494 Using the tagging facility provided by Deckhand, Shipyard will tag the end
495 of site actions.
496 Upon successful completion of a site action, tag the revision being used
497 with the tag site-action-success, and a body of dag_id:<dag_id>
498
499 Upon unsuccessful completion of a site action, tag the revision being used
500 with the tag site-action-failure, and a body of dag_id:<dag_id>
501
502 The completion tags should only be applied upon failure if the site action
503 gets past document validation successfully (i.e. gets to the point where it
504 can start making changes via the other UCP components)
505
506 This could result in a single revision having both site-action-success and
507 site-action-failure if a later re-invocation of a site action is successful.
508
509#. Check for intermediate committed revisions
510
511 Upon running a site action, before tagging the revision with the site action
512 tag(s), the dag needs to check to see if there are committed revisions that
513 do not have an associated site-action tag. If there are any committed
514 revisions between the last site-action-tagged revision and the current
515 revision being used, then the action should not be allowed to proceed (stop
516 before triggering validations). For the calculation of intermediate
517 committed revisions, assume revision 0 if there are no revisions with a
518 site-action tag (null case)
519
520 If the action is invoked with a parameter of
521 allow-intermediate-commits=true, then this check should log that the
522 intermediate committed revisions check is being skipped and not take any
523 other action.
524
525#. Support action parameter of allow-intermediate-commits=true|false
526
527 In the CLI for create action, the --param option supports adding parameters
528 to actions. The parameters passed should be relayed by the CLI to the API
529 and ultimately to the invocation of the DAG. The DAG as noted above will
530 check for the presence of allow-intermediate-commits=true. This needs to be
531 tested to work.
532
533#. Shipyard needs to support retrieving configdocs and rendered documents for
534 the last successful site action, and last site action (successful or not
535 successful)
536
537 --successful-site-action
538 --last-site-action
539 These options would be mutually exclusive of --buffer or --committed
540
541#. Shipyard diff (shipyard get configdocs)
542
543 Needs to support an option to do the diff of the buffer vs. the last
544 successful site action and the last site action (successful or not
545 successful).
546
547 Currently there are no options to select which versions to diff (always
548 buffer vs. committed)
549
550 support:
551 --base-version=committed | successful-site-action | last-site-action (Default = committed)
552 --diff-version=buffer | committed | successful-site-action | last-site-action (Default = buffer)
553
554 Equivalent query parameters need to be implemented in the API.
555
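For illustration, a sketch of the tagging and intermediate-revision check
described in items 1 and 2 above; the Deckhand tag endpoint shape, the name of
the existing commit tag, the revision listing format, and the service URL are
assumptions standing in for whatever client Shipyard already uses:

.. code:: python

    import requests

    DECKHAND_URL = "http://deckhand-api:9000/api/v1.0"  # placeholder
    COMMIT_TAG = "committed"  # name of the existing commit tag; illustrative
    SITE_ACTION_TAGS = {"site-action-success", "site-action-failure"}

    def tag_site_action(revision_id, dag_id, successful, token):
        tag = "site-action-success" if successful else "site-action-failure"
        resp = requests.post(
            "{}/revisions/{}/tags/{}".format(DECKHAND_URL, revision_id, tag),
            json={"dag_id": dag_id},
            headers={"X-Auth-Token": token},
        )
        resp.raise_for_status()

    def intermediate_commits_exist(revisions, current_revision_id):
        """revisions: oldest-first list of {'id': ..., 'tags': [...]}."""
        last_site_action = 0  # assume revision 0 when nothing is tagged yet
        for rev in revisions:
            if SITE_ACTION_TAGS & set(rev.get("tags", [])):
                last_site_action = rev["id"]
        return any(
            COMMIT_TAG in rev.get("tags", [])
            and last_site_action < rev["id"] < current_revision_id
            for rev in revisions
        )
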
556Because the implementation of this design will result in the tagging of
557successful site-actions, Shipyard will be able to determine the correct
558revision to use while attempting to teardown a node.
559
560If the request to teardown a node indicates a revision that doesn't exist, the
561command to do so (e.g. redeploy_server) should not continue, but rather fail
562due to a missing precondition.
563
564The invocation of the Promenade and Drydock steps in this design will utilize
565the appropriate tag based on the request (default is successful-site-action) to
566determine the revision of the Deckhand documents used as the design-ref.
567
568Shipyard redeploy_server Action
569-------------------------------
570The redeploy_server action currently accepts a target node. Additional
571parameters need to be supported:
572
573#. preserve-local-storage=true which will instruct Drydock to only wipe the
574 OS drive, and any other local storage will not be wiped. This would allow
575 for the drives to be remounted to the server upon re-provisioning. The
576 default behavior is that local storage is not preserved.
577
578#. target-revision=committed | successful-site-action | last-site-action
579 This will indicate which revision of the design will be used as the
580 reference for what should be re-provisioned after the teardown.
581 The default is successful-site-action, which is the closest representation
582 to the last-known-good state.
583
584These should be accepted as parameters to the action API/CLI and modify the
585behavior of the redeploy_server DAG.
586
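As a usage illustration, a sketch of creating the action with these parameters
through Shipyard's create-action API; the endpoint shape mirrors the existing
actions API and the exact name of the target-node parameter is an assumption:

.. code:: python

    import requests

    SHIPYARD_URL = "http://shipyard-api:9000/api/v1.0"  # placeholder

    def redeploy_server(target_node, token,
                        preserve_local_storage=False,
                        target_revision="successful-site-action"):
        body = {
            "name": "redeploy_server",
            "parameters": {
                # The existing parameter name for the target server is
                # illustrative; the two new parameters come from this spec.
                "target_nodes": [target_node],
                "preserve-local-storage": preserve_local_storage,
                "target-revision": target_revision,
            },
        }
        resp = requests.post(
            "{}/actions".format(SHIPYARD_URL),
            json=body,
            headers={"X-Auth-Token": token},
        )
        resp.raise_for_status()
        return resp.json()
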
587Security impact
588---------------
589
590None. This change introduces no new security concerns outside of established
591patterns for RBAC controls around API endpoints.
592
593Performance impact
594------------------
595
596As this is an on-demand action, there is no expected performance impact to
597existing processes, although tearing down a host may result in temporarily
598degraded service capacity when workloads need to be moved to different hosts,
599or simply in reduced capacity.
600
601Alternatives
602------------
603
604N/A
605
606Implementation
607==============
608
609None at this time
610
611Dependencies
612============
613
614None.
615
616
617References
618==========
619
620None
diff --git a/specs/implemented/deployment-grouping-baremetal.rst b/specs/implemented/deployment-grouping-baremetal.rst
new file mode 100644
index 0000000..10cfa87
--- /dev/null
+++ b/specs/implemented/deployment-grouping-baremetal.rst
@@ -0,0 +1,569 @@
1..
2 Copyright 2018 AT&T Intellectual Property.
3 All Rights Reserved.
4
5 Licensed under the Apache License, Version 2.0 (the "License"); you may
6 not use this file except in compliance with the License. You may obtain
7 a copy of the License at
8
9 http://www.apache.org/licenses/LICENSE-2.0
10
11 Unless required by applicable law or agreed to in writing, software
12 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
13 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
14 License for the specific language governing permissions and limitations
15 under the License.
16
17.. index::
18 single: Deployment grouping
19 single: workflow
20 single: Shipyard
21 single: Drydock
22
23.. _deployment-grouping-baremetal:
24
25=======================================
26Deployment Grouping for Baremetal Nodes
27=======================================
28One of the primary functionalities of the Undercloud Platform is the deployment
29of baremetal nodes as part of site deployment and upgrade. This blueprint aims
30to define how deployment strategies can be applied to the workflow during these
31actions.
32
33.. note::
34
35 This document has been moved from the airship-in-a-bottle project, and is
36 previously implemented. The format of this document diverges from the
37 standard template for airship-specs.
38
39Overview
40--------
41When Shipyard is invoked for a deploy_site or update_site action, there are
42three primary stages:
43
441. Preparation and Validation
452. Baremetal and Network Deployment
463. Software Deployment
47
48During the Baremetal and Network Deployment stage, the deploy_site or
49update_site workflow (and perhaps other workflows in the future) invokes
50Drydock to verify the site, prepare the site, prepare the nodes, and deploy the
51nodes. Each of these steps is described in the `Drydock Orchestrator Readme`_
52
53.. _Drydock Orchestrator Readme: https://git.openstack.org/cgit/openstack/airship-drydock/plain/drydock_provisioner/orchestrator/readme.md
54
55The prepare nodes and deploy nodes steps each involve intensive and potentially
56time consuming operations on the target nodes, orchestrated by Drydock and
57MAAS. These steps need to be approached and managed such that grouping,
58ordering, and criticality of success of nodes can be managed in support of
59fault tolerant site deployments and updates.
60
61For the purposes of this document, `phase of deployment` refers to the prepare
62nodes and deploy nodes steps of the Baremetal and Network deployment.
63
64Some factors that inform this solution:
65
661. Limits to the amount of parallelization that can occur due to a centralized
67 MAAS system.
682. Faults in the hardware, preventing operational nodes.
693. Miswiring or configuration of network hardware.
704. Incorrect site design causing a mismatch against the hardware.
715. Criticality of particular nodes to the realization of the site design.
726. Desired configurability within the framework of the UCP declarative site
73 design.
747. Improved visibility into the current state of node deployment.
758. A desire to begin the deployment of nodes before the finish of the
76 preparation of nodes -- i.e. start deploying nodes as soon as they are ready
77 to be deployed. Note: This design will not achieve new forms of
78 task parallelization within Drydock; this is recognized as a desired
79 functionality.
80
81Solution
82--------
83Updates supporting this solution will require changes to Shipyard for changed
84workflows and Drydock for the desired node targeting, and for retrieval of
85diagnostic and result information.
86
87.. index::
88 single: Shipyard Documents; DeploymentStrategy
89
90Deployment Strategy Document (Shipyard)
91---------------------------------------
92To accommodate the needed changes, this design introduces a new
93DeploymentStrategy document into the site design to be read and utilized
94by the workflows for update_site and deploy_site.
95
96Groups
97~~~~~~
98Groups are named sets of nodes that will be deployed together. The fields of a
99group are:
100
101name
102 Required. The identifying name of the group.
103
104critical
105 Required. Indicates if this group is required to continue to additional
106 phases of deployment.
107
108depends_on
109 Required, may be empty list. Group names that must be successful before this
110 group can be processed.
111
112selectors
113 Required, may be empty list. A list of identifying information to indicate
114 the nodes that are members of this group.
115
116success_criteria
117 Optional. Criteria that must evaluate to true before a group is considered
118 successfully complete with a phase of deployment.
119
120Criticality
121'''''''''''
122- Field: critical
123- Valid values: true | false
124
125Each group is required to indicate true or false for the `critical` field.
126This drives the behavior after the deployment of baremetal nodes. If any
127groups that are marked as `critical: true` fail to meet that group's success
128criteria, the workflow should halt after the deployment of baremetal nodes. A
129group that cannot be processed due to a parent dependency failing will be
130considered failed, regardless of the success criteria.
131
132Dependencies
133''''''''''''
134- Field: depends_on
135- Valid values: [] or a list of group names
136
137Each group specifies a list of depends_on groups, or an empty list. All
138identified groups must complete successfully for the phase of deployment before
139the current group is allowed to be processed by the current phase.
140
141- A failure (based on success criteria) of a group prevents any groups
142 dependent upon the failed group from being attempted.
143- Circular dependencies will be rejected as invalid during document validation.
144- There is no guarantee of ordering among groups that have their dependencies
145 met. Any group that is ready for deployment based on declared dependencies
146 will execute. Execution of groups is serialized - two groups will not deploy
147 at the same time.
148
149Selectors
150'''''''''
151- Field: selectors
152- Valid values: [] or a list of selectors
153
154The list of selectors indicate the nodes that will be included in a group.
155Each selector has four available filtering values: node_names, node_tags,
156node_labels, and rack_names. Each selector is an intersection of this
157criteria, while the list of selectors is a union of the individual selectors.
158
159- Omitting a criterion from a selector, or using an empty list, means that
160 criterion is ignored.
161- Having a completely empty list of selectors, or a selector that has no
162 criteria specified indicates ALL nodes.
163- A collection of selectors that results in no nodes being identified will be
164 processed as if 100% of nodes successfully deployed (avoiding division by
165 zero), but would fail the minimum or maximum nodes criteria (still counts as
166 0 nodes)
167- There is no validation against the same node being in multiple groups;
168 however, the workflow will not resubmit nodes that have already completed or
169 failed in this deployment to Drydock twice, since it keeps track of each node
170 uniquely. The success or failure of those nodes excluded from submission to
171 Drydock will still be used for the success criteria calculation.
172
173E.g.::
174
175 selectors:
176 - node_names:
177 - node01
178 - node02
179 rack_names:
180 - rack01
181 node_tags:
182 - control
183 - node_names:
184 - node04
185 node_labels:
186 - ucp_control_plane: enabled
187
188Will indicate (not really SQL, just for illustration)::
189
190 SELECT nodes
191 WHERE node_name in ('node01', 'node02')
192 AND rack_name in ('rack01')
193 AND node_tags in ('control')
194 UNION
195 SELECT nodes
196 WHERE node_name in ('node04')
197 AND node_label in ('ucp_control_plane: enabled')
198
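A sketch of these semantics is shown below; the node and selector shapes are
simplified stand-ins for the data Drydock would derive from the site design
(labels are flattened to ``key: value`` strings for brevity):

.. code:: python

    def _criterion_ok(wanted, actual):
        # An omitted or empty criterion is ignored (matches everything).
        return not wanted or bool(set(wanted) & set(actual))

    def node_matches_selector(node, selector):
        # All four criteria of a single selector must match (intersection).
        return (
            _criterion_ok(selector.get("node_names", []), [node["name"]])
            and _criterion_ok(selector.get("rack_names", []), [node["rack"]])
            and _criterion_ok(selector.get("node_tags", []), node["tags"])
            and _criterion_ok(selector.get("node_labels", []), node["labels"])
        )

    def nodes_in_group(nodes, selectors):
        # An empty selector list means every node is in the group; otherwise
        # the group is the union of the nodes matched by each selector.
        if not selectors:
            return [n["name"] for n in nodes]
        return [n["name"] for n in nodes
                if any(node_matches_selector(n, s) for s in selectors)]

With this logic a selector with no criteria, or an empty selector list, matches
every node, consistent with the rules above.
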
199Success Criteria
200''''''''''''''''
201- Field: success_criteria
202- Valid values: for possible values, see below
203
204Each group optionally contains success criteria which is used to indicate if
205the deployment of that group is successful. The values that may be specified:
206
207percent_successful_nodes
208 The calculated success rate of nodes completing the deployment phase.
209
210 E.g.: 75 would mean that 3 of 4 nodes must complete the phase successfully.
211
212 This is useful for groups that have larger numbers of nodes, and do not
213 have critical minimums or are not sensitive to an arbitrary number of nodes
214 not working.
215
216minimum_successful_nodes
217 An integer indicating how many nodes must complete the phase to be considered
218 successful.
219
220maximum_failed_nodes
221 An integer indicating a number of nodes that are allowed to have failed the
222 deployment phase and still consider that group successful.
223
224When no criteria are specified, it means that no checks are done - processing
225continues as if nothing is wrong.
226
227When more than one criterion is specified, each is evaluated separately - if
228any fail, the group is considered failed.
229
230
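For illustration, a sketch of evaluating these criteria against the full set of
nodes selected for a group; ``group_nodes`` and ``successful_nodes`` are lists
of node names:

.. code:: python

    def criteria_met(success_criteria, group_nodes, successful_nodes):
        total = len(group_nodes)
        succeeded = len(set(group_nodes) & set(successful_nodes))
        failed = total - succeeded

        pct = success_criteria.get("percent_successful_nodes")
        if pct is not None:
            # An empty group counts as 100% successful for this check.
            actual = 100.0 * succeeded / total if total else 100.0
            if actual < pct:
                return False

        minimum = success_criteria.get("minimum_successful_nodes")
        if minimum is not None and succeeded < minimum:
            return False

        maximum = success_criteria.get("maximum_failed_nodes")
        if maximum is not None and failed > maximum:
            return False

        # No criteria specified means no checks are done.
        return True
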
231Example Deployment Strategy Document
232~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
233This example shows a deployment strategy with 5 groups: control-nodes,
234compute-nodes-1, compute-nodes-2, monitoring-nodes, and ntp-node.
235
236::
237
238 ---
239 schema: shipyard/DeploymentStrategy/v1
240 metadata:
241 schema: metadata/Document/v1
242 name: deployment-strategy
243 layeringDefinition:
244 abstract: false
245 layer: global
246 storagePolicy: cleartext
247 data:
248 groups:
249 - name: control-nodes
250 critical: true
251 depends_on:
252 - ntp-node
253 selectors:
254 - node_names: []
255 node_labels: []
256 node_tags:
257 - control
258 rack_names:
259 - rack03
260 success_criteria:
261 percent_successful_nodes: 90
262 minimum_successful_nodes: 3
263 maximum_failed_nodes: 1
264 - name: compute-nodes-1
265 critical: false
266 depends_on:
267 - control-nodes
268 selectors:
269 - node_names: []
270 node_labels: []
271 rack_names:
272 - rack01
273 node_tags:
274 - compute
275 success_criteria:
276 percent_successful_nodes: 50
277 - name: compute-nodes-2
278 critical: false
279 depends_on:
280 - control-nodes
281 selectors:
282 - node_names: []
283 node_labels: []
284 rack_names:
285 - rack02
286 node_tags:
287 - compute
288 success_criteria:
289 percent_successful_nodes: 50
290 - name: monitoring-nodes
291 critical: false
292 depends_on: []
293 selectors:
294 - node_names: []
295 node_labels: []
296 node_tags:
297 - monitoring
298 rack_names:
299 - rack03
300 - rack02
301 - rack01
302 - name: ntp-node
303 critical: true
304 depends_on: []
305 selectors:
306 - node_names:
307 - ntp01
308 node_labels: []
309 node_tags: []
310 rack_names: []
311 success_criteria:
312 minimum_successful_nodes: 1
313
314The ordering of groups, as defined by the dependencies (``depends_on``
315fields)::
316
317 __________ __________________
318 | ntp-node | | monitoring-nodes |
319 ---------- ------------------
320 |
321 ____V__________
322 | control-nodes |
323 ---------------
324 |_________________________
325 | |
326 ______V__________ ______V__________
327 | compute-nodes-1 | | compute-nodes-2 |
328 ----------------- -----------------
329
330Given this, the order of execution could be:
331
332- ntp-node > monitoring-nodes > control-nodes > compute-nodes-1 > compute-nodes-2
333- ntp-node > control-nodes > compute-nodes-2 > compute-nodes-1 > monitoring-nodes
334- monitoring-nodes > ntp-node > control-nodes > compute-nodes-1 > compute-nodes-2
335- and many more ... the only guarantee is that ntp-node will run some time
336 before control-nodes, which will run sometime before both of the
337 compute-nodes. Monitoring-nodes can run at any time.
338
339Also of note are the various combinations of selectors and the varied use of
340success criteria.
341
342Deployment Configuration Document (Shipyard)
343--------------------------------------------
344The existing deployment-configuration document that is used by the workflows
345will also be modified to use the existing deployment_strategy field to provide
346the name of the deployment-strategy document that will be used.
347
348The default value for the name of the DeploymentStrategy document will be
349``deployment-strategy``.
350
351Drydock Changes
352---------------
353
354API and CLI
355~~~~~~~~~~~
356- A new API needs to be provided that accepts a node filter (i.e. selector,
357 above) and returns a list of node names that result from analysis of the
358 design. Input to this API will also need to include a design reference.
359
360- Drydock needs to provide a "tree" output of tasks rooted at the requested
361 parent task. This will provide the needed success/failure status for nodes
362 that have been prepared/deployed.
363
364Documentation
365~~~~~~~~~~~~~
366Drydock documentation will be updated to match the introduction of new APIs
367
368
369Shipyard Changes
370----------------
371
372API and CLI
373~~~~~~~~~~~
374- The commit configdocs api will need to be enhanced to look up the
375 DeploymentStrategy by using the DeploymentConfiguration.
376- The DeploymentStrategy document will need to be validated to ensure there are
377 no circular dependencies in the groups' declared dependencies (perhaps
378 NetworkX_).
379- A new API endpoint (and matching CLI) is desired to retrieve the status of
380 nodes as known to Drydock/MAAS and their MAAS status. The existing node list
381 API in Drydock provides a JSON output that can be utilized for this purpose.
382
383Workflow
384~~~~~~~~
385The deploy_site and update_site workflows will be modified to utilize the
386DeploymentStrategy.
387
388- The deployment configuration step will be enhanced to also read the
389 deployment strategy and pass the information on a new xcom for use by the
390 baremetal nodes step (see below)
391- The prepare nodes and deploy nodes steps will be combined to perform both as
392 part of the resolution of an overall ``baremetal nodes`` step.
393 The baremetal nodes step will introduce functionality that reads in the
394 deployment strategy (from the prior xcom), and can orchestrate the calls to
395 Drydock to enact the grouping, ordering, and success evaluation.
396 Note that Drydock will serialize tasks; there is no parallelization of
397 prepare/deploy at this time.
398
399Needed Functionality
400''''''''''''''''''''
401
402- function to formulate the ordered groups based on dependencies (perhaps
403 NetworkX_)
404- function to evaluate success/failure against the success criteria for a group
405 based on the result list of succeeded or failed nodes.
406- function to mark groups as success or failure (including failed due to
407 dependency failure), as well as keep track of the (if any) successful and
408 failed nodes.
409- function to get a group that is ready to execute, or 'Done' when all groups
410 are either complete or failed.
411- function to formulate the node filter for Drydock based on a group's
412 selectors
413- function to orchestrate processing groups, moving to the next group (or being
414 done) when a prior group completes or fails.
415- function to summarize the success/failed nodes for a group (primarily for
416 reporting to the logs at this time).
417
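A sketch of the ordering and circular-dependency check using NetworkX, as
suggested above; the group shape follows the DeploymentStrategy document:

.. code:: python

    import networkx as nx

    def group_order(groups):
        graph = nx.DiGraph()
        for group in groups:
            graph.add_node(group["name"])
            for parent in group.get("depends_on", []):
                # Edge from the dependency to the dependent group.
                graph.add_edge(parent, group["name"])
        if not nx.is_directed_acyclic_graph(graph):
            raise ValueError("Circular dependency among deployment groups")
        # One valid serial ordering; any topological order is acceptable.
        return list(nx.topological_sort(graph))

For the example strategy earlier in this document, one returned ordering would
be ntp-node, monitoring-nodes, control-nodes, compute-nodes-1, compute-nodes-2.
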
418Process
419'''''''
420The baremetal nodes step (preparation and deployment of nodes) will proceed as
421follows:
422
4231. Each group's selector will be sent to Drydock to determine the list of
424 nodes that are a part of that group.
425
426 - An overall status will be kept for each unique node (not started |
427 prepared | success | failure).
428 - When sending a task to Drydock for processing, the nodes associated with
429 that group will be sent as a simple `node_name` node filter. This will
430 allow for this list to exclude nodes that have a status that is not
431 congruent for the task being performed.
432
433 - prepare nodes valid status: not started
434 - deploy nodes valid status: prepared
435
4362. In a processing loop, groups that are ready to be processed based on their
437 dependencies (and the success criteria of groups they are dependent upon)
438 will be selected for processing until there are no more groups that can be
439 processed. The processing will consist of preparing and then deploying the
440 group.
441
442 - The selected group will be prepared and then deployed before selecting
443 another group for processing.
444 - Any nodes that failed as part of that group will be excluded from
445 subsequent deployment or preparation of that node for this deployment.
446
447 - Excluding nodes that are already processed addresses groups that have
448 overlapping lists of nodes due to the group's selectors, and prevents
449 sending them to Drydock for re-processing.
450 - Evaluation of the success criteria will use the full set of nodes
451 identified by the selector. This means that if a node was previously
452 successfully deployed, that same node will count as "successful" when
453 evaluating the success criteria.
454
455 - The success criteria will be evaluated after the group's prepare step and
456 the deploy step. A failure to meet the success criteria in a prepare step
457 will cause the deploy step for that group to be skipped (and marked as
458 failed).
459 - Any nodes that fail during the prepare step, will not be used in the
460 corresponding deploy step.
461 - Upon completion (success, partial success, or failure) of a prepare step,
462 the nodes that were sent for preparation will be marked in the unique list
463 of nodes (above) with their appropriate status: prepared or failure
464 - Upon completion of a group's deployment step, the nodes status will be
465 updated to their current status: success or failure.
466
4673. Before the end of the baremetal nodes step, following all eligible group
468 processing, a report will be logged to indicate the success/failure of
469 groups and the status of the individual nodes. Note that it is possible for
470 individual nodes to be left in `not started` state if they were only part of
471 groups that were never allowed to process due to dependencies and success
472 criteria.
473
4744. At the end of the baremetal nodes step, any groups marked as critical that
475 have failed due to timeout, dependency failure, or success criteria failure
476 will trigger an Airflow Exception, resulting in a failed
477 deployment.
478
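A compact sketch of this loop is shown below; ``prepare_group`` and
``deploy_group`` are hypothetical callables wrapping the Drydock task
submissions and returning the set of node names that succeeded,
``criteria_met`` is the evaluation function sketched earlier, and the
critical-group check at the end of the step is omitted for brevity:

.. code:: python

    def process_groups(ordered_groups, group_nodes, criteria_met,
                       prepare_group, deploy_group):
        # Track one status per unique node across all groups.
        node_status = {n: "not started"
                       for nodes in group_nodes.values() for n in nodes}
        results = {}
        for group in ordered_groups:
            name = group["name"]
            if any(results.get(dep) == "failed"
                   for dep in group.get("depends_on", [])):
                results[name] = "failed"  # failed due to a parent dependency
                continue
            nodes = group_nodes[name]
            criteria = group.get("success_criteria", {})

            # Prepare only nodes not already handled by an earlier group.
            to_prepare = [n for n in nodes if node_status[n] == "not started"]
            prepared = prepare_group(name, to_prepare)
            for n in to_prepare:
                node_status[n] = "prepared" if n in prepared else "failure"
            ok = [n for n in nodes
                  if node_status[n] in ("prepared", "success")]
            if not criteria_met(criteria, nodes, ok):
                results[name] = "failed"  # prepare failed; skip the deploy
                continue

            to_deploy = [n for n in nodes if node_status[n] == "prepared"]
            deployed = deploy_group(name, to_deploy)
            for n in to_deploy:
                node_status[n] = "success" if n in deployed else "failure"
            done = [n for n in nodes if node_status[n] == "success"]
            results[name] = ("succeeded" if criteria_met(criteria, nodes, done)
                             else "failed")
        return results, node_status
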
479Notes:
480
481- The timeout values specified for the prepare nodes and deploy nodes steps
482 will be used to put bounds on the individual calls to Drydock. A failure
483 based on these values will be treated as a failure for the group; we need to
484 be vigilant about whether this leads to indeterminate node states that
485 interfere with further processing (e.g. timed out, but the requested work
486 continued to completion).
487
488Example Processing
489''''''''''''''''''
490Using the defined deployment strategy in the above example, the following is
491an example of how it may process::
492
493 Start
494 |
495 | prepare ntp-node <SUCCESS>
496 | deploy ntp-node <SUCCESS>
497 V
498 | prepare control-nodes <SUCCESS>
499 | deploy control-nodes <SUCCESS>
500 V
501 | prepare monitoring-nodes <SUCCESS>
502 | deploy monitoring-nodes <SUCCESS>
503 V
504 | prepare compute-nodes-2 <SUCCESS>
505 | deploy compute-nodes-2 <SUCCESS>
506 V
507 | prepare compute-nodes-1 <SUCCESS>
508 | deploy compute-nodes-1 <SUCCESS>
509 |
510 Finish (success)
511
512If there were a failure in preparing the ntp-node, the following would be the
513result::
514
515 Start
516 |
517 | prepare ntp-node <FAILED>
518 | deploy ntp-node <FAILED, due to prepare failure>
519 V
520 | prepare control-nodes <FAILED, due to dependency>
521 | deploy control-nodes <FAILED, due to dependency>
522 V
523 | prepare monitoring-nodes <SUCCESS>
524 | deploy monitoring-nodes <SUCCESS>
525 V
526 | prepare compute-nodes-2 <FAILED, due to dependency>
527 | deploy compute-nodes-2 <FAILED, due to dependency>
528 V
529 | prepare compute-nodes-1 <FAILED, due to dependency>
530 | deploy compute-nodes-1 <FAILED, due to dependency>
531 |
532 Finish (failed due to critical group failed)
533
534If a failure occurred during the deploy of compute-nodes-2, the following would
535result::
536
537 Start
538 |
539 | prepare ntp-node <SUCCESS>
540 | deploy ntp-node <SUCCESS>
541 V
542 | prepare control-nodes <SUCCESS>
543 | deploy control-nodes <SUCCESS>
544 V
545 | prepare monitoring-nodes <SUCCESS>
546 | deploy monitoring-nodes <SUCCESS>
547 V
548 | prepare compute-nodes-2 <SUCCESS>
549 | deploy compute-nodes-2 <FAILED>
550 V
551 | prepare compute-nodes-1 <SUCCESS>
552 | deploy compute-nodes-1 <SUCCESS>
553 |
554 Finish (success with some nodes/groups failed)
555
556Schemas
557~~~~~~~
558A new schema will need to be provided by Shipyard to validate the
559DeploymentStrategy document.
560
561Documentation
562~~~~~~~~~~~~~
563The Shipyard action documentation will need to include details defining the
564DeploymentStrategy document (mostly as defined here), as well as the update to
565the DeploymentConfiguration document to contain the name of the
566DeploymentStrategy document.
567
568
569.. _NetworkX: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.dag.topological_sort.html