Commit Graph

19 Commits

Author SHA1 Message Date
Sergiy Markin 9c28c832dd Shipyard timeout issue
This PS adds default values for chart values and resolves some issues
in the Python code that uses these values:

      validation_connect_timeout: 20
      validation_read_timeout: 300
      deckhand_client_connect_timeout: 20
      deckhand_client_read_timeout: 300
      drydock_client_connect_timeout: 20
      drydock_client_read_timeout: 300
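
As a rough illustration of how defaults like these might be consumed, the sketch below merges the values listed above with operator-supplied overrides and returns a (connect, read) pair. The helper name `timeout_pair` and the `DEFAULT_TIMEOUTS` mapping are hypothetical and are not taken from the Shipyard code.

```python
# Hypothetical sketch; names and structure are not from the Shipyard code base.
DEFAULT_TIMEOUTS = {
    "validation_connect_timeout": 20,
    "validation_read_timeout": 300,
    "deckhand_client_connect_timeout": 20,
    "deckhand_client_read_timeout": 300,
    "drydock_client_connect_timeout": 20,
    "drydock_client_read_timeout": 300,
}


def timeout_pair(conf, prefix):
    """Return (connect, read) timeouts for ``prefix``, falling back to defaults."""
    # Ignore unset (None) overrides so a missing chart value cannot break the call.
    overrides = {k: v for k, v in conf.items() if v is not None}
    merged = {**DEFAULT_TIMEOUTS, **overrides}
    return (
        int(merged[prefix + "_connect_timeout"]),
        int(merged[prefix + "_read_timeout"]),
    )
```

A pair of this shape matches what the `requests` library accepts for its `timeout` argument, for example `timeout=timeout_pair(conf, "drydock_client")`.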

Change-Id: Ic5b1920257859239613a3ce77134e6b05bd7e9dd
2023-05-16 20:37:29 +00:00
Sergiy Markin 154a099b28 Shipyard upgrade for focal
- upgraded Airflow to 1.10.15 -
  https://airflow.apache.org/docs/apache-airflow/1.10.15/changelog.html
- disabled xenial, bionic and opensuse image build gates
- added focal image build gate
- added focal zuul build node
- adjusted Makefile for focal
- added bindep.txt to utilize bindep zuul base role for zuul build node
  pre-setup
- added focal Dockerfile
- implemented freeze requirements.txt approach like in other Airship
  projects
- removed specific requirements.txt for airflow in favor of using
  requirements-frozen.txt from shipyard_airflow project when building
  airflow docker image
- fixed docker image publishing to Quay
- replaced deprecated LOG.warn with new LOG.warning call
- replaced deprecated body attribute in response with response.text
  attribute
- updated deprecated falcon.API call - replaced with
  falcon.App call
- deprecated routing.create_http_method_map method replaced with
  routing.map_http_methods
- re-formatted code tabulations based on yapf recommendations
- replaced deprecated protocol attribute in Pytest create_environ() with
  http_version attribute
- replaced deprecated app attribute in Pytest create_environ() with
  root_path attribute
- fixed airflow CLI commands to match 1.10.15 version
- updated zuul gates to work on focal nodes and added focal specific
  node setup items by adding appropriate ansible tasks and roles
- uplifted Helm to 3.9.4
- uplifted stable HTK commit id
- updated tox.ini to work with tox v4
- uplifted dependency references to other Airship projects
- common Python dependencies were synchronized with other Airship
  projects (Promenade, Deckhand, Armada, Drydock)
- fixed airskiff deployment gate
- fixed genconfig* profiles in shipyard-airflow tox.ini responsible for
  maintenance of policy.yaml.sample and shipyard.conf.sample

Change-Id: I0c85187dc9bacf0849382563dd5ff7e9b2814c59
2023-04-28 20:40:50 +00:00
Smruti Soumitra Khuntia 9c5270b616 User context tracing through logging
This PS adds a log entry for the user ID and passes the context
marker on to other Airship components from Shipyard during API calls.

This makes it easy to trace the user and context through the
logs.

Change-Id: Ib9bfa8f20b641f8bb6c2dca967d9388e30d5735c
2019-04-04 13:19:02 +00:00
Bryan Strassner 84d99967bc Add configurable timeout for Drydock client
Adds configs to allow a drydock client to use a non-default read
timeout.

Change-Id: Id4e4a235861165bfb5eb571684c8ce0be4181543
2018-11-13 13:10:07 -06:00
Bryan Strassner f5774206e5 Add notes support for Builddata output
Enhances the workflow to include adding notes that contain the builddata
information associated with the Drydock steps. Part of adding this
support includes adding general notes support to all of the operators
that inherit from the UcpBaseOperator

Storyboard References:
Story: 2002797
Story: 2002796

Change-Id: I5e1a54d6373c4a523e2d4fe87796da4358f22055
2018-10-16 07:45:57 -05:00
Zuul 7b040ec266 Merge "Ensure pod logs are fetched in case of exception in any operator"
2018-09-26 21:58:42 +00:00
Roman Gorshunov 7fa3136470 Fix: various documentation and URL fixes
1) UCP -> Airship
2) readthedocs.org -> readthedocs.io (there is redirect)
3) http -> https
4) attcomdev -> airshipit (repo on quay.io)
5) att-comdev -> openstack/airship-* (repo on github/openstack git)
6) many URLs have been verified and adjusted to be current
7) no need for 'en/latest/' path in URL of the RTD
8) added more info to some setup.cfg and setup.py files
9) ucp-integration docs are now in airship-in-a-bottle
10) various other minor fixes

Change-Id: I4b8cc6ddf491e35d600a83f5f82d7717108e31dd
2018-09-24 12:53:27 +02:00
Andrey Volkov 4164518502 Ensure pod logs are fetched in case of exception in any operator
This patch tries to cover some edge cases that could happen during Shipyard
Airflow operator execution. All operators at the moment interact
with other services, i.e. k8s pods. If an exception occurs
during execution of the operator, logs will be fetched from the
appropriate pod and, if the operator has a "fetch_failure_details" method
(see DrydockBaseOperator), it will be called as well.

What exceptions could happen during an operator execution?
Besides those explicitly defined in code, like
DrydockClientUseFailureException, other exceptions, e.g. KeyError or
similar, may be raised. It's not clear which side is the culprit: the client
side (Shipyard) or the server side (Drydock, Armada, Deckhand,
Promenade). So this patch takes a defensive approach: it fetches logs from
pods and gathers additional details for any exceptional situation.

To do that, the do_execute method is wrapped with try..except
in UcpBaseOperator.execute. While fetching logs from a pod
and fetching failure details, it does its own logging,
and finally re-raises the original exception.
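
The wrapping described here could look roughly like the sketch below. It is a minimal illustration and not the actual UcpBaseOperator: the class name `BaseOperatorSketch` and the helpers `get_pod_logs` and `_collect_diagnostics` are hypothetical, while `do_execute` and `fetch_failure_details` follow the names used in this message.

```python
import logging

LOG = logging.getLogger(__name__)


class BaseOperatorSketch:
    """Hypothetical stand-in for a base operator; not the real UcpBaseOperator."""

    def execute(self, context):
        try:
            return self.do_execute(context)
        except Exception:
            # Collect diagnostics for any failure, then re-raise the original.
            self._collect_diagnostics()
            raise

    def do_execute(self, context):
        raise NotImplementedError

    def get_pod_logs(self):
        """Placeholder (assumed name) for fetching logs from the related pod."""

    def _collect_diagnostics(self):
        # Diagnostics must never mask the original exception, so each step
        # logs its own failure instead of propagating it.
        try:
            self.get_pod_logs()
        except Exception:
            LOG.exception("Could not fetch pod logs")
        fetch = getattr(self, "fetch_failure_details", None)
        if callable(fetch):
            try:
                fetch()
            except Exception:
                LOG.exception("Could not fetch failure details")
```

Because the diagnostic steps swallow and log their own errors, the bare `raise` in `execute` always surfaces the exception that `do_execute` raised.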

Change-Id: If1501e9a24b05edb6eb32c7b1b2d27f24f3ee063
2018-09-19 09:17:13 -07:00
hosingh000 c1bd1203c7 Block site_update if there is no host in MaaS/Drydock
Added a feature in Airflow to verify that the MaaS list
of BM hosts is not empty for the Shipyard update_site action.
If the MaaS machine list is empty, and the
continue-on-fail parameter is not set to true (the default
value is false), it will fail the Shipyard steps to
prepare and re-deploy the missing nodes in MaaS through
DD.
Caveat: this US did not have the requirement to compare
the list of nodes in MaaS with the expected site design.
It simply checks for an empty node list, and decides based
on that.
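
The decision described above can be sketched as follows. This is an illustration only: the function `verify_machines_present` and the exception `EmptyMachineListError` are hypothetical names, and the real check lives in the Airflow operators.

```python
class EmptyMachineListError(Exception):
    """Hypothetical error raised when no machines are known."""


def verify_machines_present(machines, continue_on_fail=False):
    """Return True if machines exist, False if the list is empty but tolerated.

    Only emptiness is checked; the list is not compared with the site design.
    """
    if machines:
        return True
    if continue_on_fail:
        return False
    raise EmptyMachineListError("machine list is empty")
```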

Change-Id: I5ba4a107fe2ae43728e5941570b6c88a436d7b12
2018-09-12 14:13:28 -05:00
Bryan Strassner be81162168 Only attempt deploying nodes that were prepared
When processing a deployment group, the deployment of nodes was
using the same input and a success against the success_criteria
evaluated after preparing nodes. This led to situations where nodes
failed to prepare, but were still assumed eligible for deployment (and
thus failed). This was especially problematic when a timeout was
triggered by Shipyard before Drydock had finished preparing.

This change will only attempt to deploy nodes that were positively
identified as prepared by Drydock. When the timeout scenario is reached,
since there will have been no positive confirmation of successful nodes,
the deployment of nodes will not be attempted. This will also prevent
attempting to deploy nodes that have explicitly failed to prepare.
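
The selection rule can be sketched as a simple filter. The function name `select_nodes_to_deploy` and its arguments are hypothetical and only illustrate the idea; they are not the names used in the workflow code.

```python
def select_nodes_to_deploy(group_nodes, prepared_nodes):
    """Keep only nodes positively reported as prepared, preserving group order.

    ``prepared_nodes`` may be empty or None, e.g. after a timeout, in which
    case nothing is selected for deployment.
    """
    prepared = set(prepared_nodes or ())
    return [node for node in group_nodes if node in prepared]
```

In the timeout scenario there is no positive confirmation for any node, so the result is an empty list and no deployment is attempted.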

Additionally, added some TODOs around the concept of cancelling tasks in
Drydock when Shipyard stops due to a timeout. This kind of
functionality does not yet exist, so the TODOs serve as a placeholder.

Change-Id: I582abcec62407dc2903d8a4477ea891a9397f1fb
2018-09-05 15:51:31 +00:00
Bryan Strassner f3749ca3f9 Add redeploy_server processing
Adds the functionality to redeploy a server in an unguarded fashion,
meaning that the server will not be pre-validated to be in a state where
workloads have been removed.

This is the first targeted action for Shipyard, so a refactoring of the
validators to support more flexibility has been done.

Also adds RBAC controls for specific actions being created, rather than
a binary create action privilege.

Change-Id: I39e3dab6595c5187affb9f2b6edadd0f5f925b0c
2018-08-21 09:42:40 -05:00
Bryan Strassner 038f958501 Refactor imports to support loading dags for tests
Updates the imports for the dags and operators to support both "as
deployed" and "as tested" package configurations. This allows for a
simple test to be added that at least imports and checks the dags to
ensure they contain steps.

A future refactor may eliminate the need for some/much of this by moving the
operators away from the plugin approach such that they can be statically
built into the airflow pod and used like a third party library instead
of being appended to the airflow plugins. For now though, this maintains
the status quo for the way these are used in a deployed way.

Change-Id: I437ff9c583358188e27de0e2f6987c38ca85ab2f
2018-07-25 09:19:18 -05:00
Bryan Strassner e5a0bf4a32 [trivial] fix several minor issues
This cleanup fixes a handful of items identified as not
blocking a larger patchset.

Change-Id: I4642112c444c546c0bb271ed58a2d9b76155cad7
2018-06-22 13:05:32 -05:00
Bryan Strassner 04906cce68 Workflow to support deployment groups
Updates the Shipyard/Airflow workflow for deploy_site and
update_site to use the deployment group/deployment strategy
information from the design.

This allows for baremetal nodes to be deployed in a design-
specified order, with criticality and success criteria driving
the success and failure of deployment.

Includes refactoring of service endpoints to reduce the need
for so much data passing.

Change-Id: Ib5e9fca535ca74d1819fe46959695acfed5b65c2
2018-06-20 09:55:15 -05:00
Mark Burnett 45e05a31f0 Initialize variables in drydock operator
Change-Id: I428c024193526489d928ff8172f2f90b15099596
2018-06-04 14:42:11 -05:00
Anthony Lin 952d6d6fcd [Fix] Failed to Retrieve Details of Drydock Failed Tasks
Fix the issue where Shipyard/Airflow fails to handle drilldown
of Drydock task failures (logs dump) during a failed site action.

1) Updates condition to use errorCount instead
2) Drill down to all layers instead of the current 1 layer
3) Use pprint instead of json.dumps
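
A minimal sketch of a drill-down across all layers is shown below. The task layout is an assumption made for illustration: the `subtasks` key, the flat `tasks` mapping, and both function names are hypothetical. Only the use of `errorCount` and pprint comes from this message.

```python
from pprint import pformat


def collect_failed_tasks(task_id, tasks):
    """Return every task at or below ``task_id`` whose errorCount is non-zero."""
    task = tasks[task_id]
    failed = []
    if task.get("errorCount", 0) > 0:
        failed.append(task_id)
    # Descend into every layer, even when the parent itself reports no errors.
    for child_id in task.get("subtasks", []):
        failed.extend(collect_failed_tasks(child_id, tasks))
    return failed


def format_failures(task_id, tasks):
    """Pretty-print the failed tasks found below ``task_id``."""
    failed = collect_failed_tasks(task_id, tasks)
    return pformat({tid: tasks[tid] for tid in failed})
```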

Change-Id: Ifcc964e04c3216f11a2a94c40d8681d76fd68581
2018-05-11 14:14:41 +00:00
Krysta 3629245b0c Pass Drydock health failure
Bypass a failure due to the health of Drydock on update_site
or deploy_site if the continue-on-fail param is entered.

Adds unit tests

Changed operator to be more easily testable

Removed the K8s Health Check for preflight checks. We will add
the functionalities back once we have a clearer view on what
should be checked and validated for the workflow.

Change-Id: Idd6d6b18d762a0284f2041248faa4040c78def3f
2018-05-02 15:47:06 +00:00
Anthony Lin dee8887c88 Ensure Presence of Committed Doc prior to Workflow Execution
Shipyard should (1) validate that there is a current committed version
of the documents, and (2) pass that committed version as a parameter to the
workflow, which then uses it for the entire workflow.

This applies to all the current workflows, i.e. deploy_site, update_site,
redeploy_server

Note that we will remove the step to retrieve deckhand design version from
the workflow as it will be handled as part of the Shipyard Create Action
with this change.

Change-Id: Ifdbdd8f1ce1b2c6afa26fdfaee86cbb2776ca715
2018-04-25 13:55:26 +00:00
Bryan Strassner 769d0ded47 Refactor shipyard to UCP target layout
Refactor Shipyard to be better able to leverage common
packages and conform with the target UCP standard layout.

This change supports the same tox entrypoints at
the root level, but the preferred approach is to use make
targets defined in the Makefile such as 'make tests' and
'make lint'

The previous tox.ini has moved and been
tailored to the specifics of each subproject at
src/bin/*/tox.ini

Automatic generation of the policy and configuration
files has been removed from the sphinx build for now
but these files will be automatically generated locally
into the docs source by using a 'make docs' command.
This may need to be revisited later to re-enable the
automatic generation of these files such that readthedocs
would still support the project layout.

Change-Id: Ifdc1cd4cf35fb3c5923414c677b781a60a9bae42
2018-04-24 16:47:13 -05:00