This PS updates python modules and code to match Airflow 2.6.2:
- bionic py36 gates were removed
- python code corrected to match new modules versions
- selection of python modules versions was performed based on
airflow-2.6.2 constraints
- airskiff deploy pipeline was aligned with latest in treasuremap v1.9
Change-Id: If6f57325339995216d2553c7a5ff56e7673b5acc
- armada-airskiff-deploy is voting gate again
- fixed falcon.API deprecation -> falcon.App
- fixed collections.abc.defaultdict not found error
- fixed tox4 requirements
- implemented requirements-frozen.txt approach to align with other
Airship projects
- uplifted docker version in the image building and publishing gate
Change-Id: I337ec07cd6d082acabd9ad65dd9eefb728a43b12
Update kubernetes client to v26.1.0
Updating armada to focal base image
Remove xenial and opensuse dockerfiles
Update tox python from py35 to py38
Add apparmor for docker build
Uplift HTK chart version 0.2.52
Bumping up some python dependencies to get in sync with shipyard
Added clear-firewall role for airskiff-deploy playbook
Change-Id: If06a3f60466702d05a21c24a7cb8041bed41507a
For now we leave the tiller status endpoint, until
Shipyard has had a release to stop depending on it [0].
[0]: https://review.opendev.org/c/airship/shipyard/+/802718
Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: If8a02d7118f6840fdbbe088b4086aee9a18ababb
Helm 3 breaking changes (likely non-exhaustive):
- crd-install hook removed and replaced with crds directory in
chart where all CRDs defined in it will be installed before
any rendering of the chart
- test-failure hook annotation value removed, and test-success
deprecated. Use test instead
- `--force` no longer handles recreating resources which
cannot be updated due to e.g. immutability [0]
- `--recreate-pods` removed, use declarative approach instead [1]
[0]: https://github.com/helm/helm/issues/7082
[1]: https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: I20ff40ba55197de3d37e5fd647e7d2524a53248f
This removes release rollback/delete functionality. This functionality
was likely not being used and thus was likely not working.
The primary driver for this change is to ease introduction of Helm 3
support. Particularly to avoid having to make API changes related to
the namespacing of helm releases in Helm 3.
This also removes the swagger api documentation as it was not
maintained.
Change-Id: I7edb1c449d43690c87e5bb24726a9fcaf428c00b
This is a pre-requisite for Helm 3 integration, so that these
actions run regardless of whether we are going through the
tiller handler.
Change-Id: I97d7bcc823d11b527fcdaa7967fcab62af1c8161
This reverts commit c75898cd6a.
Airship 2 ended up using the Flux helm-controller instead:
https://github.com/fluxcd/helm-controller
So this is no longer needed. Removing it to get rid of tech
debt to ease introduction of Helm 3 support.
This retains the part of the commit which extracts the
chart download logic to its own handler as this is still useful.
Change-Id: Icb468be2d4916620fd78df250fd038ab58840182
In the event of a pod being evicted due to low resource availability,
armada keeps waiting for the Evicted pod to be ready. This commit
removes that behavior since Kubernetes will spin up a new pod.
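A minimal sketch of the filtering idea, assuming pods are plain dicts
shaped like Kubernetes API objects. The helper names are hypothetical
and are not Armada's actual wait code; the only Kubernetes fact relied
on is that an evicted pod reports phase `Failed` with reason `Evicted`.

```python
def is_evicted(pod):
    """Return True if the pod dict describes an evicted pod."""
    status = pod.get("status", {})
    return status.get("phase") == "Failed" and status.get("reason") == "Evicted"


def pods_to_wait_on(pods):
    """Drop evicted pods so the wait only considers pods that can become ready."""
    return [pod for pod in pods if not is_evicted(pod)]


pods = [
    {"metadata": {"name": "api-1"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "api-0"},
     "status": {"phase": "Failed", "reason": "Evicted"}},
]
remaining = [pod["metadata"]["name"] for pod in pods_to_wait_on(pods)]
```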
Story: 2008645
Task: 41906
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I7263eebe357b0952375d538555536dc9f7cceff4
Airship 2 is using Argo for workflow management, rather
than the builtin Armada workflow functionality. Hence, this
adds an apply_chart CLI command to apply a single chart at
a time, so that Argo can manage the higher level orchestration.
Airship 2 is also using kubernetes as opposed to Deckhand as the
document store. Hence this adds an ArmadaChart kubernetes CRD,
which can be consumed by the apply_chart CLI command. The chart
`dependencies` feature is intentionally not supported by the CRD,
as there are additional complexities to make that work, and ideally
this feature should be deprecated as charts should be building in
their dependencies before consumption by Armada.
Functional tests are included to exercise these features
against a minikube cluster.
Change-Id: I2bbed83d6d80091322a7e60b918a534188467239
It doesn't appear to be compatible with newer versions of python
and the mock library, and wasn't working correctly anyways.
Change-Id: I117d01bed40849587b2d0337aad56fccdf77e192
Armada has previously named template files relative to the
`templates` dir, whereas the Helm CLI names them relative
to the chart root. This causes `include`s of these templates
to fail.
This change fixes this, for armada/Chart/v2 docs only, since it
is a breaking change, as some charts may have already aligned
with the existing Armada behavior. When updating a release
previously deployed with armada/Chart/v1, the fixed template
names alone will not cause the release to be updated, as the
diff logic accounts for this.
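The naming difference can be sketched as follows. This is an
illustration only: `template_name` and the chart paths are hypothetical,
and the flag merely mimics the v1 versus v2 behavior described above.

```python
import posixpath


def template_name(chart_root, template_path, relative_to_chart_root=True):
    """Name a template file relative to the chart root or the templates dir."""
    if relative_to_chart_root:
        base = chart_root  # Helm CLI naming, used for armada/Chart/v2 docs
    else:
        base = posixpath.join(chart_root, "templates")  # previous Armada naming
    return posixpath.relpath(template_path, base)


path = "/charts/mychart/templates/_helpers.tpl"
v1_name = template_name("/charts/mychart", path, relative_to_chart_root=False)
v2_name = template_name("/charts/mychart", path)
```

With these inputs `v1_name` lacks the `templates/` prefix while
`v2_name` keeps it.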
Change-Id: I243073ca4c2e1edbcb0d8f649475f568fc7c818f
Armada's dry-run option is incomplete, no longer maintained, and offers
little value for the complexity required to maintain it.
This commit is the final in a series of changes to remove the dry-run
feature. Specifically, this change removes the functionality associated
with the dry-run feature.
Story: 2005121
Change-Id: I7dfe5ab27511debe2b8ac01f8e0a696c6126a9f7
Signed-off-by: Drew Walters <andrew.walters@att.com>
Armada's dry-run option is incomplete, no longer maintained, and offers
little value for the complexity required to maintain it.
This commit is the first in a series of changes to remove the dry-run
feature. Specifically, this change removes the parameter as an option
for the API.
Story: 2005121
Change-Id: If5bd2639fe3e9af3f4cc669cd627b47c1d8fec16
Signed-off-by: Drew Walters <andrew.walters@att.com>
This implements Prometheus metric integration, including metric
definition, collection, and exportation.
End user documentation for supported metric data and exportation
interface is included.
Change-Id: Ia0837f28073d6cd8e0220ac84cdd261b32704ae4
From recently merged document updates in [0] there is a desire to
standardize the Airship project python codebase. This is the effort
to do so for the Armada project.
[0] https://review.opendev.org/#/c/671291/
Change-Id: I4fe916d6e330618ea3a1fccfa4bdfdfabb9ffcb2
This introduces v2 docs in order to allow users to opt in to
breaking changes, while still supporting v1 docs for a time
so folks can migrate. At some point v1 doc support will be
removed.
This initial version of v2 docs is experimental. Further
breaking changes will be made before v2 docs are finalized.
A v1-v2 migration guide is included in the documentation.
This also refactors the internal data model to include the full
document structure, such as `metadata` and `schema`, so that
different behavior can be achieved for v1, v2, etc.
Change-Id: Ia0d44ff4276ef4c27f78706ab02c88aa421a307f
This change creates a Tiller sidecar in the Armada chart and
configures Armada to use this Tiller by default for its operations.
This allows Armada to communicate with this Tiller without exposing it
to the rest of the cluster.
This also removes `tiller_host` and `tiller_port` as API parameters as
they should now just be configured using the configuration file. When
the Tiller sidecar is enabled, configurations will be overridden to
point to it. Otherwise Armada will rely on the Tiller pod lookup.
While this will later enable the Tiller charts to be removed, they
will not be in this change as there is currently no alternative in
Airship to communicate with the cluster using Helm.
Co-Authored-By: Michael Beaver <michaelbeaver64@gmail.com>
Change-Id: Id881e379be580efd60bae400fa402ce238bfd6ef
This creates a new mechanism in Armada to enable functions to only be
run once across multiple instances of Armada working with the same
Kubernetes cluster. This is accomplished by utilizing custom resources
via the Kubernetes API.
This also introduces new config defaults that can be used to configure
the lock timeout, expiration, and update interval.
Some notes on how the lock works:
* Functions to be locked can add the new decorator
* The optional name parameter can be used to create multiple
types of locks which can coexist
* If the lock is unable to be acquired before the timeout a new
exception is raised
* The lock is updated regularly while the decorated function is
still running
* If a lock already exists it will only be overwritten if the
duration since its last update is longer than the expiration time
For now this locking method is being used for components that require
write access to Tiller so that simultaneous write operations are
avoided.
Change-Id: Iee07da9a233ee2e2a54c6bc4881185388b377c05
- Zuul updated ansible to 2.7, no longer uses missing variables.
- Using an if to try and address.
- Fixes a few formatting problems that are causing the gates to fail
Docker fix based on Aaron Sheffield's PS for Pegleg:
https://review.openstack.org/#/c/645631/
Change-Id: I14e8f3aac0af7a3abc4e2b6c4ece292a24bc4c6a
This excludes the following generated objects from wait logic:
1. cronjob-generated jobs: these are not directly part of the release,
so better not to wait on them. If there is a desire to wait on initial
cronjob success, we can add a separate "type: cronjob" wait for that
purpose.
2. job-generated pods: for the purposes of waiting on jobs, one should
ensure their configuration includes a "type: job" wait. Once
controller-based waits are included by default we can also consider
excluding controller-owned pods from the "type: pod" wait, as those
will be handled by the controller-based waits then.
Change-Id: Ibf56c6fef9ef72b62da0b066c92c5f29ee4ecb5f
Currently, tests executed during chart deployment use the wait timeout
value, `wait.timeout`. This value can be too large of a timeout value
for Helm tests. This change introduces a timeout for tests,
`test.timeout` that is only used as a timeout for running Helm tests for
a release.
Story: 2003899
Depends-On: https://review.openstack.org/618355
Change-Id: Iee746444d5aede0b84b1805eb19f59f0f03c8f9e
Previously the timeout for deleting chart releases was 300s and
not configurable. This patchset makes it configurable via a new
`delete.timeout` property in the `armada/Chart/v1` schema.
Helm releases deleted which do not correspond to documents in this
schema still do not use a configurable timeout. Those will be
considered separately.
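A minimal sketch of reading the new property, assuming the chart data
is a plain dict and that the previous 300s value remains the fallback
when `delete.timeout` is absent. The helper name is hypothetical.

```python
DEFAULT_DELETE_TIMEOUT = 300  # seconds; the previously hard-coded value


def delete_timeout(chart):
    """Return the chart's delete.timeout, or the assumed 300s fallback."""
    return chart.get("delete", {}).get("timeout", DEFAULT_DELETE_TIMEOUT)


configured = delete_timeout({"release": "keystone", "delete": {"timeout": 600}})
fallback = delete_timeout({"release": "keystone"})
```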
This also includes a minor logging fix.
Change-Id: Ia588faaafd18a3ac00eed3cda2f0556ffcec82c9
When running helm tests for a chart release multiple times in a site,
if the previous test pod is not deleted, then the test pod creation
can fail due to a name conflict. Armada/helm support immediate test pod
cleanup, but using this means that upon test failure, the test pod logs will
not be available for debugging purposes. Due to this, the recommended approach
for deleting test pods in Armada has been using `upgrade.pre.delete` actions.
Chart authors can accomplish test pod deletion using this feature;
however, it often takes a while, usually until they test upgrading the
chart, for them to realize that this is necessary and to get it
implemented.
This patchset automates deletion of test pods directly before running tests by
using the `wait.labels` field in the chart doc when they exist to find all pods
in the release and then using their annotations to determine if they are test
pods and deleting them if so.
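The selection step can be sketched as below, assuming pods are plain
dicts and that test pods carry the Helm 2 `helm.sh/hook` annotation
with a `test-success` or `test-failure` value. The helper names are
hypothetical, not Armada's.

```python
TEST_HOOKS = {"test-success", "test-failure"}


def matches_labels(pod, labels):
    """True if the pod carries every label in the wait.labels mapping."""
    pod_labels = pod["metadata"].get("labels", {})
    return all(pod_labels.get(key) == value for key, value in labels.items())


def is_test_pod(pod):
    """True if the helm.sh/hook annotation lists a test hook."""
    hooks = pod["metadata"].get("annotations", {}).get("helm.sh/hook", "")
    return any(hook.strip() in TEST_HOOKS for hook in hooks.split(","))


def find_test_pods(pods, wait_labels):
    """Pods in the release (by label) that are Helm test pods (by annotation)."""
    return [p for p in pods if matches_labels(p, wait_labels) and is_test_pod(p)]


pods = [
    {"metadata": {"name": "api", "labels": {"release_group": "keystone"}}},
    {"metadata": {"name": "api-test",
                  "labels": {"release_group": "keystone"},
                  "annotations": {"helm.sh/hook": "test-success"}}},
    {"metadata": {"name": "other-test",
                  "labels": {"release_group": "glance"},
                  "annotations": {"helm.sh/hook": "test-success"}}},
]
to_delete = [p["metadata"]["name"]
             for p in find_test_pods(pods, {"release_group": "keystone"})]
```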
A later patchset is planned to implement defaulting of the wait labels when
they are not defined.
Change-Id: I2092f448acb88b5ade3b31b397f9c874c0061668
Fixes a bug where Armada was looking for upgrade options
(force, recreate_pods currently) underneath `upgrade` directly
rather than `upgrade.options` where they are defined in the schema.
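A minimal sketch of the corrected lookup, assuming the chart data is a
plain dict and that both options default to False when unset. The
helper name is hypothetical.

```python
def upgrade_options(chart):
    """Read force and recreate_pods from upgrade.options, not upgrade."""
    options = chart.get("upgrade", {}).get("options", {})
    return {
        "force": options.get("force", False),
        "recreate_pods": options.get("recreate_pods", False),
    }


chart = {"upgrade": {"options": {"force": True}}}
parsed = upgrade_options(chart)
```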
Change-Id: Ia95129a19c87f5d59eaefccd04a7ac9e2acb0b3b
While authoring [0], it was discovered that Armada has duplicate logic
for deciding if Helm test cleanup should be enabled as well as the tests
themselves. Because of this, changes to test logic (e.g. adding pre-test
actions), requires changing all traces of the repeated logic, which can
lead to inconsistent behavior if not properly addressed. This change
moves all test decision logic to a singular Test handler, implemented by
the `Test` class. This change does NOT change the expected behavior of
testing during upgrades; however, tests initiated from the API and CLI
will not execute when testing a manifest if they are disabled in a
chart, unless using the `--enable-all` flag.
[0] https://review.openstack.org/617834
Change-Id: I1530d7637b0eb6a83f048895053a5db80d033046
We have seen issues with dangling threads in Armada. This is likely due to
a bug [0] in the version of gRPC that we were pinned to.
This patchset:
- moves us to the latest versions of the gRPC python libraries which add
a new `channel.close()` method to cleanup channels.
- implements the python context manager api in the tiller handler
- uses the context manager api to explicitly scope tiller channel creation
and cleanup to each Armada API and CLI call.
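The scoping pattern can be sketched as follows. `FakeChannel` stands
in for a gRPC channel and `TillerSketch` is a hypothetical class, not
the real tiller handler; the point is only that `__exit__` always
closes the channel.

```python
class FakeChannel:
    """Stand-in for a gRPC channel that only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class TillerSketch:
    """Creates a channel on construction and closes it on context exit."""

    def __init__(self, channel_factory=FakeChannel):
        self.channel = channel_factory()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.channel.close()
        return False  # do not swallow exceptions from the with-body


with TillerSketch() as tiller:
    channel = tiller.channel
    open_inside = not channel.closed
```

Because cleanup lives in `__exit__`, the channel is also closed when
the body of the `with` block raises.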
This also fixes a couple of issues with error handling introduced in [1].
[0]: https://github.com/grpc/grpc/issues/14338
[1]: https://review.openstack.org/#/c/610384
Change-Id: I2577a20fc76c397aa33157dc12a0e1d36f49733e
When waiting on resources that share labels with existing test pods,
an upgrade can fail due to a wait operation on the existing test pods.
This change skips wait operations on test resources by filtering them
using Helm hooks.
Change-Id: I465d3429216457ea8d088064cafa74b2b0d9b8cb
Previously, if a chart was not updated, it would simply be skipped over.
Now, the wait/tests are run in this case to ensure the chart success
criteria is/was actually satisfied. It does still skip tests if there
is a last test result recorded as successful already, as an
optimization.
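The decision for an unchanged chart can be sketched as below. The
function and argument names are hypothetical, and the sketch assumes a
simple boolean for whether the last recorded test result succeeded.

```python
def actions_for_unchanged_release(tests_enabled, last_test_succeeded):
    """Return the steps to run for a chart whose release needs no update."""
    actions = ["wait"]  # always confirm the wait criteria are satisfied
    if tests_enabled and not last_test_succeeded:
        actions.append("test")  # skipped when a successful result is recorded
    return actions


already_tested = actions_for_unchanged_release(True, True)
needs_test = actions_for_unchanged_release(True, False)
```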
Change-Id: I5dc95fe0f16fe0989761e771c77d2c4fa8f6e7ea
Caching and cleanup of git repository chart sources was previously
implemented. This adds these features for tarball sources as well.
This also implements transitive chart dependency sourcing. Previously
only a single level of dependencies were being downloaded, which
would lead to an error when multiple dependency levels exist.
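Transitive sourcing amounts to a recursive walk of the dependency
graph. A minimal sketch, assuming dependencies are given as a plain
mapping of chart name to direct dependency names; the function and
chart names are hypothetical.

```python
def collect_dependencies(name, charts, seen=None):
    """Return all transitive dependencies of a chart, deepest first."""
    seen = set() if seen is None else seen
    ordered = []
    for dep in charts.get(name, []):
        if dep in seen:
            continue  # already collected; also guards against cycles
        seen.add(dep)
        ordered.extend(collect_dependencies(dep, charts, seen))
        ordered.append(dep)
    return ordered


charts = {"app": ["db"], "db": ["helm-toolkit"], "helm-toolkit": []}
deps = collect_dependencies("app", charts)
```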
Change-Id: I988e473a6ea29331e036d26c3ec7269374e0188f
Change I68efbde4e9dd2e6e9455d91313eb45c9c79d35ce added a noqa to silence
flake8 3.6.0. A better way is to fix the line completely and use a raw
string for the regex.
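For illustration, a raw string keeps backslashes literal, so sequences
such as `\d` and `\.` reach the regex engine without triggering the
invalid escape sequence warning. The pattern below is a hypothetical
example, not the one changed in this commit.

```python
import re

# Raw string: "\d" and "\." are passed to the regex engine unchanged.
pattern = re.compile(r"^v(\d+)\.(\d+)$")
match = pattern.match("v2.6")
```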
Change-Id: Iaa7486ee11fdf6d97597c6c3bc6403677d499429
Flake8 3.6.0 now warns about both line break after and *before* binary
operator, you have to choose whether you use W503 or W504. Disable the
newer W504.
Fix "F841 local variable 'e' is assigned to but never used".
Handle warnings about invalid escape sequence in regex.
Handle invalid escape sequence in string.
Change-Id: I68efbde4e9dd2e6e9455d91313eb45c9c79d35ce
This fixes the following issues with listing releases from tiller,
which could cause Armada to be confused about the state of the
latest release, and do the wrong thing.
- Was not filtering out old releases, so we could find both a
FAILED and DEPLOYED release for the same chart. When this is the
case it likely means the FAILED release is the latest, since
otherwise armada would have purged the release (and all its
history) upon seeing the FAILED release in a previous run.
The issue is that after the purge it would try to upgrade
rather than re-install, since it also sees the old DEPLOYED
release. Also if a release gets manually fixed (DEPLOYED)
outside of armada, armada still sees the old FAILED release,
and will purge the fixed release.
- Was only fetching DEPLOYED and FAILED releases from tiller, so if
the latest release has another status Armada won't see it at all.
This changes to:
- Fetch releases with all statuses.
- Filter out old releases.
- Raise an error if latest release has status other than DEPLOYED
or FAILED, since it's not clear what other action to take in
this scenario.
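The new behavior can be sketched as below, assuming releases are plain
dicts with `name`, `version`, and `status` keys. The function and
exception names are hypothetical.

```python
class UnexpectedReleaseStatus(Exception):
    """Raised when a latest release is neither DEPLOYED nor FAILED."""


def latest_releases(releases):
    """Keep only the highest version per release name, then check status."""
    latest = {}
    for release in releases:
        current = latest.get(release["name"])
        if current is None or release["version"] > current["version"]:
            latest[release["name"]] = release
    for release in latest.values():
        if release["status"] not in ("DEPLOYED", "FAILED"):
            raise UnexpectedReleaseStatus(release["name"])
    return latest


releases = [
    {"name": "keystone", "version": 1, "status": "DEPLOYED"},
    {"name": "keystone", "version": 2, "status": "FAILED"},
    {"name": "glance", "version": 1, "status": "DEPLOYED"},
]
latest = latest_releases(releases)
```

Here the old DEPLOYED keystone release is filtered out, so only the
FAILED version 2 is considered.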
Change-Id: I84712c1486c19d2bba302bf3420df916265ba70c
The tiller list releases command has a bug when using sorting
and paging simultaneously. Armada was passing sorting parameters,
but it doesn't really care about the order, so this removes the
sorting parameters to avoid the tiller issue.
Change-Id: If8349a8093d4b79d5e056d988b710372705eb669
This patch set is trivial: Drops armada/tests/unit/cli.py
because it is empty. This way it is apparent that Armada
needs CLI unit tests. When they are added this folder
can be re-created.
Change-Id: Ice9669aa5b21191d4de646b9035a135a2722a2f9
`oslo.policy` supports both enforce and authorize. authorize is
stricter because it'll raise an exception if the policy action is
not found in the list of registered rules. This means that attempting
to enforce anything not found in ``armada.common.policies`` will error
out with a 'Policy not registered' message and 403 status code.
This problem manifests itself through such cases: [0]
Please reference the oslo.policy docs on authorize [1] and
enforce [2] to better understand the discrepancy between the
two.
[0] https://review.openstack.org/#/c/610117/1
[1] feac3dcbfe/oslo_policy/policy.py (L960)
[2] feac3dcbfe/oslo_policy/policy.py (L792)
Change-Id: I5b0a28a2b5fb4dff150f13a56013a7a9b694c756
Tiller has a non-configurable gRPC max response message size. If the
list releases response reaches this size it silently truncates the
results to be below this size. Thus for armada to be able to reliably
get back all the releases it requests, this patchset implements paging
with what should be a small enough page size to avoid the truncation.
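A minimal sketch of the paging loop. The `fetch_page(offset, limit)`
callable, its return shape, and the page size are assumptions for
illustration and do not reproduce the tiller client API.

```python
def list_all_releases(fetch_page, page_size=32):
    """Collect every release by requesting small pages until none remain."""
    releases = []
    offset = ""
    while True:
        page, next_offset = fetch_page(offset, page_size)
        releases.extend(page)
        if not next_offset:
            return releases
        offset = next_offset


NAMES = ["a", "b", "c", "d", "e"]


def fake_fetch_page(offset, limit):
    """Fake server: returns one page plus the name that starts the next page."""
    start = NAMES.index(offset) if offset else 0
    end = start + limit
    next_offset = NAMES[end] if end < len(NAMES) else ""
    return NAMES[start:end], next_offset


all_releases = list_all_releases(fake_fetch_page, page_size=2)
```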
Change-Id: Ic2de85f6eabcea8655b18b411b79a863160b0c81
This adds a `wait.resources` key to chart documents which allows
waiting on a list of k8s type+labels configurations.
Initially supported types are pods, jobs, deployments, daemonsets, and
statefulsets. The behavior for controller types is similar to that of
`kubectl rollout status`.
If `wait.resources` is omitted, it waits on pods and jobs (if any exist)
as before.
The existing `wait.labels` key still has the same behavior, but if
`wait.resources` is also included, the labels are added to each resource
wait in that array. Thus they serve to specify base labels that apply
to all resources in the release, so as to not have to duplicate them.
This may also be useful later for example to use them as labels to wait
for when deleting a chart.
Controller types additionally have a `min_ready` field which
represents the minimum number of pods of the controller which must
be ready in order for the controller to be considered ready. The value
can either be an integer or a percent string e.g. "80%", similar to e.g.
`maxUnavailable` in k8s. Default is "100%".
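Resolving `min_ready` to a pod count can be sketched as below. The
helper name is hypothetical, and rounding a percentage up is an
assumption; the commit message does not state the rounding rule.

```python
import math


def min_ready_count(min_ready, total):
    """Resolve an integer or percent-string min_ready to a pod count."""
    if isinstance(min_ready, str):
        if not min_ready.endswith("%"):
            raise ValueError("percent strings must end with '%'")
        percent = int(min_ready[:-1])
        return math.ceil(total * percent / 100)  # rounding up is an assumption
    return min_ready


from_percent = min_ready_count("80%", 5)
from_default = min_ready_count("100%", 3)
from_integer = min_ready_count(2, 5)
```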
This also wraps up moving the rest of the wait code into its own module.
Change-Id: If72881af0c74e8f765bbb57ac5ffc8d709cd3c16