83 Commits

Author SHA1 Message Date
Sean Eagan a5730f8db8 Remove Tiller
For now we leave the tiller status endpoint in place until
Shipyard has had a release that no longer depends on it [0].

[0]: https://review.opendev.org/c/airship/shipyard/+/802718

Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: If8a02d7118f6840fdbbe088b4086aee9a18ababb
2021-10-05 02:41:32 +00:00
Sean Eagan 68747d0815 Use helm 3 CLI as backend
Helm 3 breaking changes (likely non-exhaustive):

- `crd-install` hook removed and replaced with a `crds` directory in
  the chart; all CRDs defined there are installed before any
  rendering of the chart
- `test-failure` hook annotation value removed, and `test-success`
  deprecated; use `test` instead
- `--force` no longer handles recreating resources which
  cannot be updated due to e.g. immutability [0]
- `--recreate-pods` removed, use declarative approach instead [1]

[0]: https://github.com/helm/helm/issues/7082
[1]: https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments

Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: I20ff40ba55197de3d37e5fd647e7d2524a53248f
2021-10-04 21:40:26 -05:00
Sean Eagan 8c5e5c7d24 Remove unused commands
This removes release rollback/delete functionality. This functionality
was likely not being used and thus was likely not working.

The primary driver for this change is to ease the introduction of Helm 3
support, in particular to avoid having to make API changes related to
the namespacing of Helm releases in Helm 3.

This also removes the Swagger API documentation, as it was not
maintained.

Change-Id: I7edb1c449d43690c87e5bb24726a9fcaf428c00b
2021-09-30 17:22:16 -05:00
Sean Eagan 58c0df5201 Extract pre-update actions out of tiller handler
This is a prerequisite for Helm 3 integration, so that these
actions run regardless of whether we are going through the
tiller handler.

Change-Id: I97d7bcc823d11b527fcdaa7967fcab62af1c8161
2021-09-30 17:22:16 -05:00
DeJaeger, Darren (dd118r) 9aadc14777 Armada improved logging, uplift dependency
This PS:

1) Improves specific logging in Armada, so that
it's easier to debug deployment-related issues
2) Uplifts the k8s Python dependency to 12.0.0
3) Enforces 'watch' timeouts more strictly, as the call to
the Kubernetes Python watch function seemed unreliable.
4) Adds a field selector to the 'watch' stream to wait for
the DELETE action to complete on the specific
pod/job/cronjob, rather than looking across the whole
namespace or via labels. This will narrow what the watch
is looking at, making the logs less busy.

Change-Id: I1952b0db32fb0b56ffffcddeae0532beb5a27b67
2021-06-24 10:53:06 -04:00
KAVVA, JAGAN MOHAN REDDY (jk330k) 36efc4828d Move Tiller version to 2.16.9
Update Helm chart for Armada to use Tiller version 2.16.9.

depends on: https://review.opendev.org/#/c/749497/

Change-Id: I16f7a5e8e571f067154e79a5f2ceb18be7d8db2d
2020-09-17 10:48:44 -05:00
Drew Walters 764e99e325 handlers: Remove dry-run functionality
Armada's dry-run option is incomplete, no longer maintained, and offers
little value for the complexity required to maintain it.

This commit is the final in a series of changes to remove the dry-run
feature. Specifically, this change removes the functionality associated
with the dry-run feature.

Story: 2005121

Change-Id: I7dfe5ab27511debe2b8ac01f8e0a696c6126a9f7
Signed-off-by: Drew Walters <andrew.walters@att.com>
2019-09-05 16:29:04 +00:00
HUGHES, ALEXANDER (ah8742) b787c418e3 Standardize Armada code with YAPF
From recently merged documentation updates in [0], there is a desire to
standardize the Airship project Python codebase. This is the effort
to do so for the Armada project.

[0] https://review.opendev.org/#/c/671291/

Change-Id: I4fe916d6e330618ea3a1fccfa4bdfdfabb9ffcb2
2019-07-31 10:16:15 -05:00
Sean Eagan e51db14add Revert "Move to helm 2.14"
There is a breaking change in helm 2.14.0 [0]. This is expected to be fixed in helm 2.14.1; reverting until we can update to it.

[0]: https://github.com/helm/helm/issues/5750

This reverts commit 89d98fb827.

Change-Id: Ica6d51b5c67a26c356804fd69d466e88ad5c216b
2019-06-05 20:11:53 +00:00
Sean Eagan 89d98fb827 Move to helm 2.14
Change-Id: I6439650076b289d3983e119c06181baf6562ccc3
2019-05-17 11:50:19 -05:00
Sean Eagan 8a50591dbf Introduce v2 docs
This introduces v2 docs in order to allow users to opt in to
breaking changes, while still supporting v1 docs for a time
so folks can migrate. At some point v1 doc support will be
removed.

This initial version of v2 docs is experimental. Further
breaking changes will be made before v2 docs are finalized.

A v1-v2 migration guide is included in the documentation.

This also refactors the internal data model to include the full
document structure, such as `metadata` and `schema`, so that
different behavior can be achieved for v1, v2, etc.

Change-Id: Ia0d44ff4276ef4c27f78706ab02c88aa421a307f
2019-04-16 10:15:21 -05:00
Drew Walters 12f4e8d2c3 tools: Update Helm to v2.13.1
Helm v2.13.1 has been released [0], and is the next version of Helm
Armada is compatible with. Currently, Armada is not compatible with the
latest version of Helm toolkit due to a divergence caused in Helm v2.13.
This change uplifts Helm to v2.13.1 to restore compatibility with the
latest version of Helm toolkit.

[0] https://github.com/helm/helm/releases/tag/v2.13.1

Change-Id: Ieaf2475562c56530b6ec69c6a43611b4b47b7c83
2019-03-28 15:19:28 +00:00
Nishant kumar c132915dcc Enable Armada to acquire Tiller IP from config file
This adds a new configuration default to specify the Tiller host IP.
Being able to configure this is important in environments where
Armada is unable to find a Tiller pod.

Change-Id: I12fd9fbd16f2b591620e566affcf19f859ed1855
2019-03-11 12:39:39 -05:00
Zuul 3c60a576f9 Merge "Add support in Armada CLI to pass user bearer tokens to tiller" 2019-02-28 14:47:02 +00:00
Shoaib Nasir 7fb3b8d9ca Add support in Armada CLI to pass user bearer tokens to tiller
Added a new option --bearer-token TEXT in the Armada CLI to allow
users or applications to pass Kubernetes API bearer tokens via
tiller to the Kubernetes cluster. This allows Armada to interact
with a Kubernetes cluster that has been configured with an external
authentication backend such as OpenStack Keystone or OpenID Connect.

Bearer tokens are authentication tokens issued by identity backends
such as Keystone, and represent a user's authorized access.
For a better understanding of bearer tokens, examples of how
they work can be found here:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#putting-a-bearer-token-in-a-request
https://docs.docker.com/registry/spec/auth/token/

Change-Id: I03623c7d3b58eda421a0660da8ec3ac2e86915f0
Signed-off-by: Shoaib Nasir <shoaib.nasir@windriver.com>
2019-02-01 15:33:18 -05:00
Sean Eagan 47ebd27cad Add configurability of delete timeout
Previously the timeout for deleting chart releases was 300s and
not configurable; this patchset makes it configurable via a new
`delete.timeout` property in the `armada/Chart/v1` schema.

Deletions of Helm releases that do not correspond to documents in this
schema still do not use a configurable timeout. Those will be
considered separately.

This also includes a minor logging fix.

Change-Id: Ia588faaafd18a3ac00eed3cda2f0556ffcec82c9
2019-01-29 16:49:01 -06:00
Sean Eagan c31a961bf1 Automate deletion of test pods
When running helm tests for a chart release multiple times in a site,
if the previous test pod is not deleted, then the test pod creation
can fail due to a name conflict. Armada/helm support immediate test pod
cleanup, but using this means that upon test failure, the test pod logs will
not be available for debugging purposes. Due to this, the recommended approach
for deleting test pods in Armada has been using `upgrade.pre.delete` actions.
Chart authors can accomplish test pod deletion using this feature; however,
it often takes a while, usually not until they test upgrading the chart,
for chart authors to realize that this is necessary and to implement it.

This patchset automates deletion of test pods directly before running tests by
using the `wait.labels` field in the chart doc when they exist to find all pods
in the release and then using their annotations to determine if they are test
pods and deleting them if so.

A later patchset is planned to implement defaulting of the wait labels when
they are not defined.
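A minimal sketch of the annotation check described above, assuming pods are plain dicts shaped like Kubernetes pod manifests (the real handler works with Kubernetes client objects) and that Helm 2 marks test pods with a `helm.sh/hook` annotation of `test-success` or `test-failure`. The function names `is_test_pod` and `find_test_pods` are hypothetical.

```python
# Hypothetical sketch: pods are dicts shaped like Kubernetes manifests.
# Helm 2 test hooks are marked via the "helm.sh/hook" annotation.
TEST_HOOKS = {"test-success", "test-failure"}


def is_test_pod(pod: dict) -> bool:
    """Return True if the pod's helm hook annotation marks it as a test."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    # The annotation value may list several comma-separated hook types.
    hooks = annotations.get("helm.sh/hook", "")
    return any(h.strip() in TEST_HOOKS for h in hooks.split(","))


def find_test_pods(release_pods: list) -> list:
    """Names of test pods among pods already selected via `wait.labels`."""
    return [p["metadata"]["name"] for p in release_pods if is_test_pod(p)]
```

Selecting by labels first and annotations second keeps the deletion scoped to the release's own pods, so ordinary workload pods are never touched.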

Change-Id: I2092f448acb88b5ade3b31b397f9c874c0061668
2019-01-28 13:19:09 -06:00
Sean Eagan e6f294bacb Move to tiller 2.12.1
Tiller 2.12 [0] adds:

- kubernetes 1.11 support
- fix for a concurrency issue [1]

[0]: https://github.com/helm/helm/releases/tag/v2.12.0
[1]: https://github.com/helm/helm/pull/4828

Change-Id: I99ddd9d105b81177d3b7e5691afebbcca97c119f
2019-01-11 10:52:17 -06:00
lijunjie 749f7107d0 Fix the misspelling of "except"
Change-Id: Iabfe5a9b2a99e32ab975257fe5db2bd3b29d26bf
2019-01-04 18:22:35 +08:00
Zuul a64d435de8 Merge "tiller: Remove unused params from delete_resources" 2018-11-14 19:37:33 +00:00
Drew Walters 5cafd027b5 tiller: Remove unused params from delete_resources
Parameters `release_name` and `name` are ignored by the Tiller handler's
`delete_resources` method because the deletions are handled using labels
rather than by name. Currently, values that do not represent the
parameters are being passed to the method, which sometimes leads to
cryptic logging messages. This change removes all references to the
aforementioned parameters and clarifies the corresponding docstring and
log message.

Change-Id: Ic43819a273bf9da5e8965f409a56307eb11b4922
2018-11-13 16:32:21 +00:00
Sean Eagan 7af22df7dc Implement tiller gRPC channel clean up
We have seen issues with dangling threads in Armada. This is likely due to
a bug [0] in the version of gRPC that we were pinned to.

This patchset:

- moves us to the latest versions of the gRPC Python libraries, which add
  a new `channel.close()` method to clean up channels.
- implements the python context manager api in the tiller handler
- uses the context manager api to explicitly scope tiller channel creation
  and cleanup to each Armada API and CLI call.
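The context manager scoping described above can be sketched as follows. This is a simplified illustration, not Armada's actual handler: `TillerHandler` takes a hypothetical channel factory so it runs without a cluster, and `RecordingChannel` stands in for a `grpc.Channel`. The `tiller-deploy:44134` address in the comment is only an example.

```python
from typing import Callable, Protocol


class Closeable(Protocol):
    def close(self) -> None: ...


class TillerHandler:
    """Hypothetical sketch: one gRPC channel scoped to one API/CLI call."""

    def __init__(self, channel_factory: Callable[[], Closeable]):
        # In real use the factory might be something like
        # lambda: grpc.insecure_channel("tiller-deploy:44134")
        self.channel = channel_factory()

    def __enter__(self) -> "TillerHandler":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Always release the channel, even if the call raised.
        self.channel.close()
        return False  # never swallow exceptions


class RecordingChannel:
    """Stand-in for grpc.Channel that records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True
```

Each API or CLI call would then wrap its work in `with TillerHandler(...) as tiller:`, so the channel and its threads are released when the block exits, whether normally or via an exception.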

This also fixes a couple of issues with error handling introduced in [1].

[0]: https://github.com/grpc/grpc/issues/14338
[1]: https://review.openstack.org/#/c/610384

Change-Id: I2577a20fc76c397aa33157dc12a0e1d36f49733e
2018-11-12 13:32:52 -06:00
Vladyslav Drok 1986a935f6 Don't swallow exceptions when doing tiller actions
Change-Id: I83d4c185106a2ace602fe950dcb25b86f51748bb
2018-11-02 22:37:12 +00:00
Sean Eagan 6b96bbf28d Correctly identify latest release
This fixes the following issues with listing releases from tiller,
which could cause Armada to be confused about the state of the
latest release, and do the wrong thing.

- Was not filtering out old releases, so we could find both a
  FAILED and DEPLOYED release for the same chart. When this is the
  case it likely means the FAILED release is the latest, since
  otherwise armada would have purged the release (and all its
  history) upon seeing the FAILED release in a previous run.
  The issue is that after the purge it would try to upgrade
  rather than re-install, since it also sees the old DEPLOYED
  release. Also if a release gets manually fixed (DEPLOYED)
  outside of armada, armada still sees the old FAILED release,
  and will purge the fixed release.
- Was only fetching DEPLOYED and FAILED releases from tiller, so if
  the latest release has another status Armada won't see it at all.

This changes to:

- Fetch releases with all statuses.
- Filter out old releases.
- Raise an error if the latest release has a status other than DEPLOYED
  or FAILED, since it's not clear what other action to take in
  this scenario.
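The new filtering can be sketched as below, assuming each release exposes a name, an integer revision number, and a status string; the `Release` dataclass and `latest_releases` function are hypothetical simplifications of tiller's release objects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Release:
    name: str
    version: int  # revision number; higher means newer
    status: str   # e.g. "DEPLOYED", "FAILED", "PENDING_UPGRADE"


def latest_releases(releases: Iterable[Release]) -> Dict[str, Release]:
    """Keep only the newest revision of each release name."""
    latest: Dict[str, Release] = {}
    for rel in releases:
        current = latest.get(rel.name)
        if current is None or rel.version > current.version:
            latest[rel.name] = rel
    # Only DEPLOYED and FAILED have a clear next action.
    for rel in latest.values():
        if rel.status not in ("DEPLOYED", "FAILED"):
            raise ValueError(
                f"release {rel.name} has unexpected status {rel.status}")
    return latest
```

With history `a@1 DEPLOYED, a@2 FAILED`, only `a@2` survives, so Armada no longer sees a stale DEPLOYED revision alongside the FAILED one.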

Change-Id: I84712c1486c19d2bba302bf3420df916265ba70c
2018-10-19 09:14:15 -05:00
Sean Eagan 5e7c36d2a1 Avoid bug in tiller when both sorting and paging releases
The tiller list releases command has a bug when using sorting
and paging simultaneously. Armada was passing sorting parameters,
but it doesn't really care about the order, so this removes the
sorting parameters to avoid the tiller issue.

Change-Id: If8349a8093d4b79d5e056d988b710372705eb669
2018-10-17 14:47:16 -05:00
Sean Eagan 6a744d77ea Fix list releases paging
Fix an issue where release paging failed to break out of the loop when no
releases were found in tiller.

Change-Id: I4a25e58a7f6bdd88941f7f87cba2a0aee261f8be
2018-10-15 16:30:35 -05:00
Sean Eagan e149afdcbe Use paging to list releases from tiller
Tiller has a non-configurable gRPC max response message size. If the
list releases response reaches this size it silently truncates the
results to be below this size. Thus for armada to be able to reliably
get back all the releases it requests, this patchset implements paging
with what should be a small enough page size to avoid the truncation.
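A minimal sketch of the paging loop, assuming a hypothetical `fetch_page(offset, limit)` callable that returns one page of release names plus the offset of the next page (empty when there are no more pages). The page size of 32 is illustrative, not the value Armada uses. The loop also breaks on an empty page, which is the condition fixed in "Fix list releases paging".

```python
from typing import Callable, List, Tuple

# Hypothetical fetcher: (offset, limit) -> (release names, next offset).
# Assumption: an empty next offset means "no more pages".
PageFetcher = Callable[[str, int], Tuple[List[str], str]]


def list_all_releases(fetch_page: PageFetcher, page_size: int = 32) -> List[str]:
    releases: List[str] = []
    offset = ""
    while True:
        page, next_offset = fetch_page(offset, page_size)
        releases.extend(page)
        # Break on an empty page (e.g. no releases at all) as well as on
        # the last page, so the loop cannot spin forever.
        if not page or not next_offset:
            break
        offset = next_offset
    return releases
```

Keeping each page small keeps every gRPC response under tiller's fixed message size, so no page is silently truncated.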

Change-Id: Ic2de85f6eabcea8655b18b411b79a863160b0c81
2018-10-12 21:28:22 -05:00
Sean Eagan a9d55ab052 Clean up and refactor wait logic
This patchset changes the wait logic as follows:

- Move wait logic to its own module
- Add framework for waiting on arbitrary resource types
- Unify pod and job wait logic using above framework
- Pass resource_version to k8s watch API for cleaner event tracking
- Only sleep for `k8s_wait_attempt_sleep` when successes not met
- Update to use k8s apps_v1 API where applicable
- Allow passing kwargs to k8s APIs
- Logging cleanups

This is in preparation for adding wait logic for other types of resources
and new wait configurations.
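The "only sleep when successes not met" behavior can be sketched as a polling loop that stops as soon as enough successful checks are seen, without a trailing sleep. The function and its parameters are hypothetical; `attempt_sleep` plays the role of `k8s_wait_attempt_sleep`, and requiring consecutive successes is an assumption. `sleep` and `clock` are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def wait_for_successes(check: Callable[[], bool], required_successes: int,
                       attempt_sleep: float, timeout: float,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it succeeds `required_successes` times in a row."""
    deadline = clock() + timeout
    successes = 0
    while clock() < deadline:
        if check():
            successes += 1
        else:
            successes = 0  # assumption: successes must be consecutive
        if successes >= required_successes:
            return True  # met: return immediately, no extra sleep
        sleep(attempt_sleep)
    return False
```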

Change-Id: I92e12fe5e0dc8e79c5dd5379799623cf3f471082
2018-09-25 12:48:25 -05:00
Sean Eagan 9c3ebe68c7 Move to tiller v2.10.0
- Update Helm to v2.10.0
- Update hapi protoc gen files

Change-Id: Ibcf813e4d79df104e972fae9f9328fb49b403649
2018-08-28 17:07:31 -05:00
Zuul 2bd301efaa Merge "Wait for jobs to complete" 2018-07-23 14:55:38 +00:00
Marshall Margenau ad790b98d7 Wait for jobs to complete
- Wait for jobs to show as completed, instead of relying on pods
  associated with the job to show healthy, as the pods can go
  healthy or be removed while the job is still processing. Armada
  would continue forward as soon as all pods in current scope
  show as healthy.
- Refactor delete pod action a bit, including removing unused code.
- Fixed bug in waiting for pods to delete (in tiller handler L274).
  Bug caused a hung state while deleting pods as a pre-update hook,
  by passing timeout value in the incorrect position.

Change-Id: I2a942f0a6290e8337fd7a43c3e8c9b4c9e350a10
2018-07-20 19:29:33 +00:00
Zuul 8da8adc5ef Merge "Update Helm version" 2018-07-20 18:28:53 +00:00
Marshall Margenau 68a507e81b Update Helm version
- Update Helm to v2.9.1
- Update hapi protoc gen files
- Update kubernetes client to >=6

Change-Id: I53480e26683cbaa2b148aaa0f574ee7fb6147ce5
2018-07-20 16:08:28 +00:00
Marshall Margenau c7c7dc671c Removing dead code.
Change-Id: I7121a6d29691cf8d3e779f2afe9ada7d263c6c9d
2018-07-18 16:04:26 -05:00
Zuul 164f4aa7d7 Merge "Change chart `test` key to object and support cleanup flag" 2018-06-28 14:43:28 +00:00
Sean Eagan 2a1a94828d Change chart `test` key to object and support cleanup flag
Previously the chart `test` key was a boolean. This changes it to an
object which initially supports an `enabled` flag (covering the
previous use case) and adds support for helm's test cleanup option
(underneath an `options` key which mirrors what we have for `upgrade`).
Existing charts will continue to function the same, with cleanup always
turned on, and the old boolean `test` key can still be used for now. When
using the new `test` object, however, cleanup defaults to false to match
helm's interface and allow for test pod debugging. Test pods can then be
deleted on the next armada apply, leaving time for debugging in the
meantime, by adding `upgrade.pre.delete` actions for the test pod.
The `test` commands in the API and CLI now support `cleanup` options as
well.
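The backwards-compatible handling of the `test` key can be sketched as a normalization step. `resolve_test_config` is hypothetical, and two defaults are assumptions not stated above: a missing `test` key is treated as an empty object, and `enabled` defaults to true in the object form.

```python
def resolve_test_config(chart: dict) -> dict:
    """Hypothetical: normalize the chart `test` key to enabled/cleanup."""
    test = chart.get("test", {})  # assumption: absent key == empty object
    if isinstance(test, bool):
        # Legacy boolean form: behaves as before, cleanup always on.
        return {"enabled": test, "cleanup": True}
    options = test.get("options", {})
    return {
        "enabled": test.get("enabled", True),  # assumed default
        # New object form: cleanup defaults to False, matching helm.
        "cleanup": options.get("cleanup", False),
    }
```

For example, `{"test": {"enabled": True}}` runs tests but leaves the test pod in place for debugging, while the legacy `{"test": True}` still cleans up.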

Change-Id: I92f8822aeaedb0767cb07515d42d8e4f3e088150
2018-06-27 10:47:02 -05:00
Felipe Monteiro 9dad7c17c9 chore(docstring): Fix up improper sphinx syntax in docstrings
This fixes up improper sphinx syntax in docstrings by making
the following corrections:

  * params => param
  * :param - => :param

Change-Id: I1ff457d609128ae7c5fac2c7190f5ff1a88315b3
2018-06-22 21:35:29 +00:00
Marshall Margenau f235512d57 Adding yapf config, plus formatted code.
- Adding yapf diff to pep8 target
- Adding yapf tox target to do actual format

** The rest of this PS contains formatted code only, no other changes

Change-Id: Idfef60f53565add2d0cf65bb8e5b91072cf0aded
2018-06-22 14:56:04 -05:00
Sean Eagan d91dd8ad70 Fix and overhaul helm test integration
The helm test integration was severely broken; this fixes it by:

* correctly handling the tiller test call response
* removing an unnecessary call to tiller to get release content
* removing an unnecessary call to k8s to check for test pod completion
* moving common logic into a test handler
* adding test coverage for the above
* adding logging for test results streamed from tiller

Change-Id: I09062387a1abc2fc3f6960f987c97248d9e1cb69
2018-06-21 14:41:52 -05:00
Marshall Margenau 6546139155 Implement `protected` parameter
The `protected` parameter will be used to signify that we should
never purge a release in FAILED status. You can specify the
`continue_processing` param to either skip the failed release and
continue on, or to halt armada execution immediately.

- Add protected param to Chart schema and documentation.
- Implement protection logic.
- Moved purging of FAILED releases out of pre-flight and into sync
  for finer control over protected params. This means failed
  releases are now purged one at a time instead of all up front.
- Added purge and protected charts to final client `msg` return.

- Fix: Added missing dry-run protection in tiller delete resources.
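The decision made for a release found in FAILED status can be sketched as follows. The enum and function are hypothetical, and the shape `protected: {continue_processing: <bool>}` with `continue_processing` defaulting to false is an assumption about the Chart schema.

```python
from enum import Enum


class FailedReleaseAction(Enum):
    PURGE = "purge"  # not protected: purge the FAILED release
    SKIP = "skip"    # protected: leave it and continue with other charts
    HALT = "halt"    # protected: stop armada execution immediately


def on_failed_release(chart: dict) -> FailedReleaseAction:
    """Hypothetical sketch of the decision for a release in FAILED status."""
    protected = chart.get("protected")
    if protected is None:
        return FailedReleaseAction.PURGE
    # Assumed shape: protected: {continue_processing: <bool>}, default False.
    if protected.get("continue_processing", False):
        return FailedReleaseAction.SKIP
    return FailedReleaseAction.HALT
```

Checking `is None` rather than truthiness means an empty `protected: {}` still protects the release.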

Change-Id: Ia893a486d22cc1022b542ab7c22f58af12025523
2018-06-17 20:04:53 -05:00
Marshall Margenau 52bf21989f Fix release name bug
The release name was being represented by several different values,
all meant to be the same thing, when paired with the
'release_prefix'. This commit addresses the bug, changing all
instances to use the 'release' value instead of 'chart_name' or others.

Note: This is an impacting change, in the sense that it will
cause more reliable behavior in Armada's Apply processing, which
could have an actual impact while upgrading components installed with
a previous version of Armada. Previously undeleted FAILED releases
may now be deleted, and armada test and delete actions may now
run as expected where they didn't run before.

Change-Id: I9893e506274e974cdc8826b1812becf9b89a0ab6
2018-06-15 11:21:38 -05:00
Sean Eagan ae690ef828 Expose helm's upgrade/rollback force and recreate pods flags
This exposes helm's force and recreate pods flags for upgrade and
rollback. It exposes in the chart manifest an `options` field underneath
the `upgrade` field to hold options to pass through to helm, and
initializes it with these two flags. Since rollback is currently a
standalone operation which does not consume manifests, these flags are
directly exposed as API and CLI arguments there.

Change-Id: If65c1e97d437d9cf9d5838111fd485c80c76aa1d
2018-06-13 11:28:20 -05:00
Sean Eagan 571c0b77f9 Add command to rollback release to CLI and API
This adds a command to the CLI and API to rollback a release name to a
specified version.

Change-Id: Ie1434da42ccc75c658b7bde7164b3f4c909be7c4
2018-06-06 09:40:31 -05:00
Marshall Margenau d770640b95 Revise wait timeouts plus dry-run.
- revise wait on namespace+label, only wait on ns+label for
  charts we've touched in the current apply loop
- skipping any actions that would change system during dry-run
- skip 'test' and 'wait' during dry-run
- tweaking some logs for insight and readability

Change-Id: I1223f01690832c26ce2faa96e7e64620cf413ac9
2018-05-30 16:19:35 -05:00
Sean Eagan 2b714888c4 Delete cron jobs too on pre-upgrade job delete actions
Adds a 'cronjob' key for pre-upgrade delete actions to delete cron jobs.
The 'job' key now also deletes cron jobs as well, since existing clients
were expecting that behavior.

Change-Id: Id320710a935976c9c1320c25049b7f22ee4136ba
2018-05-07 16:37:01 +00:00
Marshall Margenau dc508d5012 fix(timeouts): Address timeout handling issues
- fixing wait handling in multiple areas
  -- wait for deleted pods before continuing Apply update
  -- cleaning up and delineating wait for charts vs chartgroups
  -- timeout exceptions to stop execution
- api/cli 'timeout' param now applies to all Charts
- api/cli 'wait' param now applies to all Charts
- update some docs
- several TODOs to be addressed in future PS

Closes #199

Change-Id: I5a697508ce6027e9182f3f1f61757319a3ed3593
2018-05-01 08:45:56 -05:00
Mark Burnett df91412ed6 Works around low RELEASE_LIMIT by increasing it
We have been deploying in the low 60s of charts in many environments,
but the current treasuremap environment is running 68.

This led to a situation where tiller would find all 68 charts, but then
return only the newest 64 (RELEASE_LIMIT) in its response.  That meant
that armada did not see those charts when deciding whether to install or
upgrade.

Change-Id: I5d2a06f806006947ad48cc0e24b3f91baa3f37fd
2018-04-17 14:39:05 -05:00
Marshall Margenau 60b8a37f47 bug(deleted jobs) Armada deleting jobs during upgrade
- Additional logging to try to expose bug around deleted jobs
  during an upgrade.
- Cleaner chart diff logging.

Change-Id: I5edfa1857aec417203e73565a39082328e3b677b
2018-04-13 00:30:21 -04:00
Marshall Margenau 13c4e3372a feat(hapi) updating hapi for new grpcio
Change-Id: I8283f5c1cda7d0042d371b382a6d7c49c1705d48
2018-03-09 22:33:39 -05:00
Marshall Margenau 3430283865 feat(logging): Enhance logging and update grpcio
Enhance request logging (and scrub sensitive headers)
Enhance Tiller logging
Update grpcio, unpin from 1.6.0rc1

Plus a couple of typo fixes
Plus removal of a couple of unused vars

Change-Id: I8afd679f6716c6e1af234a59ac44ba1fdc73cdc8
2018-03-09 11:36:57 -05:00