82 Commits

Author SHA1 Message Date
Ruslan Aliev a58678d5d2 Add configurable ETCD parameters to aux cluster
Bump k8s, calico, etcd, coredns and helm.

Signed-off-by: Ruslan Aliev <raliev@mirantis.com>
Change-Id: I77373c223c6ea723ee31fe51e6fb4a9e84be03f7
2024-04-18 13:22:17 -05:00
SPEARS, DUSTIN (ds443n) 12fdf402f6 Add resource allocation setting for etcd sidecar
Change-Id: I4c284d9bbf2da91a6a0e43758d92bf007be25f9c
2024-02-12 11:58:18 -05:00
SPEARS, DUSTIN (ds443n) c3aac9628d Add liveness and readiness probe
This adds liveness/readiness probes to sidecar for etcd

Change-Id: If942de8b7c1a59e7da887e1bdc2626daf699aeab
2024-02-08 16:35:48 -05:00
SPEARS, DUSTIN (ds443n) 7ce7301476 Update ETCD to v3.5.11
After v3.5.6, etcd-io switched to a distroless base image.
Etcd anchor pods now use etcd-utility, and etcd runs
a sidecar for health checks.

Change-Id: I198dca1209097de4d60a53a7568f0c4790679599
2024-02-08 10:35:33 -05:00
Sergiy Markin c1da28f637 [backups] Add throttling of remote etcd backups
This PS adds the ability to limit (throttle) the number of
simultaneously uploaded backups while keeping the logic on the client
side, using flag files on the remote side.

Change-Id: I753faab8f3d934346d54e38bfc94cec3a8f79385
2023-12-19 16:14:43 +00:00
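The flag-file throttling described in this commit can be illustrated with a minimal Python sketch. It is not the chart's actual code: a local directory stands in for the remote side, and the names `try_acquire_slot`, `release_slot`, `flag_dir`, and `max_uploads` are hypothetical.

```python
import os
import time


def try_acquire_slot(flag_dir: str, client_id: str, max_uploads: int) -> bool:
    """Claim an upload slot by creating a flag file, if a slot is free.

    Assumption: every client that is currently uploading has exactly one
    "<client_id>.flag" file in flag_dir.
    """
    os.makedirs(flag_dir, exist_ok=True)
    active = [name for name in os.listdir(flag_dir) if name.endswith(".flag")]
    if len(active) >= max_uploads:
        return False  # too many uploads in progress, caller should retry later
    # Note: check-then-create is not atomic, so two clients can race here.
    with open(os.path.join(flag_dir, client_id + ".flag"), "w") as f:
        f.write(str(time.time()))
    return True


def release_slot(flag_dir: str, client_id: str) -> None:
    """Remove this client's flag file once its upload has finished."""
    try:
        os.remove(os.path.join(flag_dir, client_id + ".flag"))
    except FileNotFoundError:
        pass  # already released
```

A client would call `try_acquire_slot` before uploading, wait and retry while it returns `False`, and call `release_slot` afterwards. Keeping the decision on the client side means the remote server only has to store files.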
Sergiy Markin 748dfc535d [backups] Update staggered backups
This PS updates the YAML tree of values to align with similar changes
in the osh-infra project.

Change-Id: I9a5fc987bea7b4cb1214e329e5f77a0e26011d8d
2023-12-05 04:17:10 +00:00
Sergiy Markin d1c4a54bf7 [backups] Added staggered backups
This PS adds staggered backups by adding anti-affinity rules to the
backup cronjobs. The rules can be followed across several namespaces to
decrease load on the remote backup destination server, ensuring that
only one backup upload is in progress at any moment.

Change-Id: I320c6ce6370b45c602114189819a4225e479f680
2023-12-04 22:03:29 +00:00
Zuul eb4efc172b Merge "Airflow stable 2.6.2" 2023-08-30 21:59:03 +00:00
Sergiy Markin 69a74590e7 Airflow stable 2.6.2
This PS updates python modules and code to match Airflow 2.6.2:

- bionic py36 gates were removed
- python code corrected to match new module versions
- python module versions were selected based on
  airflow-2.6.2 constraints

Change-Id: I9c3e139b3437414a61af7e7c0b7d7e533fadefda
2023-08-29 21:12:11 +00:00
Anselme, Schubert (sa246v) 558acaf3bf Parametrise etcd-anchor readiness probe
Change-Id: Iae3f1e5900c91b0ee7cb07c6f024cdcf41455125
Signed-off-by: Anselme, Schubert (sa246v) <sa246v@att.com>
2023-08-22 12:36:03 -04:00
SPEARS, DUSTIN (ds443n) 7a4051c6a3 Revert chart version
reverting chart versions to previous value

Change-Id: Id1d06f81d997d704af1a0bdb3fd0d8c9e8746360
2023-05-17 15:39:24 -04:00
SPEARS, DUSTIN (ds443n) 1717ed84e5 k8s upgrade to 1.27.1
upgrades kubernetes client to v1.27.1
upgrade etcd to v3.5.6

Change-Id: Iaf287353425aa6263a81617890a2ca3c2f2e4281
2023-05-17 10:32:04 -04:00
Markin, Sergiy (sm515x) d316409fbd [CPID-354] Improve MariaDB Backup/Restore validation process
Update the etcd chart with an empty implementation of the backup validation function (to be implemented in the future). This is needed because the helm-toolkit chart in openstack-helm-infra now calls verify_databases_backup_archives() as part of the backup_databases() implementation:
https://review.opendev.org/c/openstack/openstack-helm-infra/+/853027

Changed the apiVersion of the etcd cronjob from batch/v1beta1 to batch/v1 and fixed the securityContext for etcd_backup.

Also bump the HTK version to 0.2.48, using the commit id obtained from the merge of https://review.opendev.org/c/openstack/openstack-helm-infra/+/853027, and set that commit id in tools/helm_tk.sh

Change-Id: Ie047dd0e6a2aae6483ace89cad22d6720890cdfc
2022-09-09 12:24:03 -05:00
Ruslan Aliev e207bbe966 k8s upgrade to v1.23.7
Address changes and deprecations in Kubernetes v1.21=>v1.23

controller-manager:
* --authorization-kubeconfig and --authentication-kubeconfig must be set
* liveness/readiness probes must use HTTPS
* the default port has been changed to 10257

kubelet:
* --dynamic-config-dir has been deprecated, will not move to GA
* --cni-bin-dir has been deprecated, will be removed with dockershim
* --cni-conf-dir has been deprecated, will be removed with dockershim
* --network-plugin has been deprecated, will be removed with dockershim

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#deprecation
https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/281-dynamic-kubelet-configuration
Change-Id: Ia996d7c14d81d1d8b8067f11c02ffb4ce90eb49a
2022-06-29 00:21:45 -05:00
Lo, Chi (cl566n) dc60ef8454 Removing set -x from function
Removing set -x from within the dump_databases_to_directory function.
The set -x from within the function is causing all the code that
follows the function call to have debug tracing on. This in turn
causes multiple identical logs for the same event. The function
itself already has enough logging to aid debugging.

Reference ps:  https://review.opendev.org/c/openstack/openstack-helm-infra/+/830533
               (commit 2fc1ce4a142e605a9fc6c90dceabbf7c4bfb81e3)

Change-Id: Id442972bbcca983afab7c4f3c29f3686e9e0b481
2022-02-24 18:54:54 +00:00
Sophie Huang 91c21ce14e Enhance ETCD backup
Pick up the helm-toolkit DB backup enhancement in etcd
to add capability to retry uploading backup to remote server.

Change-Id: If6ea347a4c2c55f14f35d95681aaf482d0a6103c
2022-01-25 22:04:25 +00:00
Sophie Huang 257ed54ddb Uplift HTK stable commit (db-backup-restore)
1) Uplift helm-toolkit to include db-backup-restore error log string
   prefixes for the generation of alert

   https://review.opendev.org/c/openstack/openstack-helm-infra/+/823867

2) Error log string prefixes are added to etcd backup-restore as well

Change-Id: Iad51a3e55567d0861140a97c17a1b7d859e13938
2022-01-12 21:23:06 +00:00
Phil Sphicas e4d9d99c13 Update charts to use stable Kubernetes APIs
Update applicable charts to use non-deprecated APIs [0], specifically
addressing the following resource types:
* ClusterRole
* ClusterRoleBinding
* Role
* RoleBinding

The APIs being migrated to are available in v1.19 or earlier. As of this
change, v1.19 is the oldest supported Kubernetes version, slated for EOL
on 2021-10-28. [1]

0: https://kubernetes.io/docs/reference/using-api/deprecation-guide/
1: https://kubernetes.io/releases/
Change-Id: I134b201d9ae01a8d74e34ee14f3bfe3b960cb5aa
2021-10-18 18:59:34 +00:00
Phil Sphicas 08906262fd Update tolerations and priority classes
* Give kube-proxy a blanket toleration
* Replace scheduler.alpha.kubernetes.io/critical-pod annotation with
    priorityClassName: system-node-critical

Change-Id: I810333913c09531eefa1ded014fe090d4cca7f7d
2021-10-18 11:33:54 -07:00
Sean Eagan 731deccf05 charts: move to helm 3 preferred apis
- `helm.sh/hook: test-success` > `helm.sh/hook: test`

Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: If7dded45533705ee028e5d6da326ea94a634529d
2021-09-30 16:57:16 -05:00
Sean Eagan 9d696ca0a4 Use helm 3 in chart build
`helm serve` is removed in helm 3 so this moves
to using local `file://` dependencies [0] instead.

[0]: https://helm.sh/docs/chart_best_practices/dependencies/#repository-urls

Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Change-Id: Ia45c57e0cccac477f6ff59a254d03d6fcec14bef
2021-09-30 16:57:05 -05:00
Phil Sphicas 023e7d4d7d Uplift etcd to v3.4.13
Change-Id: I1e4452f3bd9ff434b0b68ddbbdc63c9d600f6932
2021-02-11 17:23:32 +00:00
Phil Sphicas d161528ae8 Avoid calico-etcd crashloop
Sometimes the calico-etcd pod crashloops when it is being bootstrapped.
This occurs intermittently in the gates.

Best guess: when the etcd-anchor pod initially creates the etcd static
manifest, it waits for the anchor period (15 seconds) for the etcd pod
to become ready. If it is not ready, the next iteration through the loop
recreates an identical manifest. The fact that it is a new file causes
kubelet to terminate the original container and start up a new one.

Kubelet and the container runtime get out of sync, and kubelet can't
figure out the correct container id, so the pod ends up crashlooping
forever.  Manually removing and readding the manifest file doesn't
resolve the condition, although a kubelet restart actually does.

This "fix" will only write the updated manifest if it is different, and
hopefully will prevent the condition from occurring.

Change-Id: I4b6b1bf17fd8f0b36d24a741779505b38dba349f
2021-02-11 07:14:49 +00:00
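The "only write the updated manifest if it is different" behaviour from this commit can be sketched in Python as below. This is an illustration only, not the anchor's real shell script; the function name `write_if_changed` and its arguments are hypothetical.

```python
def write_if_changed(path: str, content: str) -> bool:
    """Write content to path only when it differs from what is on disk.

    Returns True if the file was written, False if it was left untouched.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                # Identical manifest: leave the file alone so that a watcher
                # such as kubelet does not see a new file and restart the pod.
                return False
    except FileNotFoundError:
        pass  # no manifest yet, so it has to be created

    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True
```

Called once per anchor loop, a function like this rewrites the manifest on the first iteration and on real changes, and does nothing on the iterations in between.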
Andrii Ostapenko 940253563a Change helm-toolkit dependency version to ">= 0.1.0"
Since we introduced chart version check in gates, requirements are not
satisfied with strict check of 0.1.0

Change-Id: Ifd2d7af1f2dabe9bbccd65551e0223dddff529dc
2020-09-24 19:43:10 -05:00
dt241s@att.com 97427904bc Upgrade etcd to 3.4.3
1) Updated all references to etcd images to 3.4.3

Change-Id: I629af43eb7e9689af3237361cf7a41fc35ed364c
2020-08-25 17:22:15 +00:00
KHIYANI, RAHUL (rk0850) fffb57109d Add security context template for etcd-backup chart
This change also removes etcd-perms container which is not required

Change-Id: Ia6c38424e0c2d177e35fc904a9551d601a31ac3b
2020-07-27 16:29:53 +00:00
Rick Bartra 0ffde4162e Run etcd with shareProcessNamespace: true to reap zombie processes
The kubernetes-etcd pods are leaving behind zombie processes and
setting 'shareProcessNamespace: true' eliminates that problem.

When you enable process namespace sharing for a Pod, Kubernetes uses a
single process namespace for all the containers in that Pod. The
Kubernetes Pod infrastructure container becomes PID 1 and automatically
reaps orphaned processes. [0]

[0] https://cloud.google.com/solutions/best-practices-for-building-containers#solution_2_enable_process_namespace_sharing_in_kubernetes

Change-Id: I61566fb71258baafa709b0e5367c71f13e980f6f
2020-07-24 17:40:31 +00:00
anthony.bellino cb4ae15eb1 Updating HTK commit ID for etcd backup/restore
Include the fix [0] for the return code when the remote RGW fails.
Moving set -x in backup/restore.tpl below the source
of the framework code to reduce debug output.

[0] https://review.opendev.org/#/c/738665/

Change-Id: If9b7b317dff439ecb293d9837cac256884c53c6a
2020-07-08 17:37:54 +00:00
Zuul c6c7a3accd Merge "ETCD remote backup enhancements" 2020-06-30 22:23:33 +00:00
anthony.bellino 95c1689e03 ETCD remote backup enhancements
1) Include framework for remote etcd backups.
2) Use porthole etcdctl utility image for backups.
3) Move helm-toolkit pin to latest commit.
4) Add a keystone user for RGW.
5) Add a secret for Swift API access.
6) Add a secret for backup/restore configuration.

Change-Id: Ica549c3b6bc00ca55540b8ffedd4c46af0d8d25e
2020-06-29 23:34:50 +00:00
KHIYANI, RAHUL (rk0850) 1e4b5e0d45 Add pod/container security context to promenade charts
This updates the coredns, haproxy and etcd chart to include the pod
security context on the pod template.

This also adds the container security context to set
readOnlyRootFilesystem flag

Change-Id: I9b5b0ea83acd4c5656577d8cbc684a5031ca0111
2020-06-29 17:06:02 -05:00
KHIYANI, RAHUL (rk0850) b51eb9802d Add apparmor profile to apiserver and etcd jobs
Change-Id: I8bed3213868b45a438e5ae5929bca8bef699a503
2020-05-28 13:04:12 -05:00
Phil Sphicas 4aab698486 Add configmap-hash annotations for etcd
Adds configmap-hash annotations to the etcd anchor daemonset for
configmap-bin and configmap-etc.

Does not add hash annotations for configmap-certs or secret-keys, with
the thought that if certs or keys are changed, some manual intervention
might be warranted, and restarting the anchors automatically might not
be desirable.

Change-Id: I22ff8fafa5d37c10138ddaa4095174b25fc087d8
2020-05-24 06:11:26 +00:00
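The idea behind configmap-hash annotations can be sketched in Python: hash the rendered ConfigMap content and put the digest in a pod template annotation, so that a content change alters the pod template and triggers a rollout. This is a simplified illustration, not the chart's Helm template; the function names and the `<name>-hash` key format are assumptions.

```python
import hashlib
from typing import Dict


def configmap_hash(rendered: str) -> str:
    """Return the SHA-256 hex digest of a rendered ConfigMap."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def hash_annotations(configmaps: Dict[str, str]) -> Dict[str, str]:
    """Build one '<name>-hash' annotation per ConfigMap (key format assumed)."""
    return {
        name + "-hash": configmap_hash(body)
        for name, body in sorted(configmaps.items())
    }
```

Passing only `configmap-bin` and `configmap-etc` to `hash_annotations` mirrors the choice in this commit: ConfigMaps or Secrets left out of the mapping never change the annotations, so changing them does not restart the anchors automatically.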
KHIYANI, RAHUL (rk0850) 83104b345f Promenade: Add apparmor profile to promenade charts
This change adds apparmor profile to coredns, haproxy, etcd and
promenade charts

Change-Id: Ic0000f0bf515f6ddf0085b5ec0085a5a51e591b2
2020-05-18 16:18:28 +00:00
KHIYANI, RAHUL (rk0850) f2869e68cf Add apparmor profile to etcd chart
Change-Id: Ic17db9b9e96e6c47b6d970a8dd63ea338a8b4f7e
2020-02-19 18:36:48 -06:00
Phil Sphicas ce3f4742aa etcd chart: anchor pre_stop remove bashism
The anchor pre-stop script uses the 'function' command, which fails when
using dash. This change removes it for compatibility.

Change-Id: I6591045fa0a555800a03edbdf1f9f3a8476dd0a3
2020-02-18 15:12:23 -08:00
Phil Sphicas 7c6043772b etcd chart: additional env vars for etcd pods
Allows extra environment variables to be applied to the etcd pods. Can
be used to apply tuning parameters, enable experimental flags, etc.

Change-Id: I9d82514b6e3a292edc472d885c0a61d5c81199f5
2020-02-07 16:06:43 -08:00
Matt McEuen 1d0a4619b4 Add -u to anchor scripts
This adds "set -u" (in addition to the existing -x) to the anchor
scripts. This should fix an issue seen occasionally in the haproxy
chart which is only explainable by the IDENTIFIER variable failing
to get set correctly.

All variables used in the anchor scripts ought to be defined, and
there's no need to rely on blank strings as defaults.

"set -e" was considered for this, but may have unintended side-effects:
-u should be safe and avoid the issue we've seen.

Change-Id: Idbc2f9f77d4754874999d5d83d322a17076c7392
2020-02-03 14:00:12 -06:00
Doug Aaser 4cd75e26a0 Uplift etcd to v3.4.2
Uplift etcd to v3.4.2
Also uplifts calico in the gate so that it works with etcd v3

Change-Id: Iac93cadfad813223f9364e513fae00afa178113e
2019-11-25 17:12:00 +00:00
Matt McEuen fcaacf94a3 Add -e to pre_stop hooks
This adds -e to the pre_stop scripts, so that they fail out if
any of their commands fail.  This is required, since it's the only
way to communicate whether there is an issue during pre_hook
execution.

"The logs for a Hook handler are not exposed in Pod events.
If a handler fails for some reason, it broadcasts an event."
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks

As an example, this issue was discovered when "touch /tmp/stop"
was failing silently due to a readOnlyRootFilesystem setting,
resulting in pods that would not successfully Terminate until
the grace period was exhausted.

Change-Id: Ic9a228230d944530e31ed61f4239fd434cbb6187
2019-11-07 17:31:50 -06:00
Phil Sphicas a7c7282ba4 Fix: anchor pre-stop failures
kubernetes-controller-manager-anchor pods get stuck in Terminating state
because the pre-stop script tries to touch /tmp/stop, which is on a read
only root filesystem.

This change mounts an emptyDir at /tmp to resolve the issue.

The same change is applied to apiserver, etcd, and scheduler anchors, to
prevent the issue if readOnlyRootFilesystem is enabled.

Related change for haproxy:
https://review.opendev.org/685711/

Change-Id: I784498e0dc24da91a983716029973919b96a3055
2019-11-04 15:14:27 -08:00
Zuul 3d7ecfd190 Merge "(etcd) Support dash shell" 2019-09-10 22:05:54 +00:00
Luna Das d3501bc006 Add facility to configure log levels in kubernetes-etcd
Change-Id: Iefaa48b9eb3403cf6955374d5ea460f676e0806b
2019-09-10 19:42:03 +05:30
Scott Hussey 6aeab9e490 (etcd) Support dash shell
- Rewrite some anchor scripting to support dash
  - 'function' not supported, refactor to POSIX function declarations
- Rewrite aux monitor to support dash
  - Same
Change-Id: If44c59be2f30fd30c1a668bc27e58b37575610b5
2019-09-01 01:22:44 -05:00
rajesh.kudaka 490dd63c2c Enable probes config for etcd
This commit enables configuration of probes
for etcd pod by manipulating/overriding values in
values.yaml or through manifests.

Change-Id: I69eabd13f8ea8b97a33281ad993ec2e88b9280bc
2019-08-09 09:28:47 +00:00
Crank, Daniel (dc6350) ce1e5fa342 Fixes to etcd backup script
1. Fix directory listing used to identify newest backup file to be
archived (was sometimes archiving files twice; e.g., a.tar.gz.tar.gz)

2. Fix directory listings used to identify and clean up old backups

Change-Id: Icb1ddd96613f4ab6a28c4f617001c336951568bc
2019-07-25 13:36:59 -05:00
Zuul fe60268244 Merge "Allow etcd anchor to recover from bad state" 2019-06-26 17:58:23 +00:00
Hussey, Scott (sh8121) d2f020fbb7 Allow etcd anchor to recover from bad state
- If an etcd member has corrupted data or has somehow
  been removed from a cluster, the anchor does not currently
  recover. This change adds a threshold of X monitoring loops
  after which the anchor will remove the member from the cluster
  and recreate it.

Note: This is safe due to etcd's strict quorum checking on
      runtime reconfiguration, see [0].

[0] https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/configuration.md#--strict-reconfig-check

Change-Id: Id2ceea7393c46bed9fa5e3ead37014e52c91eac3
2019-06-26 07:56:59 -05:00
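The threshold logic described in this commit can be sketched in Python as follows. It is a minimal illustration, not the anchor script itself: the class name `MemberHealthTracker` is hypothetical, and counting consecutive failed loops (resetting on a healthy loop) is an assumption, since the commit message only mentions "a threshold of X monitoring loops".

```python
class MemberHealthTracker:
    """Count failed monitoring loops for one etcd member."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # the "X" from the commit message
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one monitoring loop.

        Returns True when the member should be removed from the cluster
        and recreated, False otherwise.
        """
        if healthy:
            self.failures = 0  # assumption: a healthy loop resets the count
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting afresh after recreation
            return True
        return False
```

With `threshold=3`, three unhealthy loops in a row make `record` return `True` once, after which the count starts again.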
Kumar, Nishant(nk613n) 75d3a86234 Add release uuid annotation to POD spec
Change-Id: Id4a96de7da9233589b54217e04a346281eaea68c
2019-06-25 14:55:05 +00:00
RAHUL KHIYANI f50a0c8d78 ETCD: Add pod/container security context
This updates the etcd chart to include the pod
security context on the pod template.

This also adds the container security context to set
readOnlyRootFilesystem flag to false

Change-Id: I34a8ab3e850779192491b9b127a82b82f05fa00b
2019-06-13 02:01:16 +00:00