Commit Graph

76 Commits

Author SHA1 Message Date
Mosher, Jaymes (jm616v) 502a74064c Add optional pre/post install commands to divingbell-apt
Change-Id: I3fdee4b128bfba80bd827fb6a64b800652cdee2f
2023-11-30 10:59:46 -07:00
Ruslan Aliev 9ef6046f33 Add whitelist of packages to bypass verification
Change-Id: I459f4a241496cf98bd0bb00f3843f2b58bb397c1
Signed-off-by: Ruslan Aliev <raliev@mirantis.com>
2023-05-16 18:23:27 -05:00
Ruslan Aliev 234248c272 Add readiness probe to divingbell-exec
Also add dist-upgrade verification.

Change-Id: I0716ee878e9a2fa9a557debe543996691c0540ce
Signed-off-by: Ruslan Aliev <raliev@mirantis.com>
2023-04-30 23:03:17 -05:00
SPEARS, DUSTIN (ds443n) 7d533d65c3 Adding readiness/liveness probes to apt
This adds readiness and liveness probes to set daemonset to a non-ready status during dpkg usage

Change-Id: I5b9d029f1f8f696b4132a27ea29a77465babc29c
2022-10-19 15:09:04 -04:00
SPEARS, DUSTIN (ds443n) ebf0e22964 Add checks for dpkg availability
Check that dpkg is available before continuing to prevent unwanted pod restarts.

Change-Id: I6925cd074b88d10a858f044da21c7e20a7a238e5
2022-09-30 10:47:30 -04:00
Walter Wahlstedt 229bbe75b0 Create option to turn on verbose logging.
Change-Id: I1ad71a603a92e44ee93e0663c7b2db216a1811ff
2022-01-19 16:34:26 -05:00
Phil Sphicas 1858d0ef37 perm: Optionally ignore missing files
The default behavior of divingbell-perm is to fail when trying to assign
permissions to non-existent files.

This change adds an option to values.yaml to skip any missing files and
proceed with the rest of the assignments.

    conf:
      perm:
        ignore_missing: true   # default is false

This may be useful in cases where files will never exist on a node, or
cases where the file does not exist yet, but will exist later. Note that
with this option enabled, a run in which files are skipped is considered
successful, so the rerun_policy and rerun_interval will determine if and
when another attempt will be made.

Change-Id: I15505d6292dda66942c66eea5a4d0666bd6bdfa7
2021-09-07 20:32:12 +00:00
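
Editor's sketch for 1858d0ef37: the skip-or-fail behaviour described above could look roughly like the following. This is an illustration in Python, not the chart's shell implementation; `apply_modes` and the file names are hypothetical.

```python
import os
import stat
import tempfile


def apply_modes(entries, ignore_missing=False):
    """Apply each (path, mode) pair and return the paths that were skipped.

    Hypothetical helper that only mirrors the behaviour described in the
    commit message; it is not taken from the divingbell source.
    """
    skipped = []
    for path, mode in entries:
        if not os.path.exists(path):
            if ignore_missing:
                skipped.append(path)  # skip and continue with the rest
                continue
            raise FileNotFoundError(path)  # default: fail on a missing file
        os.chmod(path, mode)
    return skipped


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.conf")
    absent = os.path.join(tmp, "absent.conf")
    open(present, "w").close()
    # The missing file is skipped and the run still counts as successful.
    skipped = apply_modes([(absent, 0o600), (present, 0o640)], ignore_missing=True)
    present_mode = stat.S_IMODE(os.stat(present).st_mode)
```

Because a run with skipped files returns normally, whether another attempt happens is left to the rerun settings, as the commit message notes.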
Phil Sphicas 3007010064 perm: Various fixes (values hash, revert)
The hash used by divingbell-perms to decide whether or not to rerun the
permissions script was being generated incorrectly, using a fixed value
instead of actually looking at the values passed to the chart.

This change updates the hash to reflect conf.divingbell.perms, and will
rerun the script if the hash changes.

Also fixes the logic to revert permissions.

Change-Id: I74f056f69a1b7f0eb9223915b1671e1e18091483
2021-09-07 20:30:59 +00:00
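
Editor's sketch for 3007010064: one way to derive a rerun decision from the configured values rather than from a fixed string. The hashing scheme (canonical JSON plus SHA-256) and the keys inside the example values are assumptions made for illustration; the commit does not state how the chart computes its hash.

```python
import hashlib
import json


def values_hash(perm_values):
    """Return a stable digest of the configured values (illustrative only)."""
    # sort_keys makes the digest independent of key order in the input
    canonical = json.dumps(perm_values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rerun(previous_hash, perm_values):
    """Rerun only when the digest of the current values differs."""
    return previous_hash != values_hash(perm_values)


# Hypothetical values; the real chart schema may differ.
old = {"paths": [{"path": "/etc/example.conf", "permissions": "0640"}]}
new = {"paths": [{"path": "/etc/example.conf", "permissions": "0600"}]}
stored = values_hash(old)
```

A fixed digest would make `needs_rerun` return False forever, which is the bug the commit describes.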
Phil Sphicas d657f7968c apt: Remove /var/lib/apt/lists before update
When divingbell-apt is managing the apt sources list, remove the
contents of /var/lib/apt/lists before running apt-get update.

Change-Id: I379af0b1a887bc81bc76f57289f35bae64e146c6
2021-03-14 06:46:08 +00:00
Phil Sphicas 918da6d055 Avoid rbd unmap failure; use HostToContainer mountPropagation
The divingbell pods use a hostPath volume for the root filesystem.
Because this mount includes /var/lib/kubelet, the pod holds a reference
to every volume mounted by every pod on the same host.

The most visible case where this causes a problem is the termination of
a pod that uses ceph-backed PVCs. When kubelet tries to unmap the rbd
device, it is unable to do so, manifesting in the kubelet logs as:
    rbd: unmap failed: (16) Device or resource busy

This change sets the mountPropagation to HostToContainer for the rootfs
volume, so that the divingbell pods will not prevent kubelet from
releasing these devices.

https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation

Change-Id: I6e91fb9b9d7cbe852c5e6dc8b7224d6085175590
2020-11-24 23:57:54 +00:00
Phil Sphicas 55ba4cb61c Allow node selector configuration per module
This change adds the ability to configure node selectors per module. The
default node selector is 'kubernetes.io/os=linux'. For example:

    labels:
      apt:
        node_selector_key: divingbell-apt
        node_selector_value: enabled

Will result in a node selector of 'divingbell-apt=enabled'.

Change-Id: I7150c5f998afa30dce22f505be4d0d164254214f
2020-10-03 01:30:56 +00:00
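
Editor's sketch for 55ba4cb61c: the per-module lookup with a default could be modelled as below. The function name and the rule "fall back to the default when a module has no override" are assumptions; only the default selector and the two key names come from the commit message.

```python
DEFAULT_SELECTOR = {"kubernetes.io/os": "linux"}


def node_selector(labels, module):
    """Return the node selector for a module, or the default (illustrative)."""
    entry = labels.get(module) or {}
    key = entry.get("node_selector_key")
    value = entry.get("node_selector_value")
    if key is None or value is None:
        # Assumption: an incomplete or absent override falls back to the default.
        return dict(DEFAULT_SELECTOR)
    return {key: value}


labels = {
    "apt": {
        "node_selector_key": "divingbell-apt",
        "node_selector_value": "enabled",
    }
}
apt_selector = node_selector(labels, "apt")
perm_selector = node_selector(labels, "perm")
```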
Prateek Dodda 30200a54d9 Implement Security Context for Divingbell
Change-Id: Ibc93ccac6d6015faff3491211f5f8cb752a0328f
2020-03-30 23:04:50 +00:00
Anderson, Craig (ca846m) 32da2fbd4b Add ability to disable package uninstalls
Allow users to disable auto-uninstall functionality for packages.

Change-Id: Ib59ff175fc474a592118374c23974c6a9439cd72
2020-03-23 10:23:20 -07:00
Zuul db4f382b59 Merge "Update dpkg commands to be non-interactive" 2020-03-20 20:37:00 +00:00
Michael Beaver b98efc4f29 Update dpkg commands to be non-interactive
The current `dpkg --configure -a` command does not always work if the
package that needs to be configured has a modified conffile which can
require user input to resolve. This change adds flags to make these
lines work as intended in that scenario.

Change-Id: I8f459b0c1c2fc7ecbe1ff478bdb77fd9af31dc90
2020-03-19 14:10:44 +00:00
Crank, Daniel f0eb0b7582 [ad-hoc] Fix test case exit conditions
While working on another change, I discovered conditions
in many test cases that echoed fail messages but did not
actually exit, so the gate could succeed even though some
tests failed. This patchset aims to fix those problems, and
then fix the problems masked by those problems:

1) fix bug in revert function of file permissions module
preventing permissions from being reverted.
2) fix various syntax and logic problems in test script
3) add wait_for_tiller_ready function to avoid race condition
with test script using helm too early
4) add install for ethtool in test script
5) ignore ethtool pod failures (see note #1 in [0])
6) make logging of test results more uniform
7) Fix error message logic in perm.sh
8) Fix case in _shcommon.tpl where error message was not
logged, causing test script to unnecessarily wait for
container timeout

[0]: https://review.opendev.org/676010

Change-Id: I22182d35250c37c96e73d9f5f49abfb2246f2a35
2020-03-12 15:25:30 +00:00
KAVVA, JAGAN MOHAN REDDY (jk330k) 37594c8d16 Add Docker default AppArmor profile to divingbell
This adds default AppArmor profile to divingbell.

Also, update the gate script to install ethtool if it is not present.

Change-Id: I7abb13a533b596f4db5fe65fdae5eb7fc57ec00a
2020-02-13 14:43:44 -08:00
Michael Beaver fe0a034ec7 Add --no-install-recommends to apt install
This change adds the --no-install-recommends flag to the apt-get
install command portion of _apt.sh.tpl. This will modify Divingbell
to only install direct dependencies of packages instead of following
the default apt behavior, which is to also install recommended packages.

Change-Id: I118a72e1e591101b0e2878e088e9fbaa96067d2c
2020-01-29 18:29:06 -06:00
Drew Walters fe270ec595 apt: Add whitelist for strict mode
This change adds a whitelist of packages that will be ignored when using
strict mode.

Change-Id: I9138f35a72618100e6094575271f6160336332f4
Signed-off-by: Drew Walters <andrew.walters@att.com>
2020-01-27 21:23:27 +00:00
Crank, Daniel 3cc1620319 Remove 'autoremove' from strict mode apt purge
This patchset makes two changes for strict mode only:

1) Removes the --autoremove flag from the apt-get purge
   command line
2) Causes the install stage to call apt-get install on
   all packages regardless of whether they're already
   installed. This will have the effect of marking all
   requested packages as manually installed if they
   were previously auto-installed.

Change-Id: Ic1a39205c941973af9d82685180d28457ea2011f
2020-01-25 13:15:46 -06:00
Crank, Daniel 44525162a5 Add "strict" mode for apt package removal
Currently, divingbell-apt will only remove packages that aren't
on the current requested package list when they were previously
installed by divingbell-apt. This patchset adds a "strict" mode
which causes it to remove packages not on the requested package
list regardless of whether divingbell installed them (i.e., it
can remove unwanted packages that were part of the host's base
image).

Change-Id: Ie2ba5d47646bfaaf030cb54673e644ab0e917fd4
2020-01-24 12:19:22 -06:00
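
Editor's sketch for 44525162a5: the difference between the default and "strict" removal rules reduces to a set computation. The function and package names are hypothetical, and the real module works through apt rather than Python sets.

```python
def packages_to_remove(installed, requested, installed_by_divingbell, strict=False):
    """Return the set of packages that would be removed (illustrative only)."""
    unwanted = set(installed) - set(requested)
    if strict:
        # Strict mode: remove anything not requested, whoever installed it.
        return unwanted
    # Default: only remove packages divingbell-apt installed earlier.
    return unwanted & set(installed_by_divingbell)


installed = {"htop", "telnet", "curl"}   # hypothetical host state
requested = {"curl"}
by_divingbell = {"htop"}                 # telnet came from the base image

default_removals = packages_to_remove(installed, requested, by_divingbell)
strict_removals = packages_to_remove(installed, requested, by_divingbell, strict=True)
```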
Schiefelbein, Andrew (as3525) as3525@att.com ac357b9bff This is to allow for ganged install of packages instead of single
package installations with apt

Change-Id: Ifd268e7eca212fb5686b30213c1c7c1e47f5eb25
2020-01-17 16:03:03 -06:00
Phil Sphicas 788501e806 apt: chart update: allow conf.apt.packages as map
This change allows conf.apt.packages to be defined as a map of lists,
allowing for logical grouping and easier substitution when values.yaml
is being assembled from multiple sources.

The existing format (conf.apt.packages as a list) is still supported.

Change-Id: I4d4c09723b2e9ac1f0ecf847e786d991cc6e669a
2020-01-07 12:31:53 -08:00
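
Editor's sketch for 788501e806: accepting conf.apt.packages either as a list or as a map of lists amounts to a small normalisation step. The group names and the shape of each package entry below are assumptions for illustration only.

```python
def flatten_packages(packages):
    """Normalise a list, or a map of lists, into one flat list (illustrative)."""
    if isinstance(packages, dict):
        flat = []
        for group in packages.values():
            flat.extend(group)
        return flat
    return list(packages)


# Existing format: a plain list.
as_list = [{"name": "htop"}, {"name": "curl"}]
# New format: a map of lists, grouped by hypothetical source names.
as_map = {
    "base": [{"name": "htop"}],
    "site": [{"name": "curl"}],
}
from_list = flatten_packages(as_list)
from_map = flatten_packages(as_map)
```

Both inputs yield the same flat list, which is why the older list format can stay supported.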
Phil Sphicas 524c1b1e32 Fix airship-divingbell-ubuntu zuul gate
Fixes the airship-divingbell-ubuntu zuul gate.

Change-Id: I83642d43f4a4ae8a4882b120e965fcacd166700a
2020-01-07 12:31:53 -08:00
Zuul 010b5c6c03 Merge "apt: Add allow-downgrades option per package" 2019-10-17 18:26:23 +00:00
anthony.bellino d917166a73 apt: Add allow-downgrades option per package
This change adds the ability to include the --allow-downgrades
option per package install.

Change-Id: I2e0c6f11a51c1b78994e77084e3b2046c179d888
2019-10-17 03:11:19 +00:00
Evgeny L 9be717e860 Allow to configure service network policy
The patch introduces network policy configuration similar
to openstack-helm services. It allows users to configure
policies depending on the environment.

* Network policies are disabled by default.
* When enabled, the default policies allow all ingress and
  egress traffic (i.e. policy set to {}); this may be
  changed in future patch sets.

Change-Id: I2adb5e652c1da0a1982ab18c498f033910a47cd8
2019-09-27 20:48:09 +00:00
Drew Walters 2e5ffaccca apt: Add full-system upgrade feature
Currently, the APT daemonset allows the installation of new packages or
upgrade of existing packages to a newer version. Sometimes, it may be
desirable to trigger an update for all packages. This change introduces
the ability to trigger a full-system upgrade using the .conf.apt.upgrade
chart value. The new option is disabled by default.

Change-Id: I611422c2093b9dbbae4e2d7cc05ebd726e895c88
Signed-off-by: Drew Walters <andrew.walters@att.com>
2019-08-21 16:07:54 +00:00
Roman Gorshunov 1504533fb1 Change DaemonSet apiVersion to apps/v1
DaemonSet apiVersion: extensions/v1beta1 is deprecated starting from
Kubernetes v1.8.0-alpha.3 [0].

DaemonSet uses apiVersion: apps/v1 starting from v1.9.0 [1].

We run Kubernetes v1.13.4 and up at the moment.

[0] -
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md
[1] -
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md

Change-Id: Ic286e208836cf17be09fa78ba4d0f45084ae47fb
2019-08-01 20:25:43 +00:00
Zuul d3b1a5c985 Merge "Various gate fixes to make gate green" 2019-08-01 19:46:37 +00:00
Zuul f727f6adf1 Merge "Add release uuid annotation to POD spec" 2019-07-30 19:04:16 +00:00
Anderson, Craig (ca846m) c68a3ff61f Various gate fixes to make gate green
1. There is an occasional timing issue when container logs are
   unavailable at certain points in the crash loop at the same
   time the gate script tries to request them. The gate will now retry
   this operation, instead of terminating right away with failure.
2. Re-enable uamlite security context so that useradd operations would
   succeed.
3. Change apt pinning tests to use a version of the package that is
   available in the apt repo. Upstream repos change, so we should not
   pin to an explicit version that will be removed in the future and
   break the gate.
4. Update helm version to 2.14.1 to sync with openstack-helm-infra
5. Fix divingbell build script: git --depth=1 incompatible with explicit
   non-master commit checkout
6. Enhance overrides test case #7 to test for the issue identified in
   [0].
7. Change hostname scheduling to match minikube hostname now configured
   by OSH gate, instead of using the node's actual hostname
8. Re-enable gate voting

[0] https://storyboard.openstack.org/#!/story/2005936

Depends-On: https://review.opendev.org/671875/
Change-Id: Iad983ce363711e16ccd54e663c23d30a4a6a1177
2019-07-29 14:42:18 -07:00
Zuul 49fc3ccc7e Merge "Update uamlite.sh to handle empty user_sshkeys arrays" 2019-07-24 15:48:58 +00:00
Kumar, Nishant(nk613n) d5a65962fe Add release uuid annotation to POD spec
Change-Id: I6158af07b15dbc098ae4e67c949b00c293b30894
2019-07-24 14:50:25 +00:00
Matt McEuen ab6db0f11c Make apt container privileged
This makes the main container within the apt daemonset run as
privileged, which is required to perform kernel upgrades through it.
It was confirmed that even with all capabilities enabled, an
unprivileged apt is unable to perform the necessary updates to
the boot partition during a kernel upgrade.

Change-Id: I4e996794f24fcfc9d8ced7a58cecd2ceec36f6c5
2019-07-15 17:21:38 -05:00
Matt Carter 4c6ac4712d Update uamlite.sh to handle empty user_sshkeys arrays
Previously _uamlite.sh.tpl would fail to render if any user data
had an empty user_sshkeys array. This is because the template would
check to see if the key existed, but not actually make sure that the
array contained within that key had any elements. "first" would be
called against the empty array, which would return nil, and then
the outer eq function call would fail (as it can't be used to
compare nil values).

This patch set adds a default statement after the "first" function,
so that if the array is empty and first returns nil, a default of
"Unmanaged" will be returned, which will end up making the eq
statement evaluate to false, and the code inside the if statement to
not be run.

Change-Id: I52713795284cd1d0961bd430858061f9df9c5f78
2019-06-25 15:16:31 -05:00
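
Editor's sketch for 4c6ac4712d: a Python model of the template fix. `first`, `default`, and `strict_eq` here are simplified stand-ins for the template functions named in the commit message (the real `default` may treat more values as empty), and the compared key string is hypothetical.

```python
def first(items):
    """Return the first element, or None for an empty list."""
    return items[0] if items else None


def default(fallback, value):
    """Return fallback when value is None (simplified stand-in)."""
    return fallback if value is None else value


def strict_eq(a, b):
    """Mimic a comparison that errors on nil operands instead of returning False."""
    if a is None or b is None:
        raise TypeError("cannot compare nil values")
    return a == b


user_sshkeys = []  # the empty array that used to break rendering

# Before the fix: first() yields None and the comparison raises.
try:
    strict_eq(first(user_sshkeys), "hypothetical-key")
    failed_without_default = False
except TypeError:
    failed_without_default = True

# After the fix: the default makes the comparison evaluate to False.
result_with_default = strict_eq(default("Unmanaged", first(user_sshkeys)), "hypothetical-key")
```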
Zuul 00cebd8b3b Merge "Use common logger for consistent log output" 2019-04-24 18:24:35 +00:00
Anderson, Craig (ca846m) 87268308f8 Use common logger for consistent log output
Use the common logger for consistent log output for some echo statements
that were not making use of it.

Change-Id: I7fae2a950318f5cd3245a4571dc464009726d4ae
2019-04-11 13:23:05 -07:00
Dmitrii Kabanov 2fb6a299a3 Add support of older versions of Helm
This PS avoids using assignments that are not supported
in older versions of Helm (Go < 1.11).
Change-Id: Ic0dad4d1b60071c4366c63834f1ad7e3a76fdcd8
2019-04-11 12:14:20 -07:00
Dmitrii Kabanov 8f102a878a Add possibility to add repository and GPG key
This PS adds the ability to add a repository and GPG key.

Change-Id: Ie4bfc3ba9501b8af484515e9d2946725bd9eff4b
2019-04-04 01:35:53 -07:00
Zuul b8f2792eb6 Merge "Run Divingbell containers as unprivileged" 2019-03-20 17:31:05 +00:00
Zuul 3bce1c1ac2 Merge "(perm) Fix CL for reverting missing file" 2019-03-19 19:12:14 +00:00
BARTRA, RICK 2c80c45fe8 Run Divingbell containers as unprivileged
Divingbell runs all its containers as privileged. Some Divingbell
containers can perform their jobs with the default set of Linux
capabilities that Docker gives to unprivileged containers while others
need additional capabilities. The default list of capabilities includes
the following:
  - SETPCAP
  - MKNOD
  - AUDIT_WRITE
  - CHOWN
  - NET_RAW
  - DAC_OVERRIDE
  - FOWNER
  - FSETID
  - KILL
  - SETGID
  - SETUID
  - NET_BIND_SERVICE
  - SYS_CHROOT
  - SETFCAP

The capabilities listed in the daemonset templates function as a
whitelist in that the corresponding containers have access to the Linux
capabilities listed in their SecurityContext, but also the
aforementioned capabilities included by default by Docker.

Summary of testing for each daemonset:

The bcc-capable tool [0] was used to discover which Linux capabilities
the Divingbell containers invoke. The tool was run against all the
processes running in the container. The Divingbell logs for each
container were also carefully analyzed for failed permission checks.

daemonset-exec:
A recent change to use nsenter to enter all host namespaces when running
exec prevents divingbell-exec from being able to run unprivileged as
there is no Linux capability that allows write access to '/proc'.
When trying to run as unprivileged, the following prevents the pod from
coming up:
"nsenter: cannot open /proc/1/ns/ipc: Permission denied"

daemonset-sysctl:
Ran the divingbell-sys containers as unprivileged and the kernel config
on the host updated as defined in the manifest. Kernel configs were
checked before and after running divingbell-sys container as
unprivileged. Beyond the default Linux capabilities given by
Docker, the 'SYS_PTRACE', 'SYS_ADMIN', and 'SYS_RAWIO' Linux
capabilities are needed. The following is a snippet of the logs showing
under which circumstance these privileges are needed:

"INFO * Applying /etc/sysctl.d/10-kernel-hardening.conf ...
INFO sysctl: setting key "kernel.kptr_restrict": Operation not permitted

INFO * Applying /etc/sysctl.d/10-ptrace.conf ...
INFO sysctl: setting key "kernel.yama.ptrace_scope": Operation not
permitted

INFO * Applying /etc/sysctl.d/10-zeropage.conf ...
INFO sysctl: setting key "vm.mmap_min_addr": Operation not permitted"

daemonset-perm:
Ran the divingbell-perm containers as unprivileged and the file
ownership and permissions on the host updated as defined in the
manifest. As a test, the daemon was configured to run every minute
and the targeted files ownership and permissions were manually
changed. It was then verified that divingbell restored the ownership
and permissions of the file to what it should be. This applies to
the divingbell-perm-default and the divingbell-perm-calico containers.

daemonset-limits:
Ran the divingbell-limits containers as unprivileged and checked the
ulimits on the host before and after running divingbell and the ulimit
updated to the value defined in the manifest. The capable tool also
showed that no additional Linux capabilities are needed.

daemonset-apparmor:
Ran the divingbell-apparmor containers as unprivileged and logs show no
evidence of failed permission checks. Additionally, the apparmor config
was updated in the manifest and the apparmor profile successfully
loaded. Beyond the default Linux capabilities given by Docker, the
'MAC_ADMIN' Linux capability is needed to load an apparmor profile.

daemonset-apt:
Ran the divingbell-apt containers as unprivileged and was able to
successfully install packages without issues. As a test, the
manifest was updated to install 'htop' and after running Divingbell,
it was confirmed that 'htop' installed successfully. Here is
a snippet from the logs:
DEBUG + INSTALLED_THIS_TIME=' htop'
DEBUG + REQUESTED_PACKAGES=' htop'

daemonset-ethtool:
Ran the divingbell-ethtool containers as unprivileged and was able to
manage NIC tunables. As a check, the NIC tunables for ens3 were checked
before and after running Divingbell - 'ethtool -k ens3'. Divingbell
configured the NIC as defined in the manifest. Beyond the default Linux
capabilities given by Docker, the 'NET_ADMIN' Linux capability is needed.
The following is a log snippet showing what happens when the 'NET_ADMIN'
capability is not added:
"DEBUG + /sbin/ethtool -K cali86cb821b7db tx-nocache-copy off
INFO Cannot set device feature settings: Operation not permitted"

daemonset-uamlite:
Ran the divingbell-uamlite containers as unprivileged and was able to
successfully add user accounts as defined in the manifest. No additional
Linux capabilities are needed.

daemonset-mounts:
Ran the divingbell-mounts containers as unprivileged and was able to
successfully add host level mounts as defined in the manifest. No
additional Linux capabilities are needed.

[0] https://github.com/iovisor/bcc/blob/master/tools/capable.py

Change-Id: I26a1b5e06ad27c854d95e6675de05b884ce3bdc1
2019-03-15 19:51:24 +00:00
Pete Birley 85534b7796 Exec: Use nsenter to enter all host namespaces when running exec
This PS moves to pivot to the hosts namespaces rather than chroot
so as to allow scripts to run fully in the context of the host.

Change-Id: I6b4dab92b6f8a7f9fa5b895d546117fdae43d731
Signed-off-by: Pete Birley <pete@port.direct>
2019-03-11 19:32:48 -07:00
Scott Hussey 9d244c4443 (perm) Fix CL for reverting missing file
- When reverting permissions on a file, there is no check for existence
  causing a deleted file to CL the perm module

Change-Id: Ifae0ac196acf8ac2ccef84102967b6b4305a7691
2019-03-08 09:09:27 -06:00
Rahul Khiyani 87dbc54044 Adding timestamp to _shcommon as log formatter for
troubleshooting

Change-Id: Ie89fc95e5d7f0e4f832bac45f87915893ed79942
2019-01-16 07:43:22 -05:00
anthony.bellino f4c8228ff6 Add rerun support for perm module
- Adds the ability to rerun divingbell-perm at specified interval.

- Adds the ability to specify a rerun policy of
  'always', 'never', 'once_successfully'. Default value is 'always'.

Demo: https://asciinema.org/a/220289

Change-Id: I3909b4d92f8e2bdb0d826ca1cfbd62f937c2532d
2019-01-10 17:39:32 +00:00
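
Editor's sketch for f4c8228ff6: the three rerun policies could be read as the decision function below. The exact semantics are an assumption (in particular, that 'never' still permits the first run); the commit message only names the policies and the default.

```python
def should_run(policy, has_run, last_succeeded):
    """Decide whether to run at the next interval (illustrative semantics)."""
    if policy == "always":
        return True
    if policy == "never":
        # Assumption: run once, then never again regardless of outcome.
        return not has_run
    if policy == "once_successfully":
        # Keep retrying at each interval until one run succeeds.
        return not (has_run and last_succeeded)
    raise ValueError(f"unknown rerun policy: {policy}")


after_failure = {
    policy: should_run(policy, has_run=True, last_succeeded=False)
    for policy in ("always", "never", "once_successfully")
}
after_success = {
    policy: should_run(policy, has_run=True, last_succeeded=True)
    for policy in ("always", "never", "once_successfully")
}
```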
Nikita Koshikov 606cf35bda Add new apparmor daemonset
Implemented daemonset that will manage host apparmor profiles.
Tests and documentation added.

demo: https://asciinema.org/a/uQjlWgC4bjI3WkfontmThf8t0

Co-Authored-By: Vladyslav Drok <vdrok@mirantis.com>
Change-Id: I13f7357c15b5c4386a61bba50f097eb434d7f211
2018-12-14 19:02:00 -08:00
Craig Anderson 4ed467e512 Add retry/rerun support for exec module
Add support for retries and reruns at specified intervals for
divingbell-exec scripts. Also adds support for timeouts.

Also update osh-infra-upgrade-host to allow gate to run.

Change-Id: I5f4cd43b13a467d94f67b358f3190f515256ae66
2018-12-14 19:45:38 +00:00
Craig Anderson 012800d854 Add new divingbell-exec module
Stopgap module to provide generic node exec capability until shift
to [0] and [1].

[0] https://github.com/GoogleCloudPlatform/metacontroller
[1] https://github.com/argoproj/argo

Change-Id: I278548e1e09ed31dcc4212142f1e6465ee8d9792
2018-12-04 18:22:51 +00:00