The divingbell pods use a hostPath volume for the root filesystem.
Because this mount includes /var/lib/kubelet, the pod holds a reference
to every volume mounted by every pod on the same host.
The most visible case where this causes a problem is the termination of
a pod that uses a ceph-backed PVCs. When kubelet tries to unmap the rbd
device, it is unable to do so, manifesting in the kubelet logs as:
rbd: unmap failed: (16) Device or resource busy
This change sets the mountPropagation to HostToContainer for the rootfs
volume, so that the divingbell pods will not prevent kubelet from
releasing these devices.
https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation
Change-Id: I6e91fb9b9d7cbe852c5e6dc8b7224d6085175590
This change adds the ability to configure node selectors per module. The
default node selector is 'kubernetes.io/os=linux'. For example:
labels:
apt:
node_selector_key=divingbell-apt
node_selector_value=enabled
Will result in a node selector of 'divingbell-apt=enabled'.
Change-Id: I7150c5f998afa30dce22f505be4d0d164254214f
This adds default AppArmor profile to divingbell.
Also, update to gate script to install ethtool if it is not present.
Change-Id: I7abb13a533b596f4db5fe65fdae5eb7fc57ec00a
Divingbell runs all its containers as privileged. Some Divingbell
containers can perform their jobs with the default set of Linux
capabilities that Docker gives to unprivileged containers while others
need additional capabilities. The default list of capabilties include
the following:
- SETPCAP
- MKNOD
- AUDIT_WRITE
- CHOWN
- NET_RAW
- DAC_OVERRIDE
- FOWNER
- FSETID
- KILL
- SETGID
- SETUID
- NET_BIND_SERVICE
- SYS_CHROOT
- SETFCAP
The capabilities listed in the daemonset templates function as a
whitelist in that the corresponding containers have access to the Linux
capabilities listed in their SecurityContext, but also the
aforementioned capabilties included by default by Docker.
Summary of testing for each daemonset:
The bcc-capable tool [0] was used to discover which Linux capabilities
the Divingbell containers invoke. The tool was ran against all the
processes running in the container. The Divingbell logs for each
container were also carefully analyzed for failed permission checks.
daemonset-exec:
A recent change to use nsenter to enter all host namespaces when running
exec prevents divingbell-exec from being able to run unprivileged as
there are no Linux capabilties that allows write access to '/proc'.
When trying to run as unprivileged, the following prevents the pod from
coming up:
"nsenter: cannot open /proc/1/ns/ipc: Permission denied"
daemonset-sysctl:
Ran the divingbell-sys containers as unprivileged and the kernel config
on the host updated as defined in the manifest. Kernel configs were
checked before and after running divingbell-sys container as
unprivileged. Beyond the default Linux capabilties given by
Docker, the 'SYS_PTRACE', 'SYS_ADMIN', and 'SYS_RAWIO' Linux
capabilities are needed. The following is a snippet of the logs showing
under which circumstance these privileges are needed:
"INFO * Applying /etc/sysctl.d/10-kernel-hardening.conf ...
INFO sysctl: setting key "kernel.kptr_restrict": Operation not permitted
INFO * Applying /etc/sysctl.d/10-ptrace.conf ...
INFO sysctl: setting key "kernel.yama.ptrace_scope": Operation not
permitted
INFO * Applying /etc/sysctl.d/10-zeropage.conf ...
INFO sysctl: setting key "vm.mmap_min_addr": Operation not permitted"
daemonset-perm:
Ran the divingbell-perm containers as unprivileged and the file
ownership and permissions on the host updated as defined in the
manifest. As a test, the daemon was configured to run every minute
and the targeted files ownership and permissions were manually
changed. It was then verified that divingbell restored the ownership
and permissions of the file to what it should be. This applies to
the divingbell-perm-default and the divingbell-perm-calico containers.
daemonset-limits:
Ran the divingbell-limits containers as unprivileged and checked the
ulimits on the host before and after running divingbell and the ulimit
updated to the value defined in the manifest. The capable tool also
showed that no additional Linux capabilties are needed.
daemonset-apparmor:
Ran the divingbell-apparmor containers as unprivileged and logs show no
evidence of failed permission checks. Additionally, the apparmor config
was updated in the manifest and the apparmor profile successfully
loaded. Beyond the default Linux capabilties given by Docker, the
'MAC_ADMIN' Linux capability is needed to load an apparmor profile.
daemonset-apt:
Ran the divingbell-apt containers as unprivileged and was able to
successfully install package without issues. As a test, the
manifest was updated to install 'htop' and after running Divingbell,
it was confirmed that 'htop' installed successfully. Here is
a snippet from the logs:
DEBUG + INSTALLED_THIS_TIME=' htop'
DEBUG + REQUESTED_PACKAGES=' htop'
daemonset-ethtool:
Ran the divingbell-ethtool containers as unprivileged and was able to
manage NIC tunables. As a check, the NIC tunables for ens3 was checked
before and after running Divingbell - 'ethtool -k ens3'. Divingbell
configured the NIC as defined in the manifest. Beyond the default Linux
capabilties given by Docker, the 'NET_ADMIN' Linux capability is needed.
The following is a log snippet showing what happens when the 'NET_ADMIN'
capability is not added:
"DEBUG + /sbin/ethtool -K cali86cb821b7db tx-nocache-copy off
INFO Cannot set device feature settings: Operation not permitted"
daemonset-uamlite:
Ran the divingbell-uamlite containers as unprivileged and was able to
successfully add user accounts as defined in the manifest. No additional
Linux capabilities are needed.
daemonset_mounts:
Ran the divingbell-mounts containers as unprivileged and was able to
successfully add host level mounts as defined in the manifest. No
additional Linux capabilities are needed.
[0]https://github.com/iovisor/bcc/blob/master/tools/capable.py
Change-Id: I26a1b5e06ad27c854d95e6675de05b884ce3bdc1
Change configmaps to secrets to maintain compatibility with [0].
[0] https://review.openstack.org/#/c/617039
Change-Id: Ie95aee1a4104008ca93c23ac9d19245a87fade20
This PS adds the ability to attach a release uuid to pods and rc
objects as desired. This can be used, for example, to force an
artificial manifest change in CICD scenarios, for upgradability
testing purposes.
Change-Id: I2f5279c6983f43288e4ef3cb48898d5a36b33833