author      Evgeny L <eli@mirantis.com>    2019-02-08 21:26:06 +0000
committer   Evgeny L <eli@mirantis.com>    2019-02-15 19:55:46 +0000
commit      ba1dd3681a9b02845b66322da005a798d3f69656 (patch)
tree        8b7cd40ffcac521655072acd119691ce6aabdb89
parent      8ad22004ae0b603e972ea1de8d50d3460e38f9fa (diff)
Update documentation on Ceph partitioning
Make the docs up to date:

* The previous version of the documentation assumed that the partitioning
  schema differs between SSDs and HDDs; this is no longer the case.
* Ceph charts now have automatic partitioning for both OSDs and journals.

Change-Id: I74bd625522469e2860ada995f4e6a81a566107fa
Notes (review):
    Code-Review+2: Kaspars Skels <kaspars.skels@gmail.com>
    Code-Review+2: Roman Gorshunov <roman.gorshunov@att.com>
    Workflow+1: Roman Gorshunov <roman.gorshunov@att.com>
    Verified+2: Zuul
    Submitted-by: Zuul
    Submitted-at: Mon, 18 Feb 2019 15:33:21 +0000
    Reviewed-on: https://review.openstack.org/635939
    Project: openstack/airship-treasuremap
    Branch: refs/heads/master
-rw-r--r--    doc/source/authoring_and_deployment.rst    200
1 file changed, 28 insertions, 172 deletions
diff --git a/doc/source/authoring_and_deployment.rst b/doc/source/authoring_and_deployment.rst
index b3be834..86ad0e5 100644
--- a/doc/source/authoring_and_deployment.rst
+++ b/doc/source/authoring_and_deployment.rst
@@ -240,26 +240,27 @@ the order in which you should build your site files is as follows:
 Control Plane Ceph Cluster Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Environment Ceph parameters for the control plane are located in:
+Configuration variables for ceph control plane are located in:
 
-``site/${NEW_SITE}/software/charts/ucp/ceph/ceph.yaml``
+- ``site/${NEW_SITE}/software/charts/ucp/ceph/ceph-osd.yaml``
+- ``site/${NEW_SITE}/software/charts/ucp/ceph/ceph-client.yaml``
 
 Setting highlights:
 
 - data/values/conf/storage/osd[\*]/data/location: The block device that
   will be formatted by the Ceph chart and used as a Ceph OSD disk
-- data/values/conf/storage/osd[\*]/journal/location: The directory
+- data/values/conf/storage/osd[\*]/journal/location: The block device
   backing the ceph journal used by this OSD. Refer to the journal
   paradigm below.
 - data/values/conf/pool/target/osd: Number of OSD disks on each node
 
 Assumptions:
 
-1. Ceph OSD disks are not configured for any type of RAID (i.e., they
-   are configured as JBOD if connected through a RAID controller). (If
-   RAID controller does not support JBOD, put each disk in its own
-   RAID-0 and enable RAID cache and write-back cache if the RAID
-   controller supports it.)
+1. Ceph OSD disks are not configured for any type of RAID, they
+   are configured as JBOD when connected through a RAID controller.
+   If RAID controller does not support JBOD, put each disk in its
+   own RAID-0 and enable RAID cache and write-back cache if the
+   RAID controller supports it.
 2. Ceph disk mapping, disk layout, journal and OSD setup is the same
    across Ceph nodes, with only their role differing. Out of the 4
    control plane nodes, we expect to have 3 actively participating in
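A quick way to sanity-check these settings on a live node — not part of this change, and the device names are examples only — is to list the disks the kernel sees and count the non-OS ones that will become OSDs; that count feeds ``data/values/conf/pool/target/osd``:

::

    # JBOD/pass-through disks appear as individual "disk" entries;
    # ROTA=1 means spinning HDD, ROTA=0 means SSD (-e7 hides loop devices).
    lsblk -d -e7 -o NAME,SIZE,TYPE,ROTA

    # Count the non-OS disks that will be handed to the Ceph chart; here
    # sda is assumed to be the OS device -- adjust for your hardware.
    lsblk -d -e7 -n -o NAME,TYPE | awk '$2 == "disk" && $1 != "sda"' | wc -l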
@@ -268,16 +269,12 @@ Assumptions:
    (cp\_*-secondary) than the other three (cp\_*-primary).
 3. If doing a fresh install, disk are unlabeled or not labeled from a
    previous Ceph install, so that Ceph chart will not fail disk
-   initialization
+   initialization.
 
-This document covers two Ceph journal deployment paradigms:
-
-1. Servers with SSD/HDD mix (disregarding operating system disks).
-2. Servers with no SSDs (disregarding operating system disks). In other
-   words, exclusively spinning disk HDDs available for Ceph.
+It's highly recommended to use SSD devices for Ceph Journal partitions.
 
 If you have an operating system available on the target hardware, you
-can determine HDD and SSD layout with:
+can determine HDD and SSD devices with:
 
 ::
 
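The command body referenced above falls between these hunks and is not shown here; the rotational flag can be read directly from sysfs or via ``lsblk``. The ``wipefs`` line is an illustrative extra for assumption 3 (no leftover labels from a previous install) rather than part of the documented procedure, and it is destructive:

::

    # 1 = rotational (HDD), 0 = non-rotational (SSD); some SSDs misreport,
    # so cross-check against the server specifications as noted below.
    cat /sys/block/sd*/queue/rotational
    lsblk -d -o NAME,ROTA

    # Assumption 3: wipe leftover signatures so the Ceph chart does not
    # fail disk initialization (destructive -- only on disks given to Ceph).
    sudo wipefs --all /dev/sdX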
@@ -288,28 +285,23 @@ and where a value of ``0`` indicates non-spinning disk (i.e. SSD). (Note
 - Some SSDs still report a value of ``1``, so it is best to go by your
 server specifications).
 
-In case #1, the SSDs will be used for journals and the HDDs for OSDs.
-
 For OSDs, pass in the whole block device (e.g., ``/dev/sdd``), and the
 Ceph chart will take care of disk partitioning, formatting, mounting,
 etc.
 
-For journals, divide the number of journal disks as evenly as possible
-between the OSD disks. We will also use the whole block device, however
-we cannot pass that block device to the Ceph chart like we can for the
-OSD disks.
-
-Instead, the journal devices must be already partitioned, formatted, and
-mounted prior to Ceph chart execution. This should be done by MaaS as
-part of the Drydock host-profile being used for control plane nodes.
+For Ceph Journals, you can pass in a specific partition (e.g., ``/dev/sdb1``),
+note that it's not required to pre-create these partitions, Ceph chart
+will create journal partitions automatically if they don't exist.
+By default the size of every journal partition is 10G, make sure
+there is enough space available to allocate all journal partitions.
 
-Consider the follow example where:
+Consider the following example where:
 
 - /dev/sda is an operating system RAID-1 device (SSDs for OS root)
-- /dev/sdb is an operating system RAID-1 device (SSDs for ceph journal)
-- /dev/sd[cdef] are HDDs
+- /dev/sd[bc] are SSDs for ceph journals
+- /dev/sd[efgh] are HDDs for OSDs
 
-Then, the data section of this file would look like:
+The data section of this file would look like:
 
 ::
 
@@ -320,96 +312,29 @@ Then, the data section of this file would look like:
             osd:
             - data:
                 type: block-logical
-                location: /dev/sdd
-              journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal/journal-sdd
-            - data:
-                type: block-logical
                 location: /dev/sde
               journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal/journal-sde
-            - data:
                 type: block-logical
-                location: /dev/sdf
-              journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal/journal-sdf
+                location: /dev/sdb1
             - data:
                 type: block-logical
-                location: /dev/sdg
+                location: /dev/sdf
               journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal/journal-sdg
-          pool:
-            target:
-              osd: 4
-
-where the following mount is setup by MaaS via Drydock host profile for
-the control-plane nodes:
-
-::
-
-    /dev/sdb is mounted to /var/lib/openstack-helm/ceph/journal
-
-In case #2, Ceph best practice is to allocate journal space on all OSD
-disks. The Ceph chart assumes this partitioning has been done
-beforehand. Ensure that your control plane host profile is partitioning
-each disk between the Ceph OSD and Ceph journal, and that it is mounting
-the journal partitions. (Drydock will drive these disk layouts via MaaS
-provisioning). Note the mountpoints for the journals and the partition
-mappings. Consider the following example where:
-
-- /dev/sda is the operating system RAID-1 device
-- /dev/sd[bcde] are HDDs
-
-Then, the data section of this file will look similar to the following:
-
-::
-
-    data:
-      values:
-        conf:
-          storage:
-            osd:
-            - data:
                 type: block-logical
                 location: /dev/sdb2
-              journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal0/journal-sdb
             - data:
                 type: block-logical
-                location: /dev/sdc2
+                location: /dev/sdg
               journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal1/journal-sdc
-            - data:
                 type: block-logical
-                location: /dev/sdd2
-              journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal2/journal-sdd
+                location: /dev/sdc1
             - data:
                 type: block-logical
-                location: /dev/sde2
+                location: /dev/sdh
               journal:
-                type: directory
-                location: /var/lib/openstack-helm/ceph/journal3/journal-sde
-          pool:
-            target:
-              osd: 4
-
-where the following mounts are setup by MaaS via Drydock host profile
-for the control-plane nodes:
-
-::
+                type: block-logical
+                location: /dev/sdc2
 
-    /dev/sdb1 is mounted to /var/lib/openstack-helm/ceph/journal0
-    /dev/sdc1 is mounted to /var/lib/openstack-helm/ceph/journal1
-    /dev/sdd1 is mounted to /var/lib/openstack-helm/ceph/journal2
-    /dev/sde1 is mounted to /var/lib/openstack-helm/ceph/journal3
 
 Update Passphrases
 ~~~~~~~~~~~~~~~~~~~~
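In the example above each journal SSD (``/dev/sdb`` and ``/dev/sdc``) must hold two automatically created journal partitions, i.e. roughly 2 x 10G at the default size. A rough capacity check along those lines — not part of this change, and the sizes simply follow the example:

::

    # Each journal SSD backs two OSDs in this layout, so ~20G of journal
    # space plus partition overhead has to fit on the device.
    for dev in /dev/sdb /dev/sdc; do
        bytes=$(sudo blockdev --getsize64 "$dev")
        echo "$dev: $((bytes / 1024 / 1024 / 1024)) GiB total, ~20 GiB needed for journals"
    done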
@@ -685,75 +610,6 @@ permission denied errors from apparmor when the MaaS container tries to
 leverage libc6 for /bin/sh when MaaS container ntpd is forcefully
 disabled.
 
-Setup Ceph Journals
-~~~~~~~~~~~~~~~~~~~
-
-Until genesis node reprovisioning is implemented, it is necessary to
-manually perform host-level disk partitioning and mounting on the
-genesis node, for activites that would otherwise have been addressed by
-a bare metal node provision via Drydock host profile data by MaaS.
-
-Assuming your genesis HW matches the HW used in your control plane host
-profile, you should manually apply to the genesis node the same Ceph
-partitioning (OSDs & journals) and formatting + mounting (journals only)
-as defined in the control plane host profile. See
-``airship-treasuremap/global/profiles/host/base_control_plane.yaml``.
-
-For example, if we have a journal SSDs ``/dev/sdb`` on the genesis node,
-then use the ``cfdisk`` tool to format it:
-
-::
-
-    sudo cfdisk /dev/sdb
-
-Then:
-
-1. Select ``gpt`` label for the disk
-2. Select ``New`` to create a new partition
-3. If scenario #1 applies in
-   site/$NEW\_SITE/software/charts/ucp/ceph/ceph.yaml\_, then accept
-   default partition size (entire disk). If scenario #2 applies, then
-   only allocate as much space as defined in the journal disk partitions
-   mounted in the control plane host profile.
-4. Select ``Write`` option to commit changes, then ``Quit``
-5. If scenario #2 applies, create a second partition that takes up all
-   of the remaining disk space. This will be used as the OSD partition
-   (``/dev/sdb2``).
-
-Install package to format disks with XFS:
-
-::
-
-    sudo apt -y install xfsprogs
-
-Then, construct an XFS filesystem on the journal partition with XFS:
-
-::
-
-    sudo mkfs.xfs /dev/sdb1
-
-Create a directory as mount point for ``/dev/sdb1`` to match those
-defined in the same host profile ceph journals:
-
-::
-
-    sudo mkdir -p /var/lib/ceph/cp
-
-Use the ``blkid`` command to get the UUID for ``/dev/sdb1``, then
-populate ``/etc/fstab`` accordingly. Ex:
-
-::
-
-    sudo sh -c 'echo "UUID=01234567-ffff-aaaa-bbbb-abcdef012345 /var/lib/ceph/cp xfs defaults 0 0" >> /etc/fstab'
-
-Repeat all preceeding steps in this section for each journal device in
-the Ceph cluster. After this is completed for all journals, mount the
-partitions:
-
-::
-
-    sudo mount -a
-
 Promenade bootstrap
 ~~~~~~~~~~~~~~~~~~~
 
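With the manual ``cfdisk``/``mkfs.xfs``/``fstab`` journal preparation removed above, the remaining host-level task is verification after the charts run. A hedged post-deployment check — device names follow the example site, and the ``ceph`` commands assume they are run where a working ``ceph.conf`` and admin keyring are available (for example inside a ceph-mon pod):

::

    # Journal partitions created automatically by the ceph-osd chart
    # should now exist on the journal SSDs.
    lsblk /dev/sdb /dev/sdc
    sudo sgdisk --print /dev/sdb

    # Cluster-level view: all OSDs should be up/in and the cluster healthy.
    ceph osd tree
    ceph -s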