Rook is a cloud-native storage solution: it provides CRDs, and the custom resources you create from them in turn spawn the corresponding storage pods and resources.

Install Rook CRD

Install the operator via the Helm chart. This is the foundation of all the fun.

helm repo add rook-release https://charts.rook.io/release
# Helm 2 syntax; with Helm 3 the release name is positional:
#   helm install rook rook-release/rook-ceph --namespace rook-ceph --create-namespace
helm install --namespace rook-ceph --name rook rook-release/rook-ceph
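
Once the chart is installed, check that the operator pod comes up before creating any custom resources (the label below is the one the standard chart applies; adjust if yours differs):

kubectl -n rook-ceph get pods -l app=rook-ceph-operator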

Note: the Rook operator and the CRD-defined cluster must be in the same namespace, because the cluster resources are created using the Helm-created service account.

Ceph Block Storage

Assume we have 10 compute nodes, each with disks sdb-sdf (HDD) plus sdg (SSD). The following YAML creates a Ceph cluster with 3 mons that consumes all available disks on all compute nodes. We set hostNetwork: true, which together with the ConfigMap below forces the OSDs to use a dedicated storage network:

  • The rook-config-override ConfigMap forces each daemon to bind to whatever address it finds in the given subnet (the IP is inherited from the host). After changing this ConfigMap you need to kill all OSD pods so they pick up the change (see the apply/restart sketch after the CRD definition below).
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    public network  = 10.240.101.0/24
    cluster network = 10.240.103.0/24
    public addr = ""
    cluster addr = ""
  • CephCluster CRD definition:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  network:
    hostNetwork: true
  # cluster level storage configuration and selection
  storage:
    useAllNodes: true
    useAllDevices: true
    deviceFilter:
    location:
    config:
      metadataDevice: "sdg"
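
To roll this out, apply both manifests and bounce the OSD pods whenever the override ConfigMap changes. A minimal sketch (the file names are just placeholders for the manifests above, and app=rook-ceph-osd is the label Rook puts on OSD pods):

kubectl -n rook-ceph apply -f rook-config-override.yaml
kubectl -n rook-ceph apply -f cephcluster.yaml

# after editing rook-config-override, restart the OSD pods so they re-read it
kubectl -n rook-ceph delete pod -l app=rook-ceph-osd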


After applying this, if everything is right you will see a bunch of OSD pods spawned and no errors on the CephCluster CRD. Wait until the CephCluster instance reports that it has been created before going further, otherwise you will hit weird issues.
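
You can watch the progress with kubectl (assuming the default names and labels used above):

kubectl -n rook-ceph get cephcluster rook-ceph
kubectl -n rook-ceph get pods -l app=rook-ceph-osd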

  • Even with 3 mons up you can’t run ceph -s right out of the box; a dedicated toolbox pod needs to be deployed for troubleshooting or management purposes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:master
        command: ["/tini"]
        args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_ADMIN_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: admin-secret
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /dev
            name: dev
          - mountPath: /sys/bus
            name: sysbus
          - mountPath: /lib/modules
            name: libmodules
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      # if hostNetwork: false, the "rbd map" command hangs, see https://github.com/rook/rook/issues/2021
      hostNetwork: true
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: sysbus
          hostPath:
            path: /sys/bus
        - name: libmodules
          hostPath:
            path: /lib/modules
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints
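
Apply the Deployment and exec into the pod to get a shell where ceph is already configured (toolbox.yaml is whatever file you saved the manifest above as; the lookup uses the app=rook-ceph-tools label):

kubectl -n rook-ceph apply -f toolbox.yaml
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash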

Delete/add new node

Let’s use k8s node upward-crow as an example.

ceph osd tree
 -9        1.17560     host upward-crow                           
 15   hdd  0.29390         osd.15             up  1.00000 1.00000 
 16   hdd  0.29390         osd.16             up  1.00000 1.00000 
 18   hdd  0.29390         osd.18             up  1.00000 1.00000 
 19   hdd  0.29390         osd.19             up  1.00000 1.00000 
 17   hdd  0.29390         osd.17             up  1.00000 1.00000

Preparation:

  1. Make sure removing this node won’t make the Ceph cluster full and start refusing new writes:
ceph df
rados df
ceph osd df
  2. Disable scrubbing so that the data movement caused by the removal is not slowed down by scrub I/O:
ceph osd set noscrub
ceph osd set nodeep-scrub
  3. Limit backfill and recovery to preserve client I/O performance while the cluster rebalances (see the injectargs sketch below):
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
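
These settings can be injected into the running OSDs from the toolbox without a restart; remember to re-enable scrubbing once the maintenance is over. A minimal sketch:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

# when the node removal is finished, turn scrubbing back on
ceph osd unset noscrub
ceph osd unset nodeep-scrub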

Remove node:

  1. Mark the OSDs on that node out one by one, and wait until Ceph recovers itself before doing the next one; ceph -w lets you watch the progress:
# ceph osd out <osd_id>
ceph osd out 15 16 17 18 19
ceph -w
  2. Remove the OSDs from the CRUSH map so that they no longer receive data:
ceph osd crush remove osd.15
ceph osd crush remove osd.16
ceph osd crush remove osd.17
ceph osd crush remove osd.18
ceph osd crush remove osd.19
  3. Remove the OSD authentication keys:
# ceph auth del osd.<osd_id>
ceph auth del osd.15
ceph auth del osd.16
ceph auth del osd.17 
ceph auth del osd.18 
ceph auth del osd.19 
  4. Remove the OSDs themselves; this also includes deleting the corresponding OSD deployments on k8s (see the sketch after this list):
# ceph osd rm <osd_id>
ceph osd rm 15 16 17 18 19
  5. Remove the node from the CRUSH map:
ceph osd crush rm upward-crow  
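
For step 4, the Kubernetes side looks roughly like this; the deployment names follow the rook-ceph-osd-id-<id> pattern used later in this doc, so adjust them to whatever kubectl get deployment -n rook-ceph actually shows:

kubectl -n rook-ceph delete deployment rook-ceph-osd-id-15 rook-ceph-osd-id-16 rook-ceph-osd-id-17 rook-ceph-osd-id-18 rook-ceph-osd-id-19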

Delete/add new osd disk

This will use osd.5 as an example.
ceph commands are expected to be run in the rook-toolbox:

  1. disk fails
  2. remove disk from node
  3. mark out osd. ceph osd out osd.5
  4. remove from crush map. ceph osd crush remove osd.5
  5. delete caps. ceph auth del osd.5
  6. remove osd. ceph osd rm osd.5 (see the purge shortcut after this list)
  7. delete the deployment kubectl delete deployment -n rook-ceph rook-ceph-osd-id-5
  8. delete osd data dir on node rm -rf /var/lib/rook/osd5
  9. edit the osd configmap kubectl edit configmap -n rook-ceph rook-ceph-osd-nodename-config, remove config section pertaining to your osd id and underlying device.
  10. add new disk and verify node sees it.
  11. restart the rook-operator pod by deleting the rook-operator pod
  12. osd prepare pods run
  13. new rook-ceph-osd-id-5 will be created
  14. check health of your cluster ceph -s; ceph osd tree
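
On Luminous and newer releases, steps 4-6 can optionally be collapsed into a single purge command, which removes the OSD from the CRUSH map, deletes its auth key and removes it from the OSD map in one go:

ceph osd purge 5 --yes-i-really-mean-it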

Troubleshooting examples with the Toolbox

Check the running status of the Ceph cluster:

ceph -s
ceph osd tree
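
The rbd examples below assume a block pool named replicapool already exists; if it does not, a minimal CephBlockPool sketch (3-way replication, mirroring the upstream Rook examples) can be applied first:

cat <<EOF | kubectl -n rook-ceph apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
EOF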

Create a volume image (10MB):

rbd create replicapool/test --size 10
rbd info replicapool/test

# Disable the rbd features that are not in the kernel module
rbd feature disable replicapool/test fast-diff deep-flatten object-map

Map the block volume and format it and mount it:

# Map the rbd device. If the toolbox was started with "hostNetwork: false" this hangs and you have to stop it with Ctrl-C,
# however the command still succeeds; see https://github.com/rook/rook/issues/2021
rbd map replicapool/test

# Find the device name, such as rbd0
lsblk | grep rbd

# Format the volume (only do this the first time or you will lose data)
mkfs.ext4 -m0 /dev/rbd0

# Mount the block device
mkdir /tmp/rook-volume
mount /dev/rbd0 /tmp/rook-volume

Unmount the volume and unmap the kernel device:

umount /tmp/rook-volume
rbd unmap /dev/rbd0
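
If the image was only created for this test, remove it once it is unmapped:

rbd rm replicapool/test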

Shared Filesystem
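
The mount below assumes a CephFilesystem named myfs (with its MDS pods) already exists; if not, a minimal sketch following the upstream Rook example layout:

cat <<EOF | kubectl -n rook-ceph apply -f -
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true
EOF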

Mount CephFS

# Create the directory
mkdir /tmp/registry

# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the file system
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry

# See your mounted file system
df -h
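
To tear the test mount down again:

umount /tmp/registry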

Test Ceph performance

Rados

rados bench can be used to test the Ceph cluster’s raw performance

  1. Create a test pool and drop all cached data:
ceph osd pool create testbench 100 100
sudo echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
  2. Execute a write test for 10 seconds against the newly created storage pool:
rados bench -p testbench 10 write --no-cleanup
  3. Execute a sequential read test for 10 seconds against the storage pool:
rados bench -p testbench 10 seq
  4. Execute a random read test for 10 seconds against the storage pool:
rados bench -p testbench 10 rand
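
Because the write test ran with --no-cleanup, remove the benchmark objects afterwards, and optionally the pool itself (pool deletion requires mon_allow_pool_delete=true):

rados -p testbench cleanup
ceph osd pool delete testbench testbench --yes-i-really-really-mean-it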

Creating a Ceph Block Device

To test actual performance on a block device, use rbd bench-write.

  1. Load the rbd kernel module, if not already loaded:
sudo modprobe rbd
  2. Create a 1 GB rbd image file in the testbench pool:
sudo rbd create image01 --size 1024 --pool testbench
  3. Map the image file to a device file:
sudo rbd map image01 --pool testbench --name client.admin
  4. Create an ext4 file system on the block device:
sudo mkfs.ext4 -m0 /dev/rbd0
  5. Create a new directory:
sudo mkdir /tmp/rook-volume
  6. Mount the block device under /tmp/rook-volume:
sudo mount /dev/rbd0 /tmp/rook-volume
  7. Execute the write performance test against the block device:
rbd bench-write image01 --pool=testbench
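
When finished, unmount and clean up the test image. (On newer Ceph releases rbd bench-write is a deprecated alias for rbd bench --io-type write.)

sudo umount /tmp/rook-volume
sudo rbd unmap /dev/rbd0
sudo rbd rm image01 --pool testbench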