Build a Ceph and Kubernetes based distributed file storage system
In this article, we are going to build a Ceph and Kubernetes based distributed file storage system with Rook and integrate it into our Java platform.
Install
Download rook
Clone the Rook Ceph code from GitHub (release-1.2 branch).
git clone --single-branch --branch release-1.2 https://github.com/rook/rook.git
Copy yaml
Copy the common.yaml, operator.yaml and toolbox.yaml files from rook/cluster/examples/kubernetes/ceph/ into your working directory.
cp ../rook/cluster/examples/kubernetes/ceph/common.yaml ../rook/cluster/examples/kubernetes/ceph/operator.yaml ../rook/cluster/examples/kubernetes/ceph/toolbox.yaml .
Create rook containers
Create the Rook Ceph resources and wait until the operator pod reaches Running status.
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl get pod -n rook-ceph
Create volumes
Create three 20 GB volumes and attach them to three Linux instances. Format the disks, create /mnt/ceph-storage on each node, and mount the volumes (see the note after the df output for keeping the mounts across reboots).
sudo fdisk -l
sudo mkfs.ext4 /dev/vdb    # the data disk may appear as /dev/vdb or /dev/vdc; check the fdisk -l output on each node
sudo mkdir -p /mnt/ceph-storage
sudo mount /dev/vdb /mnt/ceph-storage -t auto
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 30G 19G 11G 65% /
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 12K 2.9G 1% /dev/shm
tmpfs 2.9G 298M 2.6G 11% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
tmpfs 581M 0 581M 0% /run/user/1000
/dev/vdb 20G 45M 19G 1% /mnt/ceph-storage
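To keep the mount across reboots, an /etc/fstab entry can be added on each node. A minimal sketch, assuming the data disk is /dev/vdb as above:
# Assumption: the data disk is /dev/vdb; adjust per node
echo '/dev/vdb /mnt/ceph-storage ext4 defaults 0 0' | sudo tee -a /etc/fstab
sudo mount -a    # verify the entry mounts cleanly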
Create Ceph cluster
Create the cluster from cluster.yaml. Make sure the selected nodes are the ones with the disks attached, and that each nodes/name field matches the node's kubernetes.io/hostname label. Note that the dashboard is disabled for now; it will be enabled later.
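A quick way to check which names to put in the nodes section is to list the node labels and copy the kubernetes.io/hostname values:
kubectl get nodes --show-labels | grep kubernetes.io/hostname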
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
    allowUnsupported: false
  dashboard:
    enabled: false
  network:
    hostNetwork: false
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      metadataDevice:
      databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
      journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
    nodes:
    - name: "slave.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
    - name: "static.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
    - name: "db.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
kubectl apply -f cluster.yaml
kubectl get pod -n rook-ceph --watch
Re-create operator and check osd
If an error occurs, delete the dataDirHostPath folder (/var/lib/rook) on every node and delete the operator, then create the cluster again. Tail the operator pod logs to follow progress. When the 3 mon and 3 osd pods are Running, the cluster has been created successfully.
sudo rm -rf /var/lib/rook    # run on every node
kubectl delete -f operator.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
kubectl logs -f rook-ceph-operator-6b79d99f5c-9564s -n rook-ceph    # replace with your operator pod name
kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-265lj 3/3 Running 0 37m
csi-cephfsplugin-2b47v 3/3 Running 0 37m
csi-cephfsplugin-2dw6z 3/3 Running 0 37m
csi-cephfsplugin-2xgns 3/3 Running 0 37m
csi-cephfsplugin-54hd5 3/3 Running 0 37m
csi-cephfsplugin-9rgr8 3/3 Running 0 37m
csi-cephfsplugin-provisioner-5d999b68d6-8vc2k 4/4 Running 0 37m
csi-cephfsplugin-provisioner-5d999b68d6-xjg4z 4/4 Running 0 37m
csi-cephfsplugin-sp289 3/3 Running 0 37m
csi-rbdplugin-7gk9s 3/3 Running 0 37m
csi-rbdplugin-9sbwh 3/3 Running 0 37m
csi-rbdplugin-jwvxx 3/3 Running 0 37m
csi-rbdplugin-n9tcd 3/3 Running 0 37m
csi-rbdplugin-provisioner-69b7d7887-4l9l9 5/5 Running 0 37m
csi-rbdplugin-provisioner-69b7d7887-g48rb 5/5 Running 0 37m
csi-rbdplugin-q9q5b 3/3 Running 0 37m
csi-rbdplugin-snhqj 3/3 Running 0 37m
csi-rbdplugin-v6jb8 3/3 Running 0 37m
rook-ceph-crashcollector-db.novalocal-fd7dcc457-qd7tx 1/1 Running 0 11m
rook-ceph-crashcollector-deamon.novalocal-7789457f5-v6vnc 1/1 Running 0 11m
rook-ceph-crashcollector-slave.novalocal-5d698c7d7b-hgbnw 1/1 Running 0 9m59s
rook-ceph-crashcollector-static.novalocal-58f769ccc-lsf5b 1/1 Running 0 9m54s
rook-ceph-crashcollector-test.novalocal-6fdc8dbc4f-ksk75 1/1 Running 0 10m
rook-ceph-mgr-a-7898b59757-84tbd 1/1 Running 0 10m
rook-ceph-mon-a-7676f96769-5mhc6 1/1 Running 0 12m
rook-ceph-mon-b-79c9c9b59d-g82sc 1/1 Running 0 11m
rook-ceph-mon-c-7b679d7497-x2hjg 1/1 Running 0 11m
rook-ceph-operator-6b79d99f5c-9564s 1/1 Running 0 14m
rook-ceph-osd-0-5b59576bb4-rgsx8 1/1 Running 0 9m57s
rook-ceph-osd-1-74ff9d79c6-hpkc2 1/1 Running 0 9m55s
rook-ceph-osd-2-57748ff6bf-vf8jl 1/1 Running 0 9m59s
rook-ceph-osd-prepare-db.novalocal-jf84v 0/1 Completed 0 10m
rook-ceph-osd-prepare-slave.novalocal-j457l 0/1 Completed 0 10m
rook-ceph-osd-prepare-static.novalocal-jt4vj 0/1 Completed 0 10m
rook-discover-hwffd 1/1 Running 0 13m
rook-discover-jh8j5 1/1 Running 0 13m
rook-discover-mcdjn 1/1 Running 0 13m
rook-discover-mmzcb 1/1 Running 0 13m
rook-discover-mppgh 1/1 Running 0 13m
rook-discover-tz5gm 1/1 Running 0 13m
rook-discover-vj7tg 1/1 Running 0 13m
Create toolbox and test
Create the toolbox pod from toolbox.yaml, then exec into it and check the Ceph status.
kubectl apply -f toolbox.yaml
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
ceph status
ceph osd status
ceph df
rados df
[root@rook-ceph-tools-787dc6b944-spsjh /]# ceph status
  cluster:
    id:     2095eaca-93b3-4365-a5c8-9b05269821a9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum a,b,c (age 26m)
    mgr: a(active, since 26m)
    osd: 3 osds: 3 up (since 25m), 3 in (since 25m)
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   6.4 GiB used, 52 GiB / 59 GiB avail
    pgs:
Enable dashboard
Modify cluster.yaml to set dashboard enabled to true, then apply it again so the operator creates the dashboard service.
vim cluster.yaml
kubectl apply -f cluster.yaml
kubectl get svc -n rook-ceph |grep mgr-dashboard
rook-ceph-mgr-dashboard ClusterIP 10.36.19.173 <none> 7000/TCP 66s
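Before creating the ingress, the dashboard can also be reached temporarily with a port-forward and opened at http://localhost:7000 (a quick check, assuming the service port 7000 shown above):
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000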
Create dashboard ingress
The dashboard service is of type ClusterIP, which means it can only be reached from inside the cluster. Create a Traefik ingress file, dashboard-ingress.yaml, to expose it publicly.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ceph-dashboard-ingress
  namespace: rook-ceph
spec:
  rules:
  - host: dashboard.*.*
    http:
      paths:
      - path: /
        backend:
          serviceName: rook-ceph-mgr-dashboard
          servicePort: 7000
kubectl apply -f dashboard-ingress.yaml
Inspect dashboard password
Read the dashboard password from the secret. The username is admin.
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Create object gateway
Create the object gateway from object.yaml.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  preservePoolsOnDelete: false
  gateway:
    type: s3
    sslCertificateRef:
    port: 80
    securePort:
    instances: 1
    placement:
    annotations:
    resources:
kubectl apply -f object.yaml
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
Create radosgw user
Create a radosgw user in the toolbox, then register its access and secret keys with the dashboard. Replace the keys below with the ones printed by the create command; example output follows the commands.
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
radosgw-admin user create --uid=myuser --display-name=test-user --system
ceph dashboard set-rgw-api-user-id myuser
ceph dashboard set-rgw-api-access-key 32APIT3RA29JCO6OCR8P
ceph dashboard set-rgw-api-secret-key 2ioxTu6iBFkYP8UKiycS90A2DFwRBklSI8Bp3iPQ
{
    "user_id": "myuser",
    "display_name": "test-user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "myuser",
            "access_key": "32APIT3RA29JCO6OCR8P",
            "secret_key": "2ioxTu6iBFkYP8UKiycS90A2DFwRBklSI8Bp3iPQ"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "system": "true",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
Test S3 service in cluster
Connect to the toolbox and test the object storage from inside the cluster.
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
yum --assumeyes install s3cmd
# The content of .s3cfg is shown below. host_base is where the rgw service is listening: run kubectl -n rook-ceph get svc rook-ceph-rgw-my-store, then combine the ClusterIP and the port.
vi .s3cfg
s3cmd mb s3://test-bucket
s3cmd ls
[default]
access_key = Y14QX4KYOBCdvwMU6E5R
secret_key = AbeWMQPzpGhZPCMOq9IEkZSxLIgtooQsdvx4Cb4v
host_base = 10.100.191.33
host_bucket = 10.100.191.33/%(bucket)
use_https = False
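Beyond creating and listing buckets, a quick upload/download round trip verifies the gateway end to end. A sketch, where test.txt is just a throwaway local file:
echo 'hello ceph' > test.txt
s3cmd put test.txt s3://test-bucket
s3cmd ls s3://test-bucket
s3cmd get s3://test-bucket/test.txt test-download.txt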
Create S3 external service
Create an external service for the object store using NodePort in rgw-external.yaml. Traefik cannot be used here because it automatically redirects HTTP to HTTPS, which the plain-HTTP s3cmd setup cannot follow. After applying it, check the assigned node port as shown below.
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-rgw-my-store-external
  namespace: rook-ceph
  labels:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
    rook_object_store: my-store
spec:
  ports:
  - name: rgw
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
    rook_object_store: my-store
  sessionAffinity: None
  type: NodePort
kubectl apply -f rgw-external.yaml
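Check the port that NodePort assigned to the external service; this is the port to combine with a node's public IP in the next step:
kubectl -n rook-ceph get svc rook-ceph-rgw-my-store-external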
Test S3 service outside cluster
Test the object storage outside the cluster. Remember to replace the credentials and endpoint.
# On Windows, download the s3cmd source from GitHub, run "python s3cmd --configure" to generate the configuration file, then edit it with the content below.
# Run kubectl -n rook-ceph get service rook-ceph-rgw-my-store-external, then combine a node's public IP and the external (NodePort) port as the host below.
# Run "python s3cmd ls" to test on Windows.
[default]
access_key = Y14QX4KYOBC83Vsev6E5R
secret_key = AbeWMQPzpGhZPCMOq9IEkZSLIsetooQcUfx4Cb4v
host_base = *:*
host_bucket = *:*/%(bucket)
use_https = False
Test S3 service in Java
Test the object storage with Java code (AWS SDK for Java v1). Remember to replace the credentials and the endpoint. The code works for both Amazon S3 and Ceph S3; only the connection setup differs. A quick verification with s3cmd follows the code.
// Uses the AWS SDK for Java v1 (com.amazonaws.services.s3.*, com.amazonaws.auth.*, com.amazonaws.client.builder.*, com.amazonaws.util.StringUtils)
// The conn here is for Ceph S3
AWSCredentials credentials = new BasicAWSCredentials("***", "***");
ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setProtocol(Protocol.HTTP);
AmazonS3 conn = AmazonS3Client.builder()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withClientConfiguration(clientConfig) // Important for Ceph: use plain HTTP
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("*:*", null)) // Important for Ceph: the rgw endpoint
        .enablePathStyleAccess() // Important for Ceph: path-style bucket addressing
        .build();
// The conn here is for Amazon S3
//AWSCredentials credentials = new BasicAWSCredentials("***", "***");
//AmazonS3 conn = AmazonS3Client.builder()
//        .withRegion("ap-northeast-1") // Important for Amazon
//        .withCredentials(new AWSStaticCredentialsProvider(credentials))
//        .build();

// Upload a local file and make it publicly readable
File file = new File("C:\\Users\\fanf\\Pictures\\test.jpg");
FileInputStream fis = new FileInputStream(file);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(file.length());
metadata.setContentType("image/jpeg");
conn.putObject("test-bucket", "test.jpg", fis, metadata);
fis.close();
conn.setObjectAcl("test-bucket", "test.jpg", CannedAccessControlList.PublicRead);

// List the objects in the bucket
ListObjectsRequest listObjectsRequest =
        new ListObjectsRequest().withBucketName("test-bucket").withDelimiter("/"); // "/" groups keys like folders
ObjectListing objects2 = conn.listObjects(listObjectsRequest);
Helper.println(objects2);

// Make the bucket itself public read/write (setBucketPolicy expects a JSON policy document, so a canned ACL is used here)
conn.setBucketAcl("test-bucket", CannedAccessControlList.PublicReadWrite);

// Create a new bucket and upload a small text object
Bucket bucket2 = conn.createBucket("new-bucket");
ByteArrayInputStream input = new ByteArrayInputStream("Hello World!".getBytes());
conn.putObject(bucket2.getName(), "hello.txt", input, new ObjectMetadata());
conn.setObjectAcl(bucket2.getName(), "hello.txt", CannedAccessControlList.PublicRead);

// Walk all buckets and print their objects page by page
List<Bucket> buckets = conn.listBuckets();
for (Bucket bucket : buckets) {
    Helper.println(bucket.getName() + "\t" +
            StringUtils.fromDate(bucket.getCreationDate()));
    ObjectListing objects = conn.listObjects(bucket.getName());
    while (true) {
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            Helper.println(objectSummary.getKey() + "\t" +
                    objectSummary.getSize() + "\t" +
                    StringUtils.fromDate(objectSummary.getLastModified()));
        }
        if (!objects.isTruncated()) {
            break; // last page reached
        }
        objects = conn.listNextBatchOfObjects(objects);
    }
}
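After running the Java code, the uploads can be verified with any s3cmd client configured as above, or fetched directly over HTTP since test.jpg was given a public-read ACL. A sketch, where *:* stands for the external host and port used earlier:
s3cmd ls s3://test-bucket
s3cmd ls s3://new-bucket
curl -o downloaded.jpg http://*:*/test-bucket/test.jpg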