Build a Ceph and Kubernetes based distributed file storage system
In this article, we are going to build a Ceph and Kubernetes based distributed file storage system with Rook and integrate it into our Java platform.
Install
Download rook
Clone the Rook Ceph code from GitHub (release-1.2 branch).
git clone --single-branch --branch release-1.2 https://github.com/rook/rook.git
Copy yaml
Copy the common.yaml, operator.yaml and toolbox.yaml files from rook/cluster/examples/kubernetes/ceph/ into your working directory.
cp ../rook/cluster/examples/kubernetes/ceph/common.yaml ../rook/cluster/examples/kubernetes/ceph/operator.yaml ../rook/cluster/examples/kubernetes/ceph/toolbox.yaml .
Create rook containers
Create the Rook Ceph resources and wait until the operator pod reaches Running status.
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl get pod -n rook-ceph
Create volumes
Create three 20 GB volumes and attach them to three Linux instances. Format the disks, create /mnt/ceph-storage on each node, and mount the volumes (see the note after the df output for keeping the mounts across reboots).
sudo fdisk -l
sudo mkfs.ext4 /dev/vdb    # the data disk may appear as /dev/vdb or /dev/vdc; check the fdisk -l output on each node
sudo mkdir -p /mnt/ceph-storage
sudo mount /dev/vdb /mnt/ceph-storage -t auto
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 30G 19G 11G 65% /
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 12K 2.9G 1% /dev/shm
tmpfs 2.9G 298M 2.6G 11% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
tmpfs 581M 0 581M 0% /run/user/1000
/dev/vdb 20G 45M 19G 1% /mnt/ceph-storage
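To keep the mount across reboots, an /etc/fstab entry can be added on each node. A minimal sketch, assuming the data disk is /dev/vdb as above:
# Assumption: the data disk is /dev/vdb; adjust per node
echo '/dev/vdb /mnt/ceph-storage ext4 defaults 0 0' | sudo tee -a /etc/fstab
sudo mount -a    # verify the entry mounts cleanly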
Create Ceph cluster
Create the cluster from cluster.yaml. Make sure the selected nodes are the ones with the disks attached, and that each nodes/name field matches the node's kubernetes.io/hostname label. Note that the dashboard is disabled for now; it will be enabled later.
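A quick way to check which names to put in the nodes section is to list the node labels and copy the kubernetes.io/hostname values:
kubectl get nodes --show-labels | grep kubernetes.io/hostname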
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
    allowUnsupported: false
  dashboard:
    enabled: false
  network:
    hostNetwork: false
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      metadataDevice:
      databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
      journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
    nodes:
    - name: "slave.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
    - name: "static.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
    - name: "db.novalocal"
      directories: # specific directories to use for storage can be specified for each node
      - path: "/mnt/ceph-storage"
kubectl apply -f cluster.yaml
kubectl get pod -n rook-ceph --watch
Re-create operator and check osd
If an error occurs, delete the dataDirHostPath folder (/var/lib/rook) on every node and delete the operator, then create the cluster again. Tail the operator pod logs to follow progress. When the 3 mon and 3 osd pods are Running, the cluster has been created successfully.
sudo rm -rf /var/lib/rook    # run on every node
kubectl delete -f operator.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
kubectl logs -f rook-ceph-operator-6b79d99f5c-9564s -n rook-ceph    # replace with your operator pod name
kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-265lj 3/3 Running 0 37m
csi-cephfsplugin-2b47v 3/3 Running 0 37m
csi-cephfsplugin-2dw6z 3/3 Running 0 37m
csi-cephfsplugin-2xgns 3/3 Running 0 37m
csi-cephfsplugin-54hd5 3/3 Running 0 37m
csi-cephfsplugin-9rgr8 3/3 Running 0 37m
csi-cephfsplugin-provisioner-5d999b68d6-8vc2k 4/4 Running 0 37m
csi-cephfsplugin-provisioner-5d999b68d6-xjg4z 4/4 Running 0 37m
csi-cephfsplugin-sp289 3/3 Running 0 37m
csi-rbdplugin-7gk9s 3/3 Running 0 37m
csi-rbdplugin-9sbwh 3/3 Running 0 37m
csi-rbdplugin-jwvxx 3/3 Running 0 37m
csi-rbdplugin-n9tcd 3/3 Running 0 37m
csi-rbdplugin-provisioner-69b7d7887-4l9l9 5/5 Running 0 37m
csi-rbdplugin-provisioner-69b7d7887-g48rb 5/5 Running 0 37m
csi-rbdplugin-q9q5b 3/3 Running 0 37m
csi-rbdplugin-snhqj 3/3 Running 0 37m
csi-rbdplugin-v6jb8 3/3 Running 0 37m
rook-ceph-crashcollector-db.novalocal-fd7dcc457-qd7tx 1/1 Running 0 11m
rook-ceph-crashcollector-deamon.novalocal-7789457f5-v6vnc 1/1 Running 0 11m
rook-ceph-crashcollector-slave.novalocal-5d698c7d7b-hgbnw 1/1 Running 0 9m59s
rook-ceph-crashcollector-static.novalocal-58f769ccc-lsf5b 1/1 Running 0 9m54s
rook-ceph-crashcollector-test.novalocal-6fdc8dbc4f-ksk75 1/1 Running 0 10m
rook-ceph-mgr-a-7898b59757-84tbd 1/1 Running 0 10m
rook-ceph-mon-a-7676f96769-5mhc6 1/1 Running 0 12m
rook-ceph-mon-b-79c9c9b59d-g82sc 1/1 Running 0 11m
rook-ceph-mon-c-7b679d7497-x2hjg 1/1 Running 0 11m
rook-ceph-operator-6b79d99f5c-9564s 1/1 Running 0 14m
rook-ceph-osd-0-5b59576bb4-rgsx8 1/1 Running 0 9m57s
rook-ceph-osd-1-74ff9d79c6-hpkc2 1/1 Running 0 9m55s
rook-ceph-osd-2-57748ff6bf-vf8jl 1/1 Running 0 9m59s
rook-ceph-osd-prepare-db.novalocal-jf84v 0/1 Completed 0 10m
rook-ceph-osd-prepare-slave.novalocal-j457l 0/1 Completed 0 10m
rook-ceph-osd-prepare-static.novalocal-jt4vj 0/1 Completed 0 10m
rook-discover-hwffd 1/1 Running 0 13m
rook-discover-jh8j5 1/1 Running 0 13m
rook-discover-mcdjn 1/1 Running 0 13m
rook-discover-mmzcb 1/1 Running 0 13m
rook-discover-mppgh 1/1 Running 0 13m
rook-discover-tz5gm 1/1 Running 0 13m
rook-discover-vj7tg 1/1 Running 0 13m
Create toolbox and test
Create the toolbox pod from toolbox.yaml, then exec into it and check the Ceph status.
kubectl apply -f toolbox.yaml
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
ceph status
ceph osd status
ceph df
rados df
[root@rook-ceph-tools-787dc6b944-spsjh /]# ceph status
  cluster:
    id:     2095eaca-93b3-4365-a5c8-9b05269821a9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum a,b,c (age 26m)
    mgr: a(active, since 26m)
    osd: 3 osds: 3 up (since 25m), 3 in (since 25m)
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   6.4 GiB used, 52 GiB / 59 GiB avail
    pgs:
Enable dashboard
Modify cluster.yaml to set dashboard enabled to true, then apply it again so the operator creates the dashboard service.
vim cluster.yaml
kubectl apply -f cluster.yaml
kubectl get svc -n rook-ceph |grep mgr-dashboard
rook-ceph-mgr-dashboard ClusterIP 10.36.19.173 <none> 7000/TCP 66s
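Before creating the ingress, the dashboard can also be reached temporarily with a port-forward and opened at http://localhost:7000 (a quick check, assuming the service port 7000 shown above):
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000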
Create dashboard ingress
The dashboard service is of type ClusterIP, which means it can only be reached from inside the cluster. Create a Traefik ingress file, dashboard-ingress.yaml, to expose it publicly.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ceph-dashboard-ingress
  namespace: rook-ceph
spec:
  rules:
  - host: dashboard.*.*
    http:
      paths:
      - path: /
        backend:
          serviceName: rook-ceph-mgr-dashboard
          servicePort: 7000
kubectl apply -f dashboard-ingress.yaml
Inspect dashboard password
Read the dashboard password from the secret. The username is admin.
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Create object gateway
Create the object gateway from object.yaml.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  preservePoolsOnDelete: false
  gateway:
    type: s3
    sslCertificateRef:
    port: 80
    securePort:
    instances: 1
    placement:
    annotations:
    resources:
kubectl apply -f object.yaml
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
Create radosgw user
Create a radosgw user in the toolbox, then register its access and secret keys with the dashboard. Replace the keys below with the ones printed by the create command; example output follows the commands.
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
radosgw-admin user create --uid=myuser --display-name=test-user --system
ceph dashboard set-rgw-api-user-id myuser
ceph dashboard set-rgw-api-access-key 32APIT3RA29JCO6OCR8P
ceph dashboard set-rgw-api-secret-key 2ioxTu6iBFkYP8UKiycS90A2DFwRBklSI8Bp3iPQ
{
    "user_id": "myuser",
    "display_name": "test-user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "myuser",
            "access_key": "32APIT3RA29JCO6OCR8P",
            "secret_key": "2ioxTu6iBFkYP8UKiycS90A2DFwRBklSI8Bp3iPQ"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "system": "true",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
Test S3 service in cluster
Connect to the toolbox and test the object storage from inside the cluster.
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
yum --assumeyes install s3cmd
# The content of .s3cfg is shown below. host_base is where the rgw service is listening: run kubectl -n rook-ceph get svc rook-ceph-rgw-my-store, then combine the ClusterIP and the port.
vi .s3cfg
s3cmd mb s3://test-bucket
s3cmd ls
[default]
access_key = Y14QX4KYOBCdvwMU6E5R
secret_key = AbeWMQPzpGhZPCMOq9IEkZSxLIgtooQsdvx4Cb4v
host_base = 10.100.191.33
host_bucket = 10.100.191.33/%(bucket)
use_https = False
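Beyond creating and listing buckets, a quick upload/download round trip verifies the gateway end to end. A sketch, where test.txt is just a throwaway local file:
echo 'hello ceph' > test.txt
s3cmd put test.txt s3://test-bucket
s3cmd ls s3://test-bucket
s3cmd get s3://test-bucket/test.txt test-download.txt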
Create S3 external service
Create an external service for the object store using NodePort in rgw-external.yaml. Traefik cannot be used here because it automatically redirects HTTP to HTTPS, which the plain-HTTP s3cmd setup cannot follow. After applying it, check the assigned node port as shown below.
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-rgw-my-store-external
  namespace: rook-ceph
  labels:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
    rook_object_store: my-store
spec:
  ports:
  - name: rgw
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: rook-ceph-rgw
    rook_cluster: rook-ceph
    rook_object_store: my-store
  sessionAffinity: None
  type: NodePort
kubectl apply -f rgw-external.yaml
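Check the port that NodePort assigned to the external service; this is the port to combine with a node's public IP in the next step:
kubectl -n rook-ceph get svc rook-ceph-rgw-my-store-external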
Test S3 service outside cluster
Test the object storage outside the cluster. Remember to replace the credentials and endpoint.
# On Windows, download the s3cmd source from GitHub, run "python s3cmd --configure" to generate the configuration file, then edit it with the content below.
# Run kubectl -n rook-ceph get service rook-ceph-rgw-my-store-external, then combine a node's public IP and the external (NodePort) port as the host below.
# Run "python s3cmd ls" to test on Windows.
[default]
access_key = Y14QX4KYOBC83Vsev6E5R
secret_key = AbeWMQPzpGhZPCMOq9IEkZSLIsetooQcUfx4Cb4v
host_base = *:*
host_bucket = *:*/%(bucket)
use_https = False
Test S3 service in Java
Test the object storage with Java code (AWS SDK for Java v1). Remember to replace the credentials and the endpoint. The code works for both Amazon S3 and Ceph S3; only the connection setup differs. A quick verification with s3cmd follows the code.
// Uses the AWS SDK for Java v1 (com.amazonaws.services.s3.*, com.amazonaws.auth.*, com.amazonaws.client.builder.*, com.amazonaws.util.StringUtils)
// The conn here is for Ceph S3
AWSCredentials credentials = new BasicAWSCredentials("***", "***");
ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setProtocol(Protocol.HTTP);
AmazonS3 conn = AmazonS3Client.builder()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withClientConfiguration(clientConfig) // Important for Ceph: use plain HTTP
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("*:*", null)) // Important for Ceph: the rgw endpoint
        .enablePathStyleAccess() // Important for Ceph: path-style bucket addressing
        .build();
// The conn here is for Amazon S3
//AWSCredentials credentials = new BasicAWSCredentials("***", "***");
//AmazonS3 conn = AmazonS3Client.builder()
//        .withRegion("ap-northeast-1") // Important for Amazon
//        .withCredentials(new AWSStaticCredentialsProvider(credentials))
//        .build();

// Upload a local file and make it publicly readable
File file = new File("C:\\Users\\fanf\\Pictures\\test.jpg");
FileInputStream fis = new FileInputStream(file);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(file.length());
metadata.setContentType("image/jpeg");
conn.putObject("test-bucket", "test.jpg", fis, metadata);
fis.close();
conn.setObjectAcl("test-bucket", "test.jpg", CannedAccessControlList.PublicRead);

// List the objects in the bucket
ListObjectsRequest listObjectsRequest =
        new ListObjectsRequest().withBucketName("test-bucket").withDelimiter("/"); // "/" groups keys like folders
ObjectListing objects2 = conn.listObjects(listObjectsRequest);
Helper.println(objects2);

// Make the bucket itself public read/write (setBucketPolicy expects a JSON policy document, so a canned ACL is used here)
conn.setBucketAcl("test-bucket", CannedAccessControlList.PublicReadWrite);

// Create a new bucket and upload a small text object
Bucket bucket2 = conn.createBucket("new-bucket");
ByteArrayInputStream input = new ByteArrayInputStream("Hello World!".getBytes());
conn.putObject(bucket2.getName(), "hello.txt", input, new ObjectMetadata());
conn.setObjectAcl(bucket2.getName(), "hello.txt", CannedAccessControlList.PublicRead);

// Walk all buckets and print their objects page by page
List<Bucket> buckets = conn.listBuckets();
for (Bucket bucket : buckets) {
    Helper.println(bucket.getName() + "\t" +
            StringUtils.fromDate(bucket.getCreationDate()));
    ObjectListing objects = conn.listObjects(bucket.getName());
    while (true) {
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            Helper.println(objectSummary.getKey() + "\t" +
                    objectSummary.getSize() + "\t" +
                    StringUtils.fromDate(objectSummary.getLastModified()));
        }
        if (!objects.isTruncated()) {
            break; // last page reached
        }
        objects = conn.listNextBatchOfObjects(objects);
    }
}
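After running the Java code, the uploads can be verified with any s3cmd client configured as above, or fetched directly over HTTP since test.jpg was given a public-read ACL. A sketch, where *:* stands for the external host and port used earlier:
s3cmd ls s3://test-bucket
s3cmd ls s3://new-bucket
curl -o downloaded.jpg http://*:*/test-bucket/test.jpg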