Migrating cloudnative-pg storage to ceph



Now that I’ve configured a small ceph cluster, it’s time to start moving services across. First up are the cloudnative-pg postgres databases which until now have been storing data on a Synology NAS with the synology-csi plugin.

There are a couple of ways to tackle the migration:

  1. Copy the data from existing PVCs to new ones
  2. Use the cloudnative-pg backup to restore to new PVCs
  3. Use a SQL dump to export and import the data

I take a nightly export of each database as part of the backup routine, in addition to using cloudnative-pg’s backup to S3 storage. In this case, I want to move both the PVC and the S3 storage, so using the SQL export makes the most sense.

The examples here are for the atuin shell history tool, but I’ll apply the same steps to the other cloudnative-pg databases.

Configure S3 for cloudnative-pg backups

Create a new S3 user

apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: cloudnative-pg
  namespace: rook-ceph
spec:
  store: ceph-objectstore
  clusterNamespace: rook-ceph
  displayName: "A user for cloudnative-pg database backups"

Apply it:

k apply -f s3-user-cloudnative-pg.yml -n rook-ceph
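
The operator creates the user and stores its keys in a secret named rook-ceph-object-user-<store>-<user>, which is used again later when configuring the helm chart. A quick check that both exist:

k get cephobjectstoreuser cloudnative-pg -n rook-ceph
k get secret rook-ceph-object-user-ceph-objectstore-cloudnative-pg -n rook-ceph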

Create a new S3 bucket

I’m specifying the bucket name with the bucketName parameter, as this is a small cluster and the name will be unique. The ceph documentation recommends using generateBucketName instead, which appends a unique ID to the bucket name prefix.

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-bucket-cloudnative-pg
  namespace: rook-ceph
spec:
  bucketName: cloudnative-pg
  # use generateBucketName to generate a bucket name with a prefix and unique ID
  # generateBucketName:
  storageClassName: ceph-bucket

Apply it:

k apply -f s3-bucket-cloudnative-pg.yml -n rook-ceph
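
Once the claim is bound, Rook creates the bucket along with a ConfigMap and Secret named after the claim, holding its connection details. To confirm:

k get objectbucketclaim ceph-bucket-cloudnative-pg -n rook-ceph
k get configmap ceph-bucket-cloudnative-pg -n rook-ceph
k get secret ceph-bucket-cloudnative-pg -n rook-ceph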

Configure access to the bucket for the user

With the rook-ceph operator it is not possible to configure the bucket ACL directly via the ObjectBucketClaim, so instead I’m attaching a bucket policy that grants the cloudnative-pg user access. Save the following as s3-bucket-acl-cloudnative-pg.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::cloudnative-pg/*",
        "arn:aws:s3:::cloudnative-pg"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam:::user/cloudnative-pg"
        ]
      }
    }
  ]
}

I’m applying the policy with the s3cmd command-line tool. First install it:

sudo apt-get install s3cmd

Then create a file named ~/.s3cfg to configure s3cmd:

# Setup endpoint: hostname of the Ceph object gateway
host_base = s3.example.com
host_bucket = s3.example.com
# Leave as default
bucket_location = 
use_https = True

# Setup access keys
# Access Key = Ceph S3 Account name
access_key =  user_access_key
# Secret Key = Ceph S3 Account Key
secret_key = user_access_secret

# Use S3 v4 signature APIs
signature_v2 = False

The access_key and secret_key will need to be for a user that has permissions on the bucket, for example dashboard-admin; its keys can be pulled from the object gateway as shown below. Test that s3cmd is working:

s3cmd ls s3://cloudnative-pg
s3cmd put test.txt s3://cloudnative-pg
s3cmd del s3://cloudnative-pg/test.txt
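
If the keys for dashboard-admin (or any other existing user) aren’t to hand, they can be read back from the object gateway with radosgw-admin. A sketch, assuming the rook-ceph-tools toolbox deployment is enabled:

# the access_key and secret_key appear under "keys" in the JSON output
k -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin user info --uid=dashboard-admin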

Apply the ACL:

s3cmd setpolicy s3-bucket-acl-cloudnative-pg.json s3://cloudnative-pg
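
s3cmd info shows the bucket metadata, including the attached policy, which is a quick way to confirm it took effect:

s3cmd info s3://cloudnative-pg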

Migrate the database

Scale down services using the database

k scale --replicas 0 deployment/atuin -n atuin
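
Confirm the atuin pods have terminated before touching the database:

k get pods -n atuin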

Update the helm chart

Update the cnpg/cluster values so the database storage uses the ceph-block storage class and the backups point at the new S3 bucket:

cluster:
  storage:
    size: 2Gi
    storageClass: "ceph-block"

backups:
  endpointURL: "https://s3.example.com"  # Leave empty if using the default S3 endpoint
  destinationPath: ""
  # -- One of `s3`, `azure` or `google`
  provider: s3
  s3:
    region: ""
    bucket: "cloudnative-pg"
    path: "/"
    accessKey: "access_key"
    secretKey: "secret_key"

Get the accessKey and secretKey for the user we created above using:

k get -n rook-ceph secret/rook-ceph-object-user-ceph-objectstore-cloudnative-pg --template={{.data.AccessKey}} | base64 -d
k get -n rook-ceph secret/rook-ceph-object-user-ceph-objectstore-cloudnative-pg --template={{.data.SecretKey}} | base64 -d

Check the postgres SQL dump export exists

Before we delete the existing install, it would be good to double-check that there is a postgres dump we can restore from :grinning:
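
The exports end up on the NAS in the same directory the scp below pulls from, so a quick listing confirms a recent dump is there:

ssh gary@nas1.example.com \
  ls -lh /volume1/storage/microk8s/backup-cloudpg-atuin-pvc-519d55fe-515c-453d-a5cd-4f16367bbd4a/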

Delete and reinstall the helm chart

helm delete atuin -n atuin
helm install cloudpg-atuin -n atuin cnpg/cluster --version 0.0.9 --values cloudnative-cluster-atuin-values.yml
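
Before importing anything, wait for the new cluster to come up and check that its PVC is on ceph-block:

k get clusters.postgresql.cnpg.io -n atuin
k get pods -n atuin
k get pvc -n atuin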

Import the postgres sql dump

For most of the homelab config I use my laptop and a wireless connection. For database dumps and imports, I find it more reliable to use a wired connection. Log into one of the hosts and grab the dump file:

ssh pi1.example.com
scp gary@nas1.example.com:/volume1/storage/microk8s/backup-cloudpg-atuin-pvc-519d55fe-515c-453d-a5cd-4f16367bbd4a/1728690902_2024-10-11_atuin.sql .

Import it:

k run postgresql-client --rm -i --restart='Never' \
--namespace atuin --image bitnami/postgresql \
--command -- /bin/bash -c "PGPASSWORD=password pg_restore --host cloudpg-atuin-cluster-rw -U postgres -d atuin" < 1728690902_2024-10-11_atuin.sql

Validate that the import was successful. First create a pod to access the database:

k run postgresql-client --rm -it --restart='Never' \
--namespace atuin --image bitnami/postgresql \
--command -- /bin/bash -c "PGPASSWORD=password psql --host cloudpg-atuin-cluster-rw -U postgres -d postgres"

Check the database exists:

# \l
                                                List of databases
   Name    |  Owner   | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+---------+-------+--------+-----------+-----------------------
 atuin     | atuin    | UTF8     | libc            | C       | C     |        |           |
 postgres  | postgres | UTF8     | libc            | C       | C     |        |           |

Check there are tables and data:

# \c atuin
# \dt
                 List of relations
 Schema |           Name           | Type  | Owner
--------+--------------------------+-------+-------
 public | _sqlx_migrations         | table | atuin
 public | history                  | table | atuin
 public | records                  | table | atuin
 public | sessions                 | table | atuin
 public | store                    | table | atuin
 public | total_history_count_user | table | atuin
 public | users                    | table | atuin
 
 # select count(*) from history;
 count
-------
  8876
(1 row)

# exit

Scale back up the services using the database

Update the application and backup database passwords (the fresh cloudnative-pg install generates new credentials), then scale the deployment back up and restart it:

k scale --replicas 1 deployment/atuin -n atuin
k rollout restart deploy/atuin -n atuin
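
A quick look at the logs shows whether atuin has reconnected to the new database cleanly:

k logs deploy/atuin -n atuin --tail=20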

And confirm everything is working:

atuin sync
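
Since moving the S3 backups was half the point, it’s also worth checking that WAL archives and backups are now landing in the new bucket, using the same s3cmd setup as before:

s3cmd ls -r s3://cloudnative-pg/ | head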
