Backing up postgres and mariadb with a Kubernetes CronJob



As part of my cluster backup operations, I run a nightly dump of all my databases. I am using Bitnami MariaDB Galera and CloudNativePG postgres. CloudNativePG uses Barman for backups to S3 storage, which I use, but I also export a full backup of each database to an NFS share every night.

Postgres Backup

The backup consists of three parts:

  1. a backup script to export the databases
  2. a custom docker image containing the backup script
  3. Kubernetes configuration for the CronJob and a PVC where the data is saved

Backup script

The backup script itself is a simple shell script that dumps the whole postgres instance, the globals and each individual database to a separate file. A timestamp is part of the exported filename.

#!/bin/sh

now=$(date +"%s_%Y-%m-%d")

echo "Backing up postgres..."
pg_dumpall --file="/destination/${now}_postgresql_all.sql"
pg_dumpall --globals-only --file="/destination/${now}_postgresql_globals.sql"

DBLIST=$(psql -d postgres -q -t -c "SELECT datname FROM pg_database WHERE datname NOT LIKE 'template%'")
for d in $DBLIST
do
  echo "Backing up db = $d"
  # -Fc writes a custom-format dump; restore it with pg_restore
  pg_dump -Fc "$d" > "/destination/${now}_${d}.sql"
done

echo "Done."
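Since every run writes new timestamped files, the PVC fills up over time and the script above never prunes anything. A minimal retention sketch that could be appended to the script (the 14-day window and the use of the same /destination mount are my assumptions, not part of the original job):

```shell
#!/bin/sh
# Prune dumps older than 14 days from the backup destination.
# DEST and the retention window are illustrative assumptions.
DEST="${DEST:-/destination}"
if [ -d "$DEST" ]; then
  find "$DEST" -name '*.sql' -type f -mtime +14 -print -delete
fi
```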

Docker Image

The backup script itself sits inside a small postgres docker image created with this Dockerfile:

FROM postgres:16.1-alpine

WORKDIR /
COPY backup /bin/backup
RUN chown 0:0 /bin/backup 
RUN chmod 700 /bin/backup
RUN mkdir /destination/
RUN apk add --no-cache \
    tzdata
CMD ["/bin/sh", "/bin/backup"]

I use git tags for image versions and push the images to a local harbor instance. During the docker build, the git tag is extracted and added to the image:

export IMAGE_COMMIT_TAG=$(git describe --tags --abbrev=0 | awk -F '[-.]' '{print $1"."$2"."$3}')
docker buildx build --push --platform linux/arm64 --build-arg "app_name=backup-cloudpg-atuin" -t "harbor.example.com/example.com/backup-cloudpg-atuin:${IMAGE_COMMIT_TAG}" .
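To illustrate what the awk call does: it splits the output of git describe on '.' and '-' and keeps the first three fields, so any suffix after the plain x.y.z tag is trimmed. (The tag "v0.0.6-rc1" here is made up for illustration.)

```shell
# A suffixed tag is cut back to its first three dot/dash-separated fields.
echo "v0.0.6-rc1" | awk -F '[-.]' '{print $1"."$2"."$3}'
# prints: v0.0.6
```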

Kubernetes Configuration

Finally, the backup job is scheduled with a Kubernetes CronJob. The environment variables PGHOST, PGUSER and PGPASSWORD provide the hostname, username and password used by the backup script. I have a todo item on the list to use the secrets directly instead of having to provide the details here.

---
# Backup execution command as a docker image
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cloudpg-atuin
  namespace: backup
  labels:
    app: backup-cloudpg-atuin
spec:
  schedule: "55 23 * * *" # Every day at 23:55
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cloudpg-atuin
            image: harbor.example.com/example.com/backup-cloudpg-atuin:0.0.6
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - name: cloudpg-atuin
              mountPath: /destination
            env:
            - name: "PGHOST"
              value: "cloudpg-atuin-cluster-ro.atuin.svc.cluster.local"
              # Need to use the superuser
              # cloudpg-atuin-cluster-superuser
            - name: "PGUSER"
              value: "postgres"
            - name: "PGPASSWORD"
              value: "<POSTGRES-PASSWORD>"
          volumes:
          - name: cloudpg-atuin
            persistentVolumeClaim:
              claimName: cloudpg-atuin
          restartPolicy: OnFailure

---
# Persistent volume claim to store the exports
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloudpg-atuin
  namespace: backup
  labels:
    app: cloudpg-atuin
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-client
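Rather than waiting for 23:55 to find out whether the job works, a one-off run can be triggered from the CronJob with standard kubectl (the job name manual-test is just an example):

```shell
kubectl create job --from=cronjob/cloudpg-atuin manual-test -n backup
kubectl logs -n backup job/manual-test -f
```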

Use secrets for the password

In the first version of this script (above), the database password is added to the deployment configuration file. Apart from being insecure, this means that when the database password changes (e.g. after a full restore/upgrade) the new password needs to be updated in the file.

Kubernetes does not allow pods to access secrets from a different namespace. Secrets can be copied across with a bit of grep or sed but… you still have to remember to do it… Then I came across kubernetes-reflector, which synchronises secret (or configmap) values across namespaces.

It’s straightforward to set up using their helm chart:

k create namespace reflector
helm repo add emberstack https://emberstack.github.io/helm-charts
helm repo update
helm show values emberstack/reflector --version 7.1.288 > reflector-values.yml
helm install reflector emberstack/reflector -n reflector --version 7.1.288 --values reflector-values.yml

Two things to quickly mention here:

  1. The only value I changed in the helm values was to update the cron schedule to run every 16 minutes instead of every 15 minutes. This just adds some randomness and avoids all jobs (not just reflector) trying to run at the same time:

    cron:
      enabled: false
      schedule: "*/16 * * * *"
    
  2. I store the full default values configuration file in the git repo. I find it easier to compare version differences and read the documentation/options if everything is in one place.

Ok, so now reflector is running, we annotate the existing cloudpg secret and create a placeholder in the backup namespace:

k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-allowed="true"
k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces="backup"
k create secret generic cloudpg-atuin-cluster-superuser -n backup
k annotate secret/cloudpg-atuin-cluster-superuser -n backup reflector.v1.k8s.emberstack.com/reflects="atuin/cloudpg-atuin-cluster-superuser"

The annotations define the synchronisation behaviour: reflection-allowed marks the source secret as safe to mirror, reflection-allowed-namespaces restricts which namespaces may receive it, and reflects on the placeholder points it back at the source secret.

kubernetes-reflector also supports automatically creating the mirror secret, using the reflection-auto-enabled and reflection-auto-namespaces annotations on the source, which removes the need for the placeholder step.

Either way, for my set up it’s four lines of code. If you wanted to copy the secrets to multiple places, the automatic mirroring might be preferable.
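For reference, the automatic-mirroring variant would annotate only the source secret, something like this (annotation names as documented by kubernetes-reflector; verify them against the chart version you install):

```shell
k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-auto-enabled="true"
k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-auto-namespaces="backup"
```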

Updating the backup deployment file

Last step, instead of:

            - name: "PGPASSWORD"
              value: "<POSTGRES-PASSWORD>"

pass the secret:

            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: cloudpg-atuin-cluster-superuser
                  key: password
                  optional: false

Note: I’m only using the secret for the password value. The username and host values are also in the cloudpg secret. But not the namespace… which I need to fully qualify the database host the backup is taken from!
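If you wanted the username from the secret as well, the same pattern applies. A sketch, assuming the cloudpg secret exposes a username key (CloudNativePG credential secrets do):

```yaml
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: cloudpg-atuin-cluster-superuser
                  key: username
                  optional: false
```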

MariaDB Galera Backup

The setup for MariaDB Galera follows the same principles as for postgres:

Backup Script

#!/bin/sh

now=$(date +"%s_%Y-%m-%d")

/usr/bin/mysqldump --single-transaction --all-databases --opt \
  -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
  > "/destination/${now}_mariadb_galera.sql"

/usr/bin/mysql -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -N -e 'show databases' |
while read dbname; do
  /usr/bin/mysqldump -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
    --complete-insert --single-transaction "$dbname" \
    > "/destination/${now}_${dbname}.sql"
done

Docker image

FROM alpine:3.18

WORKDIR /
COPY backup /bin/backup
RUN chown 0:0 /bin/backup 
RUN chmod 700 /bin/backup

RUN apk add --no-cache \
    mysql-client \
    tzdata
CMD ["/bin/sh", "/bin/backup"]

Kubernetes Configuration

The CronJob and PVC setup is the same as for postgres above. For MariaDB, we set the MYSQL_HOST, MYSQL_USER and MYSQL_PASSWORD environment variables.
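The same secret trick works here too. A sketch, assuming a Bitnami-style release secret named mariadb-galera with a mariadb-root-password key (check the actual secret name and key in your release):

```yaml
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mariadb-galera
                  key: mariadb-root-password
                  optional: false
```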