As part of my cluster backup operations, I run a nightly dump of all my databases. I am using Bitnami MariaDB Galera and CloudNativePG Postgres. CloudNativePG uses Barman for backups to the S3 storage I am using, but I also export a full backup of each database to an NFS share every night.
The backup consists of three parts: a backup script, a Docker image that contains it, and a Kubernetes CronJob that schedules it.
The backup script is a simple shell script that dumps the whole postgres instance, the globals and each individual database to a separate file. A timestamp is part of the exported filename.
#!/bin/sh
now=$(date +"%s_%Y-%m-%d")
echo "Backing up postgres..."
pg_dumpall --file="/destination/${now}_postgresql_all.sql"
pg_dumpall --globals-only --file="/destination/${now}_postgresql_globals.sql"
DBLIST=$(psql -d postgres -q -t -c "SELECT datname FROM pg_database WHERE datname NOT LIKE 'template%'")
for d in $DBLIST
do
    echo "Backing up db = $d"
    # -Fc writes PostgreSQL's custom archive format; restore with pg_restore
    pg_dump -Fc "$d" > "/destination/${now}_${d}.sql"
done
echo "Done."
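Because the per-database dumps use `-Fc`, they are custom-format archives and are restored with pg_restore rather than psql. The epoch prefix in the filename also means a plain lexicographic sort is chronological, so picking the most recent dump for a database is easy. A small helper as a sketch (the directory and database names here are placeholders, not part of the script above):

```shell
# newest_dump DIR DBNAME
# The %s epoch prefix makes a plain lexicographic sort chronological,
# so the last entry is the most recent dump for that database.
newest_dump() {
    ls "$1"/*_"$2".sql 2>/dev/null | sort | tail -n 1
}
```

A restore could then look like `pg_restore -d atuin "$(newest_dump /destination atuin)"` (hypothetical names and paths).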
The backup script sits inside a small postgres docker image created with this Dockerfile:
FROM postgres:16.1-alpine
WORKDIR /
COPY backup /bin/backup
RUN chown 0:0 /bin/backup
RUN chmod 700 /bin/backup
RUN mkdir /destination/
RUN apk add --no-cache \
tzdata
CMD ["/bin/sh", "/bin/backup"]
I use git tags for image versions and push the images to a local harbor instance. During the docker build, the git tag is extracted and added to the image:
export IMAGE_COMMIT_TAG=$(git describe --tags --abbrev=0 | awk -F '[-.]' '{print $1"."$2"."$3}')
docker buildx build --push --platform linux/arm64 --build-arg "app_name=backup-cloudpg-atuin" -t "harbor.example.com/example.com/backup-cloudpg-atuin:${IMAGE_COMMIT_TAG}" .
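The awk filter normalises whatever `git describe` returns to a plain major.minor.patch string, which keeps the image tags consistent even if a tag carries extra components. For example (tag values here are made up):

```shell
# Plain semver tags pass through unchanged:
echo "0.0.6" | awk -F '[-.]' '{print $1"."$2"."$3}'       # -> 0.0.6
# Anything after the third component (e.g. a pre-release suffix) is dropped:
echo "0.0.6-rc1" | awk -F '[-.]' '{print $1"."$2"."$3}'   # -> 0.0.6
```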
Finally, the backup job is scheduled with a Kubernetes CronJob. The environment variables PGHOST, PGUSER and PGPASSWORD provide the hostname, username and password used in the backup script. I have a todo item on the list to use the secrets directly instead of having to provide the details here.
---
# Backup execution command as a docker image
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cloudpg-atuin
  namespace: backup
  labels:
    app: backup-cloudpg-atuin
spec:
  schedule: "55 23 * * *" # Every day at 23:55
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: cloudpg-atuin
              image: harbor.example.com/example.com/backup-cloudpg-atuin:0.0.6
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: cloudpg-atuin
                  mountPath: /destination
              env:
                - name: "PGHOST"
                  value: "cloudpg-atuin-cluster-ro.atuin.svc.cluster.local"
                # Need to use the superuser
                # cloudpg-atuin-cluster-superuser
                - name: "PGUSER"
                  value: "postgres"
                - name: "PGPASSWORD"
                  value: "<POSTGRES-PASSWORD>"
          volumes:
            - name: cloudpg-atuin
              persistentVolumeClaim:
                claimName: cloudpg-atuin
          restartPolicy: OnFailure
---
# Persistent volume claim to store the exports
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloudpg-atuin
  namespace: backup
  labels:
    app: cloudpg-atuin
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-client
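With the CronJob deployed, there is no need to wait until 23:55 to see whether it works; kubectl can spawn a one-off Job from a CronJob (the job name here is arbitrary):

```
k create job --from=cronjob/cloudpg-atuin manual-backup-test -n backup
k logs -n backup job/manual-backup-test -f
```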
In the first version of this script (above) the database password is embedded in the deployment configuration file. Apart from being insecure, this means that when the database password changes (e.g. after a full restore or upgrade) the new password needs to be updated in the file.
Kubernetes does not allow pods to access secrets from a different namespace. Secrets can be copied with grep or sed but… you still have to remember to do it… Then I came across kubernetes-reflector, which will synchronise secret (or configmap) values across namespaces.
It’s straightforward to set up using their helm chart:
k create namespace reflector
helm repo add emberstack https://emberstack.github.io/helm-charts
helm repo update
helm show values emberstack/reflector --version 7.1.288 > reflector-values.yml
helm install reflector emberstack/reflector -n reflector --version 7.1.288 --values reflector-values.yml
Two things to quickly mention here:
The only value I changed in the helm values was the cron schedule: every 16 minutes instead of every 15 minutes. This just adds some randomness and avoids all jobs (not just reflector) trying to run at the same times:
cron:
enabled: false
schedule: "*/16 * * * *"
I store the full default values configuration file in the git repo. I find it easier to compare version differences and read the documentation/options if everything is in one place.
Ok, so now reflector is running, we annotate the existing cloudpg secret and create a placeholder in the backup namespace:
k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-allowed="true"
k annotate secret/cloudpg-atuin-cluster-superuser -n atuin reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces="backup"
k create secret generic cloudpg-atuin-cluster-superuser -n backup
k annotate secret/cloudpg-atuin-cluster-superuser -n backup reflector.v1.k8s.emberstack.com/reflects="atuin/cloudpg-atuin-cluster-superuser"
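To check the reflection worked, the password should now match in both namespaces (the values are base64-encoded; pipe through base64 -d to compare the plaintext):

```
k get secret cloudpg-atuin-cluster-superuser -n atuin -o jsonpath='{.data.password}'
k get secret cloudpg-atuin-cluster-superuser -n backup -o jsonpath='{.data.password}'
```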
The annotations define the synchronisation behaviour:

- reflector.v1.k8s.emberstack.com/reflection-allowed: "true" is added to the source resource to permit reflection to mirrors.
- reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "<list>" is added to the source resource to permit reflection only from the comma-separated list of namespaces or regular expressions. Note: if this annotation is omitted or empty, all namespaces are allowed.
- reflector.v1.k8s.emberstack.com/reflects: "<source namespace>/<source name>" is added to the mirror object. The value of the annotation is the full name of the source object in namespace/name format.

kubernetes-reflector also supports automatically creating the mirror secret with these annotations:

- reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true" is added to the source resource to automatically create mirrors in other namespaces. Note: this requires reflector.v1.k8s.emberstack.com/reflection-allowed to be true, since mirrors need to be able to reflect the source.
- reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "<list>" is added to the source resource to specify in which namespaces to automatically create mirrors. Note: if this annotation is omitted or empty, all namespaces are allowed. Namespaces in this list will also be checked against reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces, since mirrors need to be in namespaces from where reflection is permitted.

Either way, for my setup it’s four lines of code. If you wanted to copy the secrets to multiple places, the automatic mirroring might be preferable.
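As a sketch of the automatic variant, everything would sit on the source secret. With my names it might look like this (an illustration of the annotations above, not what I actually run; in practice you would annotate the operator-managed secret rather than define it by hand):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudpg-atuin-cluster-superuser
  namespace: atuin
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "backup"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "backup"
```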
Last step, instead of:
- name: "PGPASSWORD"
  value: "<POSTGRES-PASSWORD>"
pass the secret:
- name: PGPASSWORD
  valueFrom:
    secretKeyRef:
      name: cloudpg-atuin-cluster-superuser
      key: password
      optional: false
Note: I’m only using the secret for the password value. The username and host values are also in the cloudpg secret, but not the namespace… which I need to fully define the database source the backup is taken from!
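The username could be pulled from the same secret with the same pattern as the password; it is only the host where the namespace has to be written out by hand. A sketch, assuming the secret exposes a username key:

```yaml
- name: PGUSER
  valueFrom:
    secretKeyRef:
      name: cloudpg-atuin-cluster-superuser
      key: username
      optional: false
```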
The setup for MariaDB Galera follows the same principles as for postgres:
#!/bin/sh
#
now=$(date +"%s_%Y-%m-%d")
/usr/bin/mysqldump --single-transaction --all-databases --opt \
    -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
    > "/destination/${now}_mariadb_galera.sql"
/usr/bin/mysql -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -N -e 'show databases' \
    | while read -r dbname; do
        /usr/bin/mysqldump -h "${MYSQL_HOST}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
            --complete-insert --single-transaction "$dbname" \
            > "/destination/${now}_${dbname}.sql"
    done
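One thing to watch with the `show databases` loop: it also lists MariaDB’s system schemas (information_schema, performance_schema, sys), which you may not want as per-database dumps. A small guard could skip them; this is a sketch, not part of my script:

```shell
# is_system_db NAME: succeeds when NAME is a MariaDB system schema
# that the per-database dump loop should probably skip.
is_system_db() {
    case "$1" in
        information_schema|performance_schema|sys) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the while loop, `is_system_db "$dbname" && continue` before the mysqldump call would skip those entries.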
The corresponding Dockerfile is based on plain Alpine with the mysql client tools installed:
FROM alpine:3.18
WORKDIR /
COPY backup /bin/backup
RUN chown 0:0 /bin/backup
RUN chmod 700 /bin/backup
RUN apk add --no-cache \
mysql-client \
tzdata
CMD ["/bin/sh", "/bin/backup"]
The CronJob and PVC setup are as above for postgres. For MariaDB, we set the MYSQL_HOST, MYSQL_USER and MYSQL_PASSWORD environment variables.