pgstef's blog

SELECT * FROM pgstef

Home About me Talks PITR tools View on GitHub

Combining cloud storage and dedicated backup host with pgBackRest

pgBackRest is a popular backup and restore tool for PostgreSQL, known for easily handling even the largest databases and workloads. It’s packed with powerful features, but all that flexibility can sometimes feel a bit overwhelming.

In my earlier posts, I showed how to take backups from a standby server and how to set up a dedicated backup host. But there’s another great feature we haven’t explored yet: pgBackRest can store backups in cloud storage like S3, Azure, or Google Cloud.

Most people choose one option: either a backup host or cloud storage. But did you know you can use both at the same time? This gives you even more flexibility in your backup strategy.

Let’s pick up where we left off. We’ve got a PostgreSQL cluster with a primary server (pg1), a standby server (pg2), and a backup server (repo1) storing backups and WAL archives on an NFS mount. Today, we’ll take it a step further by adding an S3 bucket to the setup :-)

Read More

pgBackRest dedicated backup host

As I mentioned in my last blog post, as your cluster grows with multiple standby servers and potentially automated failover (using tools like Patroni), it becomes more practical to set up a dedicated repository host, also known as a dedicated backup server. This backup server can then trigger backups and automatically select the appropriate node in case of failover, eliminating the need for manual intervention.

In this post, I’ll show you how easy it is to add a repository host to an existing cluster. I’ll also give you a sneak peek at a new feature expected to be included in the next pgBackRest release 😉

Read More

pgBackRest backups from the standby server

Recently, we’ve received many questions about how to take backups from a standby server using pgBackRest. In this post, I’d like to clarify one of the most frequently asked questions and address a common misconception for new users.

First of all, it’s important to understand that taking a backup exclusively from the standby server is not currently possible. When you trigger a backup from the standby, pgBackRest creates a standby backup that is identical to a backup performed on the primary. It does this by starting/stopping the backup on the primary, copying only files that are replicated from the standby, then copying the remaining few files from the primary.

For this setup to work, both the primary and standby servers must share a common backup repository. This can be any supported repository type.

Let’s take an example, using an NFS mount point.

Read More

In-memory disk for PostgreSQL temporary files

Recently, while debugging a performance issue of a CREATE INDEX operation, I was reminded that PostgreSQL might produce temporary files when executing a parallel query, including parallel index creation, because each worker process has its own memory and might need to use disk space for sorting or hash tables.

Thanks to Peter Geoghegan answering this pgsql-admin email thread.

So, in order to try to speed up that index creation, I thought it would be beneficial to move those temporary files directly into memory using a tmpfs and wanted to test that theory, writing this blog post :-)

Read More

pgBackRest differential vs incremental backup

One of the most frequent questions I get is to actually explain the difference between differential and incremental backups in pgBackRest.

As everyone might easily imagine, an incremental backup will only copy the database files (to keep it simple) that have been modified since the last successful backup. If you have a full backup every Sunday, and then an incremental every other day, the backup from Wednesday will be based on the one from Tuesday. That also means that if the backup from Monday or Tuesday is broken/corrupted (for any reason), you might not be able to recover from your Wednesday backup.

Now, what is a differential backup then? It is an incremental backup but it will be based on the last successful full backup. So if you have a full backup every Sunday, and then a differential every other day, the backup from Wednesday will be based on the one from Sunday! It would then doesn’t matter if the other differential backups get broken/corrupted, but since you’ll always compare what has been modified since Sunday, the size of the differential backups will be bigger over time.

Let’s have a concrete example.

Read More

pgBackRest SFTP support

SFTP support has been added in the 2.46 release on 22 May 2022.

In this demo setup, the SFTP host will be called sftp-srv and the PostgreSQL node pg-srv. Both nodes will be running on Rocky Linux 8.

If you’re familiar with Vagrant, here’s a simple Vagrantfile to initiate 3 virtual machines using those names:

# Vagrantfile
Vagrant.configure(2) do |config|
    config.vm.box = 'rockylinux/8'
    config.vm.provider 'libvirt' do |lv|
        lv.cpus = 1
        lv.memory = 1024
    end
    config.vm.synced_folder ".", "/vagrant", disabled: true

    nodes  = 'sftp-srv', 'pg-srv'
    nodes.each do |node|
        config.vm.define node do |conf|
            conf.vm.hostname = node
        end
    end
end

When using Vagrant boxes, it might be needed to enable the SSH password authentication to proceed with SSH key exchange:

$ sudo -i
root# sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config    
root# systemctl restart sshd.service
root# passwd
Read More

pgBackRest 2.41 released

With pgBackRest 2.41 just released, a new feature called backup annotations is now available. Let’s see in this blog post what this is about.

Read More

Patroni and pgBackRest combined

I see more and more questions about pgBackRest in a Patroni cluster on community channels. So, following yesterday’s post about Patroni on pure Raft, we’ll see in this post an example about how to setup pgBackRest in such cases.

To prepare this post, I followed most of the instructions given by Federico Campoli at PGDAY RUSSIA 2021 about Protecting your data with Patroni and pgbackrest. The video recording might even be found here.

Read More

Patroni on pure Raft

Since September 2020 and its 2.0 release, Patroni is able to rely on the pysyncobj module in order to use python Raft implementation as DCS.

In this post, we will setup a demo cluster to illustrate that feature.

Read More

pgBackRest multi-repositories tips and tricks

Since April 2021 and the 2.33 release, pgBackRest allows using multiple repositories at the same time. This brings a lot of benefits like, for example, redundancy and the ability to define various retention policies.

I had the chance to talk about this feature recently at pgDay Paris to highlight the impact of this new feature on the existing pgBackRest commands.

A detailed example can also be found in an EDB docs page I wrote last year when this feature was released.

In this post, we’ll see the most frequent questions I get (in conferences or on community channels) and some tips and tricks.

Read More