pgBackRest is a well-known, powerful backup and restore tool.
While the documentation describes all the parameters, it’s not always that simple to imagine what you can really do with it.
In this post, I will introduce asynchronous archiving and show how to keep PostgreSQL from going down when archiving fails.
For performance reasons, pgBackRest’s “info” command doesn’t check that all the needed WAL segments are still present. check_pgbackrest is built precisely for that. The two features mentioned above can produce gaps in the archived WAL segments. The new 1.5 release of check_pgbackrest provides ways to handle that, and we’ll see how.
Installation
First of all, install PostgreSQL and pgBackRest packages directly from the PGDG yum repositories:
$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/11/redhat/\
rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
$ sudo yum install -y postgresql11-server postgresql11-contrib
$ sudo yum install -y pgbackrest
Check that pgBackRest is correctly installed:
$ pgbackrest
pgBackRest 2.11 - General help
Usage:
pgbackrest [options] [command]
Commands:
archive-get Get a WAL segment from the archive.
archive-push Push a WAL segment to the archive.
backup Backup a database cluster.
check Check the configuration.
expire Expire backups that exceed retention.
help Get help.
info Retrieve information about backups.
restore Restore a database cluster.
stanza-create Create the required stanza data.
stanza-delete Delete a stanza.
stanza-upgrade Upgrade a stanza.
start Allow pgBackRest processes to run.
stop Stop pgBackRest processes from running.
version Get version.
Use 'pgbackrest help [command]' for more information.
Create a basic PostgreSQL cluster:
$ sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
Configure pgBackRest to back up the local cluster
By default, the configuration file is /etc/pgbackrest.conf.
Let’s make a copy:
$ sudo cp /etc/pgbackrest.conf /etc/pgbackrest.conf.bck
Update the configuration:
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=1
process-max=2
log-level-console=info
log-level-file=debug
archive-async=y
archive-push-queue-max=100MB
spool-path=/var/spool/pgbackrest
[some_cool_stanza_name]
pg1-path=/var/lib/pgsql/11/data
Make sure that the postgres user can write in /var/lib/pgbackrest and in /var/spool/pgbackrest.
Configure archiving in the postgresql.conf file:
archive_mode = on
archive_command = 'pgbackrest --stanza=some_cool_stanza_name archive-push %p'
Start the PostgreSQL cluster:
$ sudo systemctl start postgresql-11
Create the stanza and check the configuration:
$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name stanza-create
P00 INFO: stanza-create command end: completed successfully
$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name check
P00 INFO: WAL segment 000000010000000000000001 successfully stored in the
archive at '/var/lib/pgbackrest/archive/some_cool_stanza_name/
11-1/0000000100000000/
000000010000000000000001-03a91d4d64251a54cf9d48ed59382d3cce3c7652.gz'
P00 INFO: check command end: completed successfully
Let’s finally take our first backup:
$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name --type=full backup
...
P00 INFO: new backup label = 20190325-142918F
P00 INFO: backup command end: completed successfully
...
pgBackRest configuration explanations
What the documentation says:
archive-async
Push/get WAL segments asynchronously.
Enables asynchronous operation for the archive-push and archive-get commands.
Asynchronous operation is more efficient because it can reuse connections and take advantage of parallelism.
archive-push-queue-max
Maximum size of the PostgreSQL archive queue.
After the limit is reached, the following will happen:
- pgBackRest will notify PostgreSQL that the WAL was successfully archived, then DROP IT.
- A warning will be output to the PostgreSQL log.
If this occurs then the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability.
In asynchronous mode the entire queue will be dropped to prevent spurts of WAL getting through before the queue limit is exceeded again.
The purpose of this feature is to prevent the log volume from filling up at which point PostgreSQL will stop completely.
Better to lose the backup than have PostgreSQL go down.
Don’t use this feature if you want to rely entirely on your backups!
spool-path
This path is used to store data for the asynchronous archive-push and archive-get command.
The asynchronous archive-push command writes acknowledgments into the spool path when it has successfully stored WAL in the archive (and errors on failure) so the foreground process can quickly notify PostgreSQL.
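As the listings later in this post show, these acknowledgments and errors appear in the spool directory as files ending in .ok and .error. The following is a minimal sketch for summarizing them at a glance. The function name spool_status is hypothetical, and the script relies only on those two suffixes, not on any pgBackRest command.

```shell
#!/bin/sh
# Sketch: count acknowledgment (.ok) and failure (.error) status files found
# in a pgBackRest spool "out" directory. The function name is hypothetical.
spool_status() {
    dir=$1
    ok=0
    err=0
    for f in "$dir"/*; do
        case "$f" in
            *.ok)    ok=$((ok + 1)) ;;
            *.error) err=$((err + 1)) ;;
        esac
    done
    echo "ok=$ok error=$err"
}

# Usage, with the spool path configured in this post:
#   spool_status /var/spool/pgbackrest/archive/some_cool_stanza_name/out
```

A non-zero error count means the asynchronous process could not store at least one segment, even if PostgreSQL itself has not reported anything yet.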
Test the archiving process
Right after the backup, let’s see what’s in the spool directory:
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000003.00000028.backup.ok
000000010000000000000003.ok
Generate a small database change and switch WAL:
$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
NOTICE: table "my_table" does not exist, skipping
pg_switch_wal
---------------
0/4016850
(1 row)
Check the spool directory again:
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000004.ok
What’s in the archives directory?
$ ls /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/0000000100000000/
000000010000000000000003.00000028.backup
000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
Break it!
$ sudo chmod -R 500 /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/
Generate a small database change and switch WAL:
$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
pg_switch_wal
---------------
0/501BF88
(1 row)
Check the archiving process:
$ ls /var/lib/pgsql/11/data/pg_wal/archive_status/
000000010000000000000003.00000028.backup.done
000000010000000000000005.ready
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.error
By default, a WAL segment is 16MB. We configured archive-push-queue-max to 100MB, so the queue can hold approximately 6 WAL segments waiting to be archived.
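The arithmetic behind that estimate can be written down as a small sketch. It assumes that the queue size is simply the number of waiting segments multiplied by the segment size, which is a simplification for illustration. The function name segments_before_drop is hypothetical.

```shell
#!/bin/sh
# Sketch of the arithmetic above: how many whole WAL segments fit under
# archive-push-queue-max. Sizes are in MB; the function name is hypothetical.
segments_before_drop() {
    queue_max_mb=$1
    wal_segment_mb=$2
    # Integer division: 100 / 16 = 6 (6 x 16MB = 96MB, 7 x 16MB = 112MB)
    echo $((queue_max_mb / wal_segment_mb))
}

fit=$(segments_before_drop 100 16)
echo "segments that fit in the queue: $fit"          # prints 6
echo "failure that exceeds the limit: $((fit + 1))"  # prints 7
```

With these values, six failed segments stay under the limit and the seventh pushes the queue past 100MB.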
What happens after the seventh failure?
Generate a small database change and switch WAL 5 more times with the same command as above.
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.error
000000010000000000000006.error
000000010000000000000007.error
000000010000000000000008.error
000000010000000000000009.error
00000001000000000000000A.error
$ ps -ef |grep postgres |grep archiver
00:00:00 postgres: archiver failed on 000000010000000000000005
The archiver process is still blocked on the first failure.
Generate the seventh failure:
$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
pg_switch_wal
---------------
0/B0025B8
(1 row)
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.ok
000000010000000000000006.ok
000000010000000000000007.ok
000000010000000000000008.ok
000000010000000000000009.ok
00000001000000000000000A.ok
00000001000000000000000B.ok
$ ps -ef |grep postgres |grep archiver
00:00:00 postgres: archiver last was 00000001000000000000000B
The archiver isn’t failing anymore BUT there’s no WAL archived either:
$ sudo -iu postgres pgbackrest info --stanza=some_cool_stanza_name
stanza: some_cool_stanza_name
status: ok
cipher: none
db (current)
wal archive min/max (11-1): 000000010000000000000003/000000010000000000000004
full backup: 20190325-142918F
timestamp start/stop: 2019-03-25 14:29:18 / 2019-03-25 14:29:28
wal start/stop: 000000010000000000000003 / 000000010000000000000003
database size: 23.5MB, backup size: 23.5MB
repository size: 2.8MB, repository backup size: 2.8MB
Repair it
$ sudo chmod -R 750 /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/
Generate a small database change, switch WAL and check the archiving process:
$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
pg_switch_wal
---------------
0/C0194A8
(1 row)
$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
00000001000000000000000C.ok
$ sudo -iu postgres pgbackrest info --stanza=some_cool_stanza_name
stanza: some_cool_stanza_name
status: ok
cipher: none
db (current)
wal archive min/max (11-1): 000000010000000000000003/00000001000000000000000C
full backup: 20190325-142918F
timestamp start/stop: 2019-03-25 14:29:18 / 2019-03-25 14:29:28
wal start/stop: 000000010000000000000003 / 000000010000000000000003
database size: 23.5MB, backup size: 23.5MB
repository size: 2.8MB, repository backup size: 2.8MB
$ ls /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/0000000100000000/
000000010000000000000003.00000028.backup
000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz
Archiving is working again, but some archives are missing and pgBackRest doesn’t notice.
Here the gap is caused by archive-push-queue-max, but you could also get a gap simply from asynchronous archiving with process-max greater than 1.
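To make the notion of a gap concrete, here is a minimal sketch that walks a sorted directory listing and reports the first missing segment. It is not how check_pgbackrest is implemented. The function names wal_next and wal_first_gap are hypothetical, and the code assumes the default 16MB segment size, where the last 8 hex digits of a WAL name run from 00000000 to 000000FF.

```shell
#!/bin/sh
# Sketch: find the first gap in a sorted list of archived WAL file names.
# Assumes 16MB segments. Function names are hypothetical.

# Print the name of the segment that follows the given 24-character WAL name.
wal_next() {
    tl=$(printf '%s' "$1" | cut -c1-8)
    log_hex=$(printf '%s' "$1" | cut -c9-16)
    seg_hex=$(printf '%s' "$1" | cut -c17-24)
    log=$((0x$log_hex))
    seg=$((0x$seg_hex))
    if [ "$seg" -ge 255 ]; then
        # Last segment of this group: roll over to the next one
        log=$((log + 1))
        seg=0
    else
        seg=$((seg + 1))
    fi
    printf '%s%08X%08X\n' "$tl" "$log" "$seg"
}

# Read archive file names on stdin (sorted, as "ls" prints them) and report
# the first missing segment, if any. Returns 1 when a gap is found.
wal_first_gap() {
    expected=""
    while read -r name; do
        case "$name" in
            *.backup|*.history|*.partial) continue ;;  # not plain WAL segments
        esac
        wal=$(printf '%s' "$name" | cut -c1-24)
        if [ -n "$expected" ] && [ "$wal" != "$expected" ]; then
            echo "missing $expected"
            return 1
        fi
        expected=$(wal_next "$wal")
    done
    echo "no gap"
}

# Usage, with the archive directory listed above:
#   ls /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/0000000100000000/ | wal_first_gap
```

On the listing shown above, this reports 000000010000000000000005 as the first missing segment. The sketch only looks at one directory listing, so a complete check would have to walk every subdirectory of the archive.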
check_pgbackrest 1.5
The new 1.5 release offers some interesting changes:
- Add --debug option to print some debug messages.
- Add --ignore-archived-since argument to ignore the WALs archived since the provided interval.
- Add --latest-archive-age-alert to define the max age of the latest archived WAL before raising a critical alert.
Download check_pgbackrest:
$ sudo yum install -y perl-JSON
$ sudo -iu postgres
$ wget https://raw.githubusercontent.com/dalibo/check_pgbackrest/REL1_5/check_pgbackrest
$ chmod +x check_pgbackrest
This installation procedure is just a simple example.
Now, check the archives chain to know if there’s something missing:
$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
    --repo-path=/var/lib/pgbackrest/archive --format=human
Service : WAL_ARCHIVES
Returns : 2 (CRITICAL)
Message : wrong sequence or missing file @ '000000010000000000000005'
Long message : latest_archive_age=9m54s
Long message : num_archives=3
Long message : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message : min_wal=000000010000000000000003
Long message : max_wal=00000001000000000000000C
Long message : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message : latest_archive=00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz
Let’s ignore the WAL segments archived during the last 15 minutes, which leaves out the latest archive producing the gap:
$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
    --repo-path=/var/lib/pgbackrest/archive --format=human \
    --debug --ignore-archived-since=15m
DEBUG: file 000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz as interval since epoch : 36m52s
DEBUG: file 000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz as interval since epoch : 33m58s
DEBUG: file 00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz as interval since epoch : 11m45s
DEBUG: max_wal changed to 000000010000000000000004
DEBUG: checking WAL 000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
DEBUG: checking WAL 000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
Service : WAL_ARCHIVES
Returns : 0 (OK)
Message : 2 WAL archived, latest archived since 33m58s
Long message : latest_archive_age=33m58s
Long message : num_archives=2
Long message : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message : min_wal=000000010000000000000003
Long message : max_wal=000000010000000000000004
Long message : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message : latest_archive=000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
You might also want to receive an alert if the latest archive is too old:
$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
    --repo-path=/var/lib/pgbackrest/archive --format=human \
    --ignore-archived-since=20m --latest-archive-age-alert=10m
Service : WAL_ARCHIVES
Returns : 2 (CRITICAL)
Message : latest_archive_age (39m16s) exceeded
Long message : latest_archive_age=39m16s
Long message : num_archives=2
Long message : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message : min_wal=000000010000000000000003
Long message : max_wal=000000010000000000000004
Long message : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message : latest_archive=000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
The two options are combined here to avoid the alert on the missing archived WAL segments.
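The way the two intervals interact can be illustrated with a small sketch. It only mimics the idea behind --ignore-archived-since and --latest-archive-age-alert and is not check_pgbackrest's actual code. The function names are hypothetical, and the d and h units are an assumption, since the output in this post only shows m and s.

```shell
#!/bin/sh
# Sketch only: not check_pgbackrest's code. Function names are hypothetical;
# the d/h units are an assumption (this post only shows m and s).

# Convert an interval such as "39m16s" or "10m" to seconds.
to_seconds() {
    rest=$1
    total=0
    [ -n "$rest" ] || return 1
    while [ -n "$rest" ]; do
        num=${rest%%[!0-9]*}        # leading digits
        rest=${rest#"$num"}
        unit=${rest%"${rest#?}"}    # first remaining character
        rest=${rest#?}
        [ -n "$num" ] || return 1
        case "$unit" in
            d) total=$((total + num * 86400)) ;;
            h) total=$((total + num * 3600)) ;;
            m) total=$((total + num * 60)) ;;
            s) total=$((total + num)) ;;
            *) return 1 ;;
        esac
    done
    echo "$total"
}

# Read "<file> <age>" lines on stdin and print only those older than the
# ignore interval given as $1.
older_than() {
    ignore=$(to_seconds "$1") || return 2
    while read -r file age; do
        secs=$(to_seconds "$age") || return 2
        if [ "$secs" -gt "$ignore" ]; then
            echo "$file $age"
        fi
    done
}

# Succeed when age $1 is greater than the alert threshold $2.
age_exceeded() {
    a=$(to_seconds "$1") || return 2
    b=$(to_seconds "$2") || return 2
    [ "$a" -gt "$b" ]
}

# Usage, with the ages from the output above:
#   age_exceeded 39m16s 10m && echo "CRITICAL"
```

Ignoring recent archives hides the gap, but it also makes the latest archive that is still considered older, which is why the age alert then fires.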
Conclusion
pgBackRest offers a lot of possibilities but, mainly for performance reasons, doesn’t check the consistency of the archives.
Combine it with, for example, a good monitoring system and the check_pgbackrest plugin for more safety.