pgBackRest is a well-known powerful backup and restore tool.
Relying on the status information given by the “info” command, we’ve build a specific plugin for Nagios : check_pgbackrest.
This post will help you discover this plugin and assume you already know pgBackRest and Nagios.
Let’s assume we have a PostgreSQL cluster with pgBackRest working correctly.
Given this simple configuration:
[global]
repo1-path=/some_shared_space/
repo1-retention-full=2
[mystanza]
pg1-path=/var/lib/pgsql/11/data
Let’s get the status of our backups with the pgbackrest info
command:
stanza: mystanza
status: ok
cipher: none
db (current)
wal archive min/max (11-1): 00000001000000040000003C/000000010000000B0000004E
full backup: 20190219-121527F
timestamp start/stop: 2019-02-19 12:15:27 / 2019-02-19 12:18:15
wal start/stop: 00000001000000040000003C / 000000010000000400000080
database size: 3.0GB, backup size: 3.0GB
repository size: 168.5MB, repository backup size: 168.5MB
incr backup: 20190219-121527F_20190219-121815I
timestamp start/stop: 2019-02-19 12:18:15 / 2019-02-19 12:20:38
wal start/stop: 000000010000000400000082 / 0000000100000004000000B8
database size: 3.0GB, backup size: 2.9GB
repository size: 175.2MB, repository backup size: 171.6MB
backup reference list: 20190219-121527F
incr backup: 20190219-121527F_20190219-122039I
timestamp start/stop: 2019-02-19 12:20:39 / 2019-02-19 12:22:55
wal start/stop: 0000000100000004000000C1 / 0000000100000004000000F4
database size: 3.0GB, backup size: 3.0GB
repository size: 180.9MB, repository backup size: 177.3MB
backup reference list: 20190219-121527F, 20190219-121527F_20190219-121815I
full backup: 20190219-122255F
timestamp start/stop: 2019-02-19 12:22:55 / 2019-02-19 12:25:47
wal start/stop: 000000010000000500000000 / 00000001000000050000003D
database size: 3.0GB, backup size: 3.0GB
repository size: 186.5MB, repository backup size: 186.5MB
incr backup: 20190219-122255F_20190219-122548I
timestamp start/stop: 2019-02-19 12:25:48 / 2019-02-19 12:28:17
wal start/stop: 000000010000000500000040 / 000000010000000500000077
database size: 3GB, backup size: 3.0GB
repository size: 192.3MB, repository backup size: 188.7MB
backup reference list: 20190219-122255F
incr backup: 20190219-122255F_20190219-122817I
timestamp start/stop: 2019-02-19 12:28:17 / 2019-02-19 12:30:36
wal start/stop: 00000001000000050000007F / 0000000100000005000000B1
database size: 3GB, backup size: 3.0GB
repository size: 197.2MB, repository backup size: 193.5MB
backup reference list: 20190219-122255F
We can now use the check_pgbackrest Nagios plugin. See the INSTALL.md
file
for the complete list of prerequisites.
$ sudo yum install perl-JSON epel-release perl-Net-SFTP-Foreign
To display “human readable” output, we’ll use the --format=human
argument.
Monitor the backup retention
The retention
service will fail when the number of full backups is less than
the --retention-full
argument.
Example:
$ ./check_pgbackrest --service=retention --stanza=mystanza --retention-full=2 --format=human
Service : BACKUPS_RETENTION
Returns : 0 (OK)
Message : backups policy checks ok
Long message : full=2
Long message : diff=0
Long message : incr=4
Long message : latest=incr,20190219-122255F_20190219-122817I
Long message : latest_age=1h18m50s
$ ./check_pgbackrest --service=retention --stanza=mystanza --retention-full=3 --format=human
Service : BACKUPS_RETENTION
Returns : 2 (CRITICAL)
Message : not enough full backups, 3 required
Long message : full=2
Long message : diff=0
Long message : incr=4
Long message : latest=incr,20190219-122255F_20190219-122817I
Long message : latest_age=1h19m25s
It can also fail when the newest backup is older than the --retention-age
argument.
The following units are accepted (not case sensitive):
s (second), m (minute), h (hour), d (day)
.
You can use more than one unit per given value.
$ ./check_pgbackrest --service=retention --stanza=mystanza --retention-age=1h --format=human
Service : BACKUPS_RETENTION
Returns : 2 (CRITICAL)
Message : backups are too old
Long message : full=2
Long message : diff=0
Long message : incr=4
Long message : latest=incr,20190219-122255F_20190219-122817I
Long message : latest_age=1h19m56s
$ ./check_pgbackrest --service=retention --stanza=mystanza --retention-age=2h --format=human
Service : BACKUPS_RETENTION
Returns : 0 (OK)
Message : backups policy checks ok
Long message : full=2
Long message : diff=0
Long message : incr=4
Long message : latest=incr,20190219-122255F_20190219-122817I
Long message : latest_age=1h19m59s
Those 2 options can be used simultaneously:
$ ./check_pgbackrest --service=retention --stanza=mystanza --retention-age=2h --retention-full=2
BACKUPS_RETENTION OK - backups policy checks ok |
full=2 diff=0 incr=4 latest=incr,20190219-122255F_20190219-122817I latest_age=1h20m36s
This service works fine for local or remote backups since it only relies on the info command.
Monitor local WAL segments archives
The archives
service checks if all archived WALs exist between the oldest and
the latest WAL needed for the recovery.
This service requires the --repo-path
argument to specify where the archived
WALs are stored locally.
Archives must be compressed (.gz). If needed, use “compress-level=0” instead of “compress=n”.
Use the --wal-segsize
argument to set the WAL segment size if you don’t use
the default one.
The following units are accepted (not case sensitive):
b (Byte), k (KB), m (MB), g (GB), t (TB), p (PB), e (EB) or Z (ZB)
.
Only integers are accepted. Eg. 1.5MB
will be refused, use 1500kB
.
The factor between units is 1024 bytes. Eg. 1g = 1G = 1024*1024*1024
.
Example:
$ ./check_pgbackrest --service=archives --stanza=mystanza --repo-path="/some_shared_space/archive" --format=human
Service : WAL_ARCHIVES
Returns : 0 (OK)
Message : 1811 WAL archived, latest archived since 41m48s
Long message : latest_wal_age=41m48s
Long message : num_archives=1811
Long message : archives_dir=/some_shared_space/archive/mystanza/11-1
Long message : oldest_archive=00000001000000040000003C-1937e658f8693e3949583d909456ef84398abd03.gz
Long message : latest_archive=000000010000000B0000004E-2b9cc85b487a8e7b297148169018d46e6b7f1ed2.gz
Monitor remote WAL segments archives
The archives
service can also check remote archived WALs using SFTP with the
--repo-host
and --repo-host-user
arguments.
As reminder, you have to setup a trusted SSH communication between the hosts.
We’ll also here assume you have a working setup.
Here’s a simple configuration:
- On the database server
[global]
repo1-host=remote
repo1-host-user=postgres
[mystanza]
pg1-path=/var/lib/pgsql/11/data
- On the backup server
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
[mystanza]
pg1-path=/var/lib/pgsql/11/data
pg1-host=myserver
pg1-host-user=postgres
While the backups are taken from the remote server, the pgbackrest info
command can be executed on both servers:
stanza: mystanza
status: ok
cipher: none
db (current)
wal archive min/max (11-1): 000000010000000B0000006B/000000010000000D00000078
full backup: 20190219-143643F
timestamp start/stop: 2019-02-19 14:36:43 / 2019-02-19 14:40:34
wal start/stop: 000000010000000B0000006B / 000000010000000B000000A9
database size: 3GB, backup size: 3GB
repository size: 242MB, repository backup size: 242MB
incr backup: 20190219-143643F_20190219-144035I
timestamp start/stop: 2019-02-19 14:40:35 / 2019-02-19 14:43:23
wal start/stop: 000000010000000B000000AD / 000000010000000B000000E2
database size: 3GB, backup size: 3.0GB
repository size: 246.3MB, repository backup size: 242.7MB
backup reference list: 20190219-143643F
incr backup: 20190219-143643F_20190219-144325I
timestamp start/stop: 2019-02-19 14:43:25 / 2019-02-19 14:46:32
wal start/stop: 000000010000000B000000EC / 000000010000000C00000022
database size: 3GB, backup size: 3GB
repository size: 250.5MB, repository backup size: 246.9MB
backup reference list: 20190219-143643F, 20190219-143643F_20190219-144035I
full backup: 20190219-144634F
timestamp start/stop: 2019-02-19 14:46:34 / 2019-02-19 14:50:27
wal start/stop: 000000010000000C0000002B / 000000010000000C00000069
database size: 3GB, backup size: 3GB
repository size: 253.7MB, repository backup size: 253.7MB
incr backup: 20190219-144634F_20190219-145028I
timestamp start/stop: 2019-02-19 14:50:28 / 2019-02-19 14:53:10
wal start/stop: 000000010000000C0000006C / 000000010000000C000000A5
database size: 3GB, backup size: 3GB
repository size: 258.1MB, repository backup size: 254.5MB
backup reference list: 20190219-144634F
incr backup: 20190219-144634F_20190219-145311I
timestamp start/stop: 2019-02-19 14:53:11 / 2019-02-19 14:56:26
wal start/stop: 000000010000000C000000AB / 000000010000000C000000E3
database size: 3GB, backup size: 3GB
repository size: 262MB, repository backup size: 258.4MB
backup reference list: 20190219-144634F, 20190219-144634F_20190219-145028I
Example from the database server:
$ ./check_pgbackrest --service=archives --stanza=mystanza --repo-path="/var/lib/pgbackrest/archive" --repo-host=remote --format=human
Service : WAL_ARCHIVES
Returns : 0 (OK)
Message : 526 WAL archived, latest archived since 41s
Long message : latest_wal_age=41s
Long message : num_archives=526
Long message : archives_dir=/var/lib/pgbackrest/archive/mystanza/11-1
Long message : min_wal=000000010000000B0000006B
Long message : max_wal=000000010000000D00000078
Long message : oldest_archive=000000010000000B0000006B-2609fef06d974e5918be051d8a409e7b8b50c818.gz
Long message : latest_archive=000000010000000D00000078-f46f2ccdd176e4de9036d70fc51e1a7dd75aebbf.gz
From the backup server, use the “local” command:
$ ./check_pgbackrest --service=archives --stanza=mystanza --repo-path="/var/lib/pgbackrest/archive"
In case of missing archived WAL segment, you’ll get an error:
$ ./check_pgbackrest --service=archives --stanza=mystanza --repo-path="/var/lib/pgbackrest/archive" --repo-host=remote
WAL_ARCHIVES CRITICAL - wrong sequence or missing file @ '000000010000000D00000037'
Remark
With pgBackRest 2.10, you might not get the min_wal
and max_wal
values:
Long message : min_wal=000000010000000B0000006B
Long message : max_wal=000000010000000D00000078
That behavior comes from the pgbackrest info
command. Indeed, when specifying
--stanza=mystanza
, that information is missing:
wal archive min/max (11-1): none present
Tips
The --command
argument allows to specify which pgBackRest executable file to
use (default: “pgbackrest”).
The --config
parameter allows to provide a specific configuration file to
pgBackRest.
If needed, some prefix command to execute the pgBackRest info command can be
specified with the --prefix
option (eg: “sudo -iu postgres”).
Conclusion
check_pgbackrest is an open project, licensed under the PostgreSQL license.
Any contribution to improve it is welcome.
Tweet