pgstef's blog

SELECT * FROM pgstef

Home About me Talks PITR tools View on GitHub

PostgreSQL needs some infrastructure changes to have a more dynamic reconfiguration around recovery, eg. to change the primary_conninfo at runtime.

The first step, mostly to avoid having to duplicate the GUC logic, results on the following patch.


On 25th of November 2018, Peter Eisentraut committed Integrate recovery.conf into postgresql.conf:

recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/

Let’s compare a simple example between PostgreSQL 11 and 12.


Initialize replication on v11

With a default postgresql11-server installation on CentOS 7, let’s start archiving on our primary server:

$ mkdir /var/lib/pgsql/11/archives
$ echo "archive_mode = 'on'" >> /var/lib/pgsql/11/data/postgresql.conf
$ echo "archive_command = 'cp %p /var/lib/pgsql/11/archives/%f'" \
>> /var/lib/pgsql/11/data/postgresql.conf
# systemctl start postgresql-11.service

Check that the archiver process is running:

$ psql -c "SELECT pg_switch_wal();"
 pg_switch_wal 
---------------
 0/16AC7D0
(1 row)

$ ps -ef |grep postgres|grep archiver 
... postgres: archiver last was 000000010000000000000001

$ ls -l /var/lib/pgsql/11/archives/
total 16384
-rw-------. 1 postgres postgres 16777216 Nov 26 09:30 000000010000000000000001

Create a base copy for our secondary server:

$ pg_basebackup --pgdata=/var/lib/pgsql/11/replicated_data -P
24502/24502 kB (100%), 1/1 tablespace

Configure the recovery.conf file as following:

$ cat recovery.conf 
standby_mode = 'on'
primary_conninfo = 'port=5432'
restore_command = 'cp /var/lib/pgsql/11/archives/%f %p'
recovery_target_timeline = 'latest'

Change the default port and start:

$ echo 'port = 5433' >> /var/lib/pgsql/11/replicated_data/postgresql.conf
$ /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/11/replicated_data/ start

If the replication setup is correct, you should see those processes:

postgres 10950     1  ... /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data/
postgres 10958 10950  ... postgres: archiver   last was 000000010000000000000004
postgres 11595 10950  ... postgres: walsender postgres [local] streaming 0/5000140
...
postgres 11586     1  ... /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/11/replicated_data
postgres 11588 11586  ... postgres: startup   recovering 000000010000000000000005
postgres 11594 11586  ... postgres: walreceiver   streaming 0/5000140

We now have a local 2-nodes cluster working with Streaming Replication and archives recovery as safety net.

To stop the cluster:

$ /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/11/replicated_data stop
# systemctl stop postgresql-11.service 

Recovery.conf explanation

These parameters are important:

  • standby_mode

Specifies whether to start the PostgreSQL server as a standby. If this parameter is on, the server will not stop recovery when the end of archived WAL is reached, but will keep trying to continue recovery by fetching new WAL segments using restore_command and/or by connecting to the primary server as specified by the primary_conninfo setting.

  • primary_conninfo

Specifies a connection string to be used for the standby server to connect with the primary.

  • restore_command

The local shell command to execute to retrieve an archived segment of the WAL file series. This parameter is required for archive recovery, but optional for streaming replication.

  • recovery_target_timeline

Specifies recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken. Setting this to latest recovers to the latest timeline found in the archive, which is useful in a standby server.

For a complete information, you can refer to the standby-settings, archive-recovery-settings and recovery-target-settings official documentation.


Same example with v12

Get PostgreSQL sources and build the v12 version for this specific commit:

# mkdir /opt/build_postgresql
# cd /opt/build_postgresql/
# git clone git://git.postgresql.org/git/postgresql.git
# cd postgresql/
# git checkout 2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85
# ./configure
# make
# make install
$ /usr/local/pgsql/bin/initdb -D /var/lib/pgsql/12/data_build

Configure the archiver process:

$ mkdir /var/lib/pgsql/12/archives
$ echo "archive_mode = 'on'" >> /var/lib/pgsql/12/data_build/postgresql.conf
$ echo "archive_command = 'cp %p /var/lib/pgsql/12/archives/%f'" \
>> /var/lib/pgsql/12/data_build/postgresql.conf
$ echo "unix_socket_directories = '/var/run/postgresql, /tmp'" \
>> /var/lib/pgsql/12/data_build/postgresql.conf
$ /usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/12/data_build start

Check:

$ psql -c "SELECT pg_switch_wal();"
 pg_switch_wal 
---------------
 0/20000B0
(1 row)

$ ps -ef |grep postgres|grep archiver 
... postgres: archiver   last was 000000010000000000000002

Create the base copy:

$ pg_basebackup --pgdata=/var/lib/pgsql/12/replicated_data -P
24534/24534 kB (100%), 1/1 tablespace
$ echo 'port = 5433' >> /var/lib/pgsql/12/replicated_data/postgresql.conf

Here comes the part modified by this patch. Most of the parameters for the recovery and standby mode are now in the main configuration file.

$ echo "primary_conninfo = 'port=5432'" \
>> /var/lib/pgsql/12/replicated_data/postgresql.conf
$ echo "restore_command = 'cp /var/lib/pgsql/12/archives/%f %p'" \
>> /var/lib/pgsql/12/replicated_data/postgresql.conf
$ echo "recovery_target_timeline = 'latest'" \
>> /var/lib/pgsql/12/replicated_data/postgresql.conf

If you simply want to start a recovery process (eg. restore a backup), you need to create a file named recovery.signal in the data directory.

Here, we want to set up a standby server, so we need a file named standby.signal.

$ touch /var/lib/pgsql/12/replicated_data/standby.signal
$ /usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/12/replicated_data start

If the replication is correctly setup, you should see those processes:

postgres  9033     1  ... /usr/local/pgsql/bin/postgres -D /var/lib/pgsql/12/data_build  
postgres  9039  9033  ... postgres: archiver   last was 000000010000000000000006
postgres 11608  9033  ... postgres: walsender postgres [local] streaming 0/7000060
...
postgres 11599     1  ... /usr/local/pgsql/bin/postgres -D /var/lib/pgsql/12/replicated_data
postgres 11600 11599  ... postgres: startup   recovering 000000010000000000000007 
postgres 11607 11599  ... postgres: walreceiver   streaming 0/7000060

To stop the cluster:

$ /usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/12/replicated_data stop
$ /usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/12/data_build stop

pg_basebackup behavior

In version 11, the -R, –write-recovery-conf options write the recovery.conf file. This patch changes this behavior by appending settings to postgresql.auto.conf.

This part is actually raising some concerns in the community. For example, after a complete restore, the recovery.conf file was moved to recovery.done. This will not be the case anymore.


Conclusion

This commit can have a big impact on your backup tools and procedures.

Until the official v12 release, a lot of changes may still happen on this topic.

The best advice is, like always, to have an attentive look to the release notes.