repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster
Enables a dormant (stopped) node to be rejoined to the replication cluster.
This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.
      Note that repmgr node rejoin can only be used to attach
      a standby to the current primary, not another standby.
    
If the node is running and needs to be attached to the current primary, use repmgr standby follow.
Note that repmgr standby follow can only be used for standbys which have not diverged from the rest of the cluster.
      repmgr node rejoin -d '$conninfo'
      where $conninfo is the PostgreSQL conninfo string of the
      current primary node (or that of any reachable node in the cluster, but
      not the local node). This is so that repmgr can fetch up-to-date information
      about the current state of the cluster.
    
      repmgr.conf for the stopped node *must* be supplied explicitly if not
      otherwise available.
    
--dry-run
            Check prerequisites but don't actually execute the rejoin.
--force-rewind
            Execute pg_rewind.
See Using pg_rewind for more details on using pg_rewind.
--config-files
            Comma-separated list of configuration files to retain after executing pg_rewind.
Currently pg_rewind will overwrite the local node's configuration files with the files from the source node, so it's advisable to use this option to ensure they are kept.
--config-archive-dir
            Directory to temporarily store configuration files specified with
            --config-files; default: /tmp.
          
-W/--no-wait
            Don't wait for the node to rejoin the cluster.
If this option is supplied, repmgr will restart the node but not wait for it to connect to the primary.
           node_rejoin_timeout:
		   the maximum length of time (in seconds) to wait for
		   the node to reconnect to the replication cluster (defaults to
		   the value set in standby_reconnect_timeout,
		   60 seconds).
		 
           Note that standby_reconnect_timeout must be
           set to a value equal to or greater than
           node_rejoin_timeout.
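The relationship between the two timeouts can be checked before attempting a rejoin. The following is a minimal sketch, assuming the settings appear as simple name=value lines in repmgr.conf; the helper names conf_value and check_rejoin_timeouts are hypothetical and not part of repmgr.

```shell
#!/bin/sh
# Hypothetical helpers (not part of repmgr): confirm that node_rejoin_timeout
# does not exceed standby_reconnect_timeout.

conf_value() {
    # $1 = config file, $2 = parameter name, $3 = fallback if the parameter is absent
    # Accepts "name=90", "name = 90" and "name='90'"; commented-out lines are ignored.
    value=$(sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*'\{0,1\}\([0-9][0-9]*\)'\{0,1\}.*/\1/p" "$1" | tail -n 1)
    echo "${value:-$3}"
}

check_rejoin_timeouts() {
    # $1 = path to repmgr.conf
    # standby_reconnect_timeout falls back to 60 seconds;
    # node_rejoin_timeout falls back to standby_reconnect_timeout.
    reconnect=$(conf_value "$1" standby_reconnect_timeout 60)
    rejoin=$(conf_value "$1" node_rejoin_timeout "$reconnect")
    if [ "$rejoin" -le "$reconnect" ]; then
        echo "ok: node_rejoin_timeout=$rejoin, standby_reconnect_timeout=$reconnect"
    else
        echo "node_rejoin_timeout ($rejoin) exceeds standby_reconnect_timeout ($reconnect)"
        return 1
    fi
}
```

For example, check_rejoin_timeouts /etc/repmgr.conf returns a non-zero status if the file sets node_rejoin_timeout higher than standby_reconnect_timeout.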
         
      A node_rejoin event notification will be generated.
    
      One of the following exit codes will be emitted by repmgr node rejoin:
    
SUCCESS (0)
            The node rejoin succeeded; or if --dry-run was provided,
            no issues were detected which would prevent the node rejoin.
          
ERR_BAD_CONFIG (1)
            A configuration issue was detected which prevented repmgr from continuing with the node rejoin.
ERR_NO_RESTART (4)
            The node could not be restarted.
ERR_REJOIN_FAIL (24)
            The node rejoin operation failed.
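A calling script can branch on these exit codes. The sketch below maps each documented code to its meaning; the function name describe_rejoin_exit is hypothetical and only illustrates one way to handle the result.

```shell
#!/bin/sh
# Hypothetical helper: translate a "repmgr node rejoin" exit code into a message.
describe_rejoin_exit() {
    # $1 = exit code returned by "repmgr node rejoin"
    case "$1" in
        0)  echo "SUCCESS: the node rejoin succeeded (or the dry run found no issues)" ;;
        1)  echo "ERR_BAD_CONFIG: a configuration issue prevented the node rejoin" ;;
        4)  echo "ERR_NO_RESTART: the node could not be restarted" ;;
        24) echo "ERR_REJOIN_FAIL: the node rejoin operation failed" ;;
        *)  echo "unexpected exit code: $1" ;;
    esac
}
```

A script would typically run the rejoin command, capture $? immediately afterwards, and pass it to the helper.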
      Currently repmgr node rejoin can only be used to attach
      a standby to the current primary, not another standby.
    
The node's PostgreSQL instance must have been shut down cleanly. If this was not the case, it will need to be started up until it has reached a consistent recovery point, then shut down cleanly.
      In PostgreSQL 13 and later, this will be done automatically
      if the --force-rewind is provided (even if an actual rewind
      is not necessary).
    
With PostgreSQL 12 and earlier, PostgreSQL will need to be started and shut down manually; see below for the best way to do this.
        If PostgreSQL is started in single-user mode and
        input is directed from /dev/null, it will perform recovery
        then immediately quit, and will then be in a state suitable for use by
        pg_rewind.
        
          rm -f /var/lib/pgsql/data/recovery.conf
          postgres --single -D /var/lib/pgsql/data/ < /dev/null
        Note that standby.signal (PostgreSQL 11 and earlier:
        recovery.conf) must be removed
        from the data directory for PostgreSQL to be able to start in single
        user mode.
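Because the file to remove depends on the PostgreSQL version, the manual procedure can be wrapped in a small script. This is a sketch only: the function names are hypothetical, and the data directory and major version are assumed to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helpers for the manual clean-shutdown procedure.

recovery_marker_file() {
    # $1 = PostgreSQL version, e.g. "12" or "9.6"
    major=${1%%.*}
    if [ "$major" -ge 12 ]; then
        echo "standby.signal"      # PostgreSQL 12 and later
    else
        echo "recovery.conf"       # PostgreSQL 11 and earlier
    fi
}

recover_then_quit() {
    # $1 = data directory, $2 = PostgreSQL version
    # Remove the marker file, then let single-user mode perform recovery and exit.
    rm -f "$1/$(recovery_marker_file "$2")"
    postgres --single -D "$1" < /dev/null
}
```

For example, recover_then_quit /var/lib/pgsql/data 11 performs the same two steps as the commands shown above.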
      
Using pg_rewind
      repmgr node rejoin can optionally use pg_rewind to re-integrate a
      node which has diverged from the rest of the cluster, typically a failed primary.
    
        pg_rewind requires that either
        wal_log_hints is enabled, or that
        data checksums were enabled when the cluster was initialized. See the
        pg_rewind documentation for details.
      
        Additionally, full_page_writes must be enabled; this is the default and
        normally should never be disabled.
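These settings can be verified ahead of time. The sketch below encodes the rule as a shell function that takes the three setting values as arguments; the function name is hypothetical, and how the values are obtained (here suggested via psql with SHOW) depends on local connection settings.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the pg_rewind prerequisites described
# above are satisfied, given the current setting values ("on" or "off").
rewind_prereqs_met() {
    # $1 = wal_log_hints, $2 = data_checksums, $3 = full_page_writes
    if [ "$3" != "on" ]; then
        echo "full_page_writes must be enabled"
        return 1
    fi
    if [ "$1" = "on" ] || [ "$2" = "on" ]; then
        echo "pg_rewind prerequisites appear to be met"
        return 0
    fi
    echo "either wal_log_hints or data checksums must be enabled"
    return 1
}

# Possible invocation against a running instance (connection options are assumptions):
#   rewind_prereqs_met "$(psql -Atc 'SHOW wal_log_hints')" \
#                      "$(psql -Atc 'SHOW data_checksums')" \
#                      "$(psql -Atc 'SHOW full_page_writes')"
```

This duplicates only part of what repmgr checks with --dry-run, so it is a convenience rather than a substitute for the dry run.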
      
      We strongly recommend familiarizing yourself with pg_rewind before attempting
      to use it with repmgr, as while it is an extremely useful tool, it is not
      a "magic bullet" which can resolve all problematic replication situations.
    
      A typical use-case for pg_rewind is when a scenario like the following
      is encountered:
      
    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
        --force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --dry-run
    NOTICE: rejoin target is node "node3" (node ID: 3)
    INFO: replication connection to the rejoin target node was successful
    INFO: local and rejoin target system identifiers match
    DETAIL: system identifier is 6652184002263212600
    ERROR: this node cannot attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    HINT: use --force-rewind to execute pg_rewind
      Here, node3 was promoted to a primary while the local node was
      still attached to the previous primary; this can potentially happen during e.g. a
      network split. pg_rewind can re-sync the local node with node3,
      removing the need for a full reclone.
    
      To have repmgr node rejoin use pg_rewind,
      pass the command line option --force-rewind, which will tell repmgr
      to execute pg_rewind to ensure the node can be rejoined successfully.
    
pg_rewind and configuration file retention
        Be aware that if pg_rewind is executed and actually performs a
        rewind operation, any configuration files in the PostgreSQL data directory will be
        overwritten with those from the source server.
      
        To prevent this happening, provide a comma-separated list of files to retain
        using the --config-files command line option; the specified files
        will be archived in a temporary directory (whose parent directory can be specified with
        --config-archive-dir, default: /tmp)
        and restored once the rewind operation is complete.
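When the list of files to retain is maintained in a script, the option value can be assembled from individual file names. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: build the --config-files option from a list of file names.
config_files_option() {
    # Joins all arguments with commas; the subshell keeps the IFS change local.
    ( IFS=,; echo "--config-files=$*" )
}
```

For example, config_files_option postgresql.local.conf postgresql.conf produces the option value used in the examples on this page.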
      
repmgr node rejoin and pg_rewind
        Example, first using --dry-run, then actually executing the
        node rejoin command.
        
    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
        --config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind --dry-run
    NOTICE: rejoin target is node "node3" (node ID: 3)
    INFO: replication connection to the rejoin target node was successful
    INFO: local and rejoin target system identifiers match
    DETAIL: system identifier is 6652460429293670710
    NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    INFO: prerequisites for using pg_rewind are met
    INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.local.conf"
    INFO: file "postgresql.replication-setup.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.replication-setup.conf"
    INFO: pg_rewind would now be executed
    DETAIL: pg_rewind command is:
      pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'
    INFO: prerequisites for executing NODE REJOIN are met
            If --force-rewind is used with the --dry-run option,
            this checks the prerequisites for using pg_rewind, but is
            not an absolute guarantee that actually executing pg_rewind
            will succeed. See also section Caveats below.
          
    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
        --config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind
    NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    NOTICE: executing pg_rewind
    DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'"
    NOTICE: 2 files copied to /var/lib/postgresql/data
    NOTICE: setting node 2's upstream to node 3
    NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
    NOTICE: NODE REJOIN successful
    DETAIL: node 2 is now attached to node 3
pg_rewind and PostgreSQL 9.4
        pg_rewind is available in PostgreSQL 9.5 and later as part of the core distribution.
        Users of PostgreSQL 9.4 will need to manually install it; the source code is available here:
        https://github.com/vmware/pg_rewind.
        If the pg_rewind
        binary is not installed in the PostgreSQL bin directory, provide
        its full path on the demotion candidate with --force-rewind.
      
Note that building the 9.4 version of pg_rewind requires the PostgreSQL source code.
Caveats when using repmgr node rejoin
     repmgr node rejoin attempts to determine whether it will succeed by
     comparing the timelines and relative WAL positions of the local node (rejoin candidate) and primary
     (rejoin target). This is particularly important if planning to use pg_rewind,
     which currently (as of PostgreSQL 12) may appear to succeed (or indicate there is no action
     needed) but potentially allow an impossible action, such as trying to rejoin a standby to a
     primary which is behind the standby. repmgr will prevent this situation from occurring.
   
     Currently it is not possible to detect a situation where the rejoin target
     is a standby which has been "promoted" by removing recovery.conf
     (PostgreSQL 12 and later: standby.signal) and restarting it.
     In this case there will be no information about the point the rejoin target diverged
     from the current standby; the rejoin operation will fail and
     the current standby's PostgreSQL log will contain entries with the text
     "record with incorrect prev-link".
   
In PostgreSQL 9.5 and earlier, it is not possible to use pg_rewind to attach to a target node with a lower timeline than the local node.
     We strongly recommend running repmgr node rejoin with the
     --dry-run option first. Additionally it might be a good idea
     to execute the pg_rewind command displayed by
     repmgr with the pg_rewind --dry-run
     option. Note that pg_rewind does not indicate that it
     is running in --dry-run mode.
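The recommendation to run with --dry-run first can be enforced by a wrapper. The following sketch runs the dry run and only proceeds to the real rejoin if it exits with 0; the function name and the REPMGR_BIN variable are hypothetical conveniences, not repmgr features.

```shell
#!/bin/sh
# Hypothetical wrapper: run "repmgr node rejoin" with --dry-run first and
# execute the real rejoin only if the dry run succeeds.
rejoin_with_dry_run() {
    # All arguments are passed through to "repmgr node rejoin".
    # REPMGR_BIN (an assumption) allows the binary to be overridden.
    repmgr_bin=${REPMGR_BIN:-repmgr}
    rc=0
    "$repmgr_bin" node rejoin "$@" --dry-run || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "dry run failed with exit code $rc; node rejoin not executed" >&2
        return "$rc"
    fi
    "$repmgr_bin" node rejoin "$@"
}
```

It would be called with the same options as the examples above, minus --dry-run, for instance rejoin_with_dry_run -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' --force-rewind.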
   
In all PostgreSQL versions released before February 2021, pg_rewind contains a corner-case bug which affects standbys in a very specific situation.
This situation occurs when a standby was shut down before its primary node, and an attempt is made to attach this standby to another primary in the same cluster (following a "split brain" situation where the standby was connected to the wrong primary). In this case, repmgr will correctly determine that pg_rewind should be executed; however, pg_rewind incorrectly decides that no action is necessary.
In this situation, repmgr will report something like:
    NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
    DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10
but when executed, pg_rewind will report:
    pg_rewind: servers diverged at WAL location 0/7015540 on timeline 2
    pg_rewind: no rewind required
and if an attempt is made to attach the standby to the new primary, PostgreSQL logs on the standby will contain errors like:
    [2020-09-07 15:01:41 UTC]    LOG:  00000: replication terminated by primary server
    [2020-09-07 15:01:41 UTC]    DETAIL:  End of WAL reached on timeline 2 at 0/7015540.
    [2020-09-07 15:01:41 UTC]    LOG:  00000: new timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10
Currently it is not possible to resolve this situation using pg_rewind. A patch was submitted and is included in all PostgreSQL versions released in February 2021 or later.
       As a workaround, start the primary server the standby was previously attached to,
       and ensure the standby can be attached to it. If pg_rewind was actually executed,
       it will have copied in the .history file from the target primary server; this must
       be removed. repmgr node rejoin can then be used to attach the standby to the original
       primary. Ensure any changes pending on the primary have propagated to the standby. Then shut down the primary
       server first, before shutting down the standby. It should then be possible to
       use repmgr node rejoin to attach the standby to the new primary.
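The symptom of this bug can be detected by inspecting the pg_rewind output for the "no rewind required" message quoted above. The sketch below reads captured pg_rewind output on standard input; the function name is hypothetical, and matching on message text is an assumption that may not hold across PostgreSQL versions or locales.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the pg_rewind output on stdin indicates
# that pg_rewind decided no rewind was required.
rewind_was_skipped() {
    grep -q 'no rewind required'
}
```

If repmgr reported that pg_rewind execution was required but this check succeeds on the pg_rewind output, the node is likely in the situation described here and the workaround above applies.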