This guide assumes that you are familiar with PostgreSQL administration and streaming replication concepts. For further details on streaming replication, see the PostgreSQL documentation section on streaming replication.
The following terms are used throughout the repmgr documentation.
In the repmgr documentation, "replication cluster" refers to the network of PostgreSQL servers connected by streaming replication.
A node is a single PostgreSQL server within a replication cluster.
An "upstream node" is the node a standby server connects to in order to receive streaming replication. This is either the primary server or, in the case of cascading replication, another standby.
"Failover" is the action which occurs if a primary server fails and a suitable standby is promoted as the new primary. The repmgrd daemon supports automatic failover to minimise downtime.
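The relationship between nodes and their upstream nodes can be pictured as a simple chain of links. The following is a minimal illustrative sketch, not part of repmgr: the `Node` class, the `upstream_chain` function and the node names are all hypothetical, and the model assumes a primary is simply a node with no upstream.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """One PostgreSQL server in a replication cluster (illustrative model only)."""
    name: str
    upstream: Optional[str] = None  # assumption: None marks the primary


def upstream_chain(nodes: Dict[str, Node], name: str) -> List[str]:
    """Return the node names from `name` up to the primary, following upstream links."""
    chain = [name]
    seen = {name}
    current = nodes[name]
    while current.upstream is not None:
        if current.upstream in seen:
            raise ValueError("cycle in upstream links")
        chain.append(current.upstream)
        seen.add(current.upstream)
        current = nodes[current.upstream]
    return chain


# Hypothetical three-node cluster: node2 streams from the primary (node1),
# and node3 cascades from node2.
cluster = {
    "node1": Node("node1"),
    "node2": Node("node2", upstream="node1"),
    "node3": Node("node3", upstream="node2"),
}

print(upstream_chain(cluster, "node3"))  # ['node3', 'node2', 'node1']
```

Here the upstream node of node3 is another standby (cascading replication), while the upstream node of node2 is the primary.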
In certain circumstances, such as hardware or operating system maintenance, it's necessary to take a primary server offline; in this case a controlled switchover is necessary, whereby a suitable standby is promoted and the existing primary removed from the replication cluster in a controlled manner. The repmgr command line client provides this functionality.
In a failover situation, following the promotion of a standby to primary, it's essential that the previous primary does not unexpectedly come back online, which would result in a split-brain situation. To prevent this, the failed primary should be isolated from applications, i.e. "fenced off".
repmgr provides functionality to set up a so-called "witness server" to assist in determining a new primary server in a failover situation with more than one standby. The witness server itself is not part of the replication cluster, although it does contain a copy of the repmgr metadata schema.
The purpose of a witness server is to provide a "casting vote" where servers in the replication cluster are split over more than one location. In the event of a loss of connectivity between locations, the presence or absence of the witness server will decide whether a server at that location is promoted to primary; this is to prevent a "split-brain" situation where an isolated location interprets a network outage as a failure of the (remote) primary and promotes a (local) standby.
A witness server only needs to be created if repmgrd is in use.
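The "casting vote" idea can be sketched as a small decision function. This is a deliberately simplified illustration and not repmgrd's actual algorithm: the function name `may_promote` and its parameters are hypothetical, and it assumes a two-location setup in which the witness server sits at the same location as the primary.

```python
def may_promote(primary_reachable: bool, witness_reachable: bool) -> bool:
    """Decide whether a standby at a remote location should be promoted.

    Simplified illustration only. Assumption: the witness server is at the
    same location as the primary, so losing sight of both suggests a network
    outage rather than a primary failure.
    """
    if primary_reachable:
        # The primary is still visible, so no failover is needed.
        return False
    # The primary is not visible. If the witness is still visible, the
    # primary itself has probably failed; if the witness is also gone,
    # this location is probably the isolated one and must not promote.
    return witness_reachable


print(may_promote(primary_reachable=False, witness_reachable=True))   # True
print(may_promote(primary_reachable=False, witness_reachable=False))  # False
```

In the second case the standby stays a standby, which avoids the split-brain situation described above.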