repmgr cluster matrix runs repmgr cluster show on each node and arranges the results in a matrix, recording success or failure.
repmgr cluster matrix requires a valid repmgr.conf file on each node. Additionally, passwordless ssh connections are required between all nodes.
Example 1 (all nodes up):
    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  *
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *
Example 2 (node1 and node2 up, node3 down):
    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  x
     node3 |  3 |  ? |  ? |  ?
Each row corresponds to one server, and indicates the result of testing an outbound connection from that server.
Since node3 is down, all the entries in its row are filled with ?, meaning that we cannot test outbound connections from that node.
The other two nodes are up; the corresponding rows have x in the column corresponding to node3, meaning that inbound connections to that node have failed, and * in the columns corresponding to node1 and node2, meaning that inbound connections to these nodes have succeeded.
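The reading of rows and columns described above can be sketched in a few lines of Python. This is only an illustrative helper, not part of repmgr: the function name `parse_matrix` is hypothetical, and the parser assumes the output layout shown in the examples (a header row, a separator line, and cells delimited by `|`).

```python
def parse_matrix(text):
    """Map (source node name, target node id) to the symbol in that cell."""
    # Keep only table rows; the separator line uses "+" and contains no "|".
    rows = [line for line in text.splitlines() if "|" in line]
    header = [cell.strip() for cell in rows[0].split("|")]
    target_ids = [int(cell) for cell in header[2:]]  # columns after Name and Id
    links = {}
    for line in rows[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        name = cells[0]
        for target, symbol in zip(target_ids, cells[2:]):
            links[(name, target)] = symbol
    return links


# Output copied from Example 2 (node3 down).
SAMPLE = """\
 Name  | Id |  1 |  2 |  3
-------+----+----+----+----
 node1 |  1 |  * |  * |  x
 node2 |  2 |  * |  * |  x
 node3 |  3 |  ? |  ? |  ?
"""

links = parse_matrix(SAMPLE)
failed = sorted(key for key, symbol in links.items() if symbol == "x")
unknown = sorted(key for key, symbol in links.items() if symbol == "?")
print("failed connections:", failed)
print("untested connections:", unknown)
```

Each key is a pair of the source node (the row) and the target node id (the column), so the `x` entries name connections that were attempted and failed, while the `?` entries name connections that could not be tested at all.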
Example 3 (all nodes up, firewall dropping packets originating from node1 and directed to port 5432 on node3) - running repmgr cluster matrix from node1 gives the following output:
    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  ? |  ? |  ?
Note that this may take some time, depending on the connect_timeout setting in the node conninfo strings. The default is 1 minute, which means that without modification the above command would take around 2 minutes to run (see the comment elsewhere about setting connect_timeout).
The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3.
In this case, the repmgr cluster crosscheck command will produce a more useful result.
One of the following exit codes will be emitted by repmgr cluster matrix:
SUCCESS (0)
    The check completed successfully and all nodes are reachable.
ERR_BAD_SSH (12)
    One or more nodes could not be accessed via SSH.
ERR_NODE_STATUS (25)
    PostgreSQL on one or more nodes could not be reached.
    Note: This error code overrides ERR_BAD_SSH.
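A monitoring script might act on these exit codes roughly as follows. This is a minimal sketch, not an official interface: the helper names are hypothetical, and the numeric values 12 (ERR_BAD_SSH) and 25 (ERR_NODE_STATUS, the code for unreachable PostgreSQL) are assumptions that should be checked against the exit-code list for the repmgr version in use.

```python
import subprocess

# Exit codes described above. The numeric values for the two error codes
# are assumptions; verify them against your repmgr version.
EXIT_MEANINGS = {
    0: "SUCCESS: all nodes are reachable",
    12: "ERR_BAD_SSH: one or more nodes could not be accessed via SSH",
    25: "ERR_NODE_STATUS: PostgreSQL on one or more nodes could not be reached",
}


def describe_exit_code(code):
    """Return a readable description for a `repmgr cluster matrix` exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cluster_matrix(conf_path="/etc/repmgr.conf"):
    """Run the command and return (exit code, description, captured output)."""
    result = subprocess.run(
        ["repmgr", "-f", conf_path, "cluster", "matrix"],
        capture_output=True,
        text=True,
    )
    return result.returncode, describe_exit_code(result.returncode), result.stdout


# Example call on a host where repmgr is installed:
#   code, meaning, output = run_cluster_matrix()
#   print(meaning)
```

Because the PostgreSQL error overrides ERR_BAD_SSH, a result describing unreachable PostgreSQL does not rule out SSH failures on other nodes; the matrix output itself is needed to tell the two apart.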