8.5. Configuring DRBD to replicate between two SAN-backed Pacemaker clusters

This is a somewhat advanced setup usually employed in split-site configurations. It involves two separate Pacemaker clusters, where each cluster has access to a separate Storage Area Network (SAN). DRBD is then used to replicate data stored on that SAN, across an IP link between sites.

The following illustration describes the concept.

Figure 8.3. Using DRBD to replicate between SAN-based clusters

[Illustration: drbd-pacemaker-floating-peers]

Which of the individual nodes in each site currently acts as the DRBD peer is not explicitly defined — the DRBD peers are said to float; that is, DRBD binds to virtual IP addresses not tied to a specific physical machine.

Note

This type of setup is usually deployed together with DRBD Proxy and/or truck-based replication.

Since this type of setup deals with shared storage, configuring and testing STONITH is absolutely vital for it to work properly.

8.5.1. DRBD resource configuration

To enable your DRBD resource to float, configure it in drbd.conf in the following fashion:

resource <resource> {
  ...
  device /dev/drbd0;
  disk /dev/sda1;
  meta-disk internal;
  floating 10.9.9.100:7788;
  floating 10.9.10.101:7788;
}

The floating keyword replaces the on <host> sections normally found in the resource configuration. In this mode, DRBD identifies peers by IP address and TCP port, rather than by host name. It is important to note that the addresses specified must be virtual cluster IP addresses, rather than physical node IP addresses, for floating to function properly. As shown in the example, in split-site configurations the two floating addresses can be expected to belong to two separate IP networks — it is thus vital for routers and firewalls to properly allow DRBD replication traffic between the nodes.
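
Because both floating addresses must be reachable across the inter-site link, a quick connectivity check can help rule out routing or firewall problems. The following is a minimal sketch and not part of DRBD: the helper names list_floating_endpoints and check_floating_endpoints are hypothetical, the parser assumes IPv4 floating lines in the simple form shown above, and the check assumes a netcat (nc) variant that supports the -z and -w options.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is shipped with DRBD.

# Print "address port" for every IPv4 "floating <address>:<port>;" line
# found in the DRBD configuration file given as $1.
list_floating_endpoints() {
    sed -n 's/^[[:space:]]*floating[[:space:]]\{1,\}\([0-9.]\{1,\}\):\([0-9]\{1,\}\);.*$/\1 \2/p' "$1"
}

# Try a TCP connection to each floating endpoint and report the result.
# Assumes an nc implementation that understands -z (connect only, send
# no data) and -w (timeout in seconds).
check_floating_endpoints() {
    list_floating_endpoints "$1" | while read -r addr port; do
        if nc -z -w 3 "$addr" "$port"; then
            echo "reachable:   $addr:$port"
        else
            echo "unreachable: $addr:$port"
        fi
    done
}

# Example (pass the file that contains the resource definition):
#   check_floating_endpoints /etc/drbd.conf
```

A port can only accept connections while the floating address is assigned and DRBD is running at the corresponding site, so an "unreachable" result for a site that has not been configured yet is expected.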

8.5.2. Pacemaker resource configuration

A DRBD floating peers setup, in terms of Pacemaker configuration, involves the following items (in each of the two Pacemaker clusters involved):

  • A virtual cluster IP address.
  • A master/slave DRBD resource (using the DRBD OCF resource agent).
  • Pacemaker constraints ensuring that resources are started on the correct nodes, and in the correct order.

To configure a resource named mysql in a floating peers configuration in a 2-node cluster, using the replication address 10.9.9.100, configure Pacemaker with the following crm commands:

crm configure
crm(live)configure# primitive p_ip_float_left ocf:heartbeat:IPaddr2 \
                    params ip=10.9.9.100
crm(live)configure# primitive p_drbd_mysql ocf:linbit:drbd \
                    params drbd_resource=mysql
crm(live)configure# ms ms_drbd_mysql p_drbd_mysql \
                    meta master-max="1" master-node-max="1" \
                         clone-max="1" clone-node-max="1" \
                         notify="true" target-role="Master"
crm(live)configure# order drbd_after_left \
                      inf: p_ip_float_left ms_drbd_mysql
crm(live)configure# colocation drbd_on_left \
                      inf: ms_drbd_mysql p_ip_float_left
crm(live)configure# commit
bye

After adding this configuration to the CIB, Pacemaker will execute the following actions:

  1. Bring up the IP address 10.9.9.100 (on either of this cluster's two nodes, here called alice and bob).
  2. Bring up the DRBD resource according to the IP address configured.
  3. Promote the DRBD resource to the Primary role.

Then, in order to create the matching configuration in the other cluster, configure that Pacemaker instance with the following commands:

crm configure
crm(live)configure# primitive p_ip_float_right ocf:heartbeat:IPaddr2 \
                    params ip=10.9.10.101
crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
                    params drbd_resource=mysql
crm(live)configure# ms ms_drbd_mysql drbd_mysql \
                    meta master-max="1" master-node-max="1" \
                         clone-max="1" clone-node-max="1" \
                         notify="true" target-role="Slave"
crm(live)configure# order drbd_after_right \
                      inf: p_ip_float_right ms_drbd_mysql
crm(live)configure# colocation drbd_on_right \
                      inf: ms_drbd_mysql p_ip_float_right
crm(live)configure# commit
bye

After adding this configuration to the CIB, Pacemaker will execute the following actions:

  1. Bring up the IP address 10.9.10.101 (on either of this cluster's two nodes, here called charlie and daisy).
  2. Bring up the DRBD resource according to the IP address configured.
  3. Leave the DRBD resource in the Secondary role (due to target-role="Slave").
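
Once both clusters have committed their configuration, the resulting roles can be checked on each site. The following is a minimal sketch, assuming the DRBD 8.x output format of drbdadm role, which prints the local role followed by the peer's role (for example Primary/Secondary). The helper name local_role_is is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the local DRBD role matches the expected one.
# $1: output of "drbdadm role <resource>", e.g. "Primary/Secondary"
# $2: expected local role, "Primary" or "Secondary"
local_role_is() {
    # Strip everything from the first "/" onwards, leaving the local role.
    [ "${1%%/*}" = "$2" ]
}

# Example, on the node that currently holds 10.9.10.101:
#   local_role_is "$(drbdadm role mysql)" Secondary && echo "role as expected"
```

The Pacemaker view of the same state can be inspected with crm_mon -1 on either node of the cluster.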

8.5.3. Site fail-over

In split-site configurations, it may be necessary to transfer services from one site to another. This may be a consequence of a scheduled transition, or of a disastrous event. In case the transition is a normal, anticipated event, the recommended course of action is this:

  • Connect to the cluster on the site about to relinquish resources, and change the affected DRBD resource’s target-role attribute from Master to Slave. This shuts down any resources that depend on the Primary role of the DRBD resource and demotes it. The DRBD resource itself continues to run, ready to receive updates from a new Primary.
  • Connect to the cluster on the site about to take over resources, and change the affected DRBD resource’s target-role attribute from Slave to Master. This will promote the DRBD resources, start any other Pacemaker resources depending on the Primary role of the DRBD resource, and replicate updates to the remote site.
  • To fail back, simply reverse the procedure.
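
The target-role changes described above can be made from the command line. The following sketch assumes the ms_drbd_mysql resource name from the earlier examples and relies on the crm shell's resource meta subcommand. The wrapper function set_target_role and the CRM variable (which allows the command to be replaced for a dry run) are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around "crm resource meta <rsc> set target-role <role>".
# $1: name of the master/slave resource, e.g. ms_drbd_mysql
# $2: desired target role, Master or Slave
set_target_role() {
    case "$2" in
        Master|Slave) ;;
        *) echo "target-role must be Master or Slave" >&2; return 1 ;;
    esac
    # CRM defaults to the real crm shell; set CRM=echo to print the arguments.
    ${CRM:-crm} resource meta "$1" set target-role "$2"
}

# On the site giving up the resource:  set_target_role ms_drbd_mysql Slave
# Then, on the site taking over:       set_target_role ms_drbd_mysql Master
# Dry run (prints the arguments only): CRM=echo set_target_role ms_drbd_mysql Slave
```

Demoting the old site before promoting the new one keeps the resource from being Primary on both sites at once.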

In the event of a catastrophic outage on the active site, it can be expected that the site is offline and no longer replicating to the backup site. In such an event:

  • Connect to the cluster on the still-functioning site, and change the affected DRBD resource’s target-role attribute from Slave to Master. This will promote the DRBD resources, and start any other Pacemaker resources depending on the Primary role of the DRBD resource.
  • When the original site is restored or rebuilt, you may connect the DRBD resources again, and subsequently fail back using the reverse procedure.
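
Before failing back, it is prudent to confirm that the two sites have reconnected and finished resynchronizing. The following is a minimal sketch, assuming the DRBD 8.x output of drbdadm cstate (Connected once the link is established) and drbdadm dstate (UpToDate/UpToDate once both disks are in sync). The helper name ready_for_failback is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the resource is connected and both
# the local and the peer disk are up to date.
# $1: output of "drbdadm cstate <resource>"
# $2: output of "drbdadm dstate <resource>"
ready_for_failback() {
    [ "$1" = "Connected" ] && [ "$2" = "UpToDate/UpToDate" ]
}

# Example, on the node where the DRBD resource currently runs:
#   drbdadm connect mysql    # only if the resource is not already connecting
#   ready_for_failback "$(drbdadm cstate mysql)" "$(drbdadm dstate mysql)" \
#       && echo "safe to fail back"
```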