8.3. Using resource-level fencing in Pacemaker clusters

This section outlines the steps necessary to prevent Pacemaker from promoting a drbd Master/Slave resource when its DRBD replication link has been interrupted. This keeps Pacemaker from starting a service with outdated data and causing an unwanted "time warp" in the process.

In order to enable any resource-level fencing for DRBD, you must add the following lines to your resource configuration:

resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
}

You will also have to make changes to the handlers section depending on the cluster infrastructure being used:

[Important]Important

It is absolutely vital to configure at least two independent cluster communications channels for this functionality to work correctly. Heartbeat-based Pacemaker clusters should define at least two cluster communication links in their ha.cf configuration files. Corosync clusters should list at least two redundant rings in corosync.conf.

8.3.1. Resource-level fencing with dopd

In Heartbeat-based Pacemaker clusters, DRBD can use a resources-level fencing facility named the DRBD outdate-peer daemon, or dopd for short.

Heartbeat configuration for dopd

To enable dopd, you must add these lines to your /etc/ha.d/ha.cf file:

respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

You may have to adjust dopd's path according to your preferred distribution. On some distributions and architectures, the correct path is /usr/lib64/heartbeat/dopd.

After you have made this change and copied ha.cf to the peer node, put Pacemaker in maintenance mode and run /etc/init.d/heartbeat reload to have Heartbeat re-read its configuration file. Afterwards, you should be able to verify that you now have a running dopd process.

[Note]Note

You can check for this process either by running ps ax | grep dopd or by issuing killall -0 dopd.

DRBD Configuration for dopd

Once dopd is running, add these items to your DRBD resource configuration:

resource <resource> {
    handlers {
        fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        ...
    }
    disk {
        fencing resource-only;
        ...
    }
    ...
}

As with dopd, your distribution may place the drbd-peer-outdater binary in /usr/lib64/heartbeat depending on your system architecture.

Finally, copy your drbd.conf to the peer node and issue drbdadm adjust resource to reconfigure your resource and reflect your changes.

Testing dopd functionality

To test whether your dopd setup is working correctly, interrupt the replication link of a configured and connected resource while Heartbeat services are running normally. You may do so simply by physically unplugging the network link, but that is fairly invasive. Instead, you may insert a temporary iptables rule to drop incoming DRBD traffic to the TCP port used by your resource.

After this, you will be able to observe the resource connection state change from Connected to WFConnection. Allow a few seconds to pass, and you should see the disk statebecome Outdated/DUnknown. That is what dopd is responsible for.

Any attempt to switch the outdated resource to the primary role will fail after this.

When re-instituting network connectivity (either by plugging the physical link or by removing the temporary iptables rule you inserted previously), the connection state will change to Connected, and then promptly to SyncTarget (assuming changes occurred on the primary node during the network interruption). Then you will be able to observe a brief synchronization period, and finally, the previously outdated resource will be marked as UpToDate again.

8.3.2. Resource-level fencing using the Cluster Information Base (CIB)

In order to enable resource-level fencing for Pacemaker, you will have to set two options in drbd.conf:

resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    ...
  }
  ...
}

Thus, if the DRBD replication link becomes disconnected, the crm-fence-peer.sh script contacts the cluster manager, determines the Pacemaker Master/Slave resource associated with this DRBD resource, and ensures that the Master/Slave resource no longer gets promoted on any node other than the currently active one. Conversely, when the connection is re-established and DRBD completes its synchronization process, then that constraint is removed and the cluster manager is free to promote the resource on any node again.