Thursday 6 June 2013

Using SCSI Persistent Reservations with RHEL Advanced Platform Cluster

  Introduction

When cluster nodes share storage devices, access to those devices must be controlled. In the event of a node failure, the failed node should not have access to the underlying storage devices. SCSI persistent reservations provide the capability to control the access of each node to shared storage devices. Red Hat Cluster Suite employs SCSI persistent reservations as a fencing method through the use of the fence_scsi agent. The fence_scsi agent provides a method to revoke access to shared storage devices, provided that the storage supports SCSI persistent reservations.
Using SCSI reservations as a fencing method is quite different from traditional power fencing methods. It is important to understand the software, hardware, and configuration requirements prior to using SCSI persistent reservations as a fencing method.
This document describes the use of SCSI persistent reservations for Red Hat Cluster Suite version 6.0 and greater. Information about using SCSI persistent reservations with older versions of Red Hat Cluster Suite can be found here:

   Overview

In order to understand how Red Hat Cluster Suite is able to use SCSI persistent reservations as a fencing method, it is helpful to have some basic knowledge of SCSI persistent reservations.
There are two important concepts within SCSI persistent reservations that should be made clear: registrations and reservations.

Registrations

A registration occurs when a node registers a unique key with a device. A device can have many registrations. For our purposes, each node will create a registration on each device.

Reservations

A reservation dictates how a device can be accessed. In contrast to registrations, there can be only one reservation on a device at any time. The node that holds the reservation is known as the "reservation holder". The reservation defines how other nodes may access the device. For example, Red Hat Cluster Suite uses a "Write Exclusive, Registrants Only" reservation. This type of reservation indicates that only nodes that have registered with that device may write to the device.

 Fencing

Red Hat Cluster Suite is able to perform fencing via SCSI persistent reservations by simply removing a node's registration key from all devices. When a node failure occurs, the fence_scsi agent will remove the failed node's key from all devices, thus preventing it from being able to write to those devices.

   Requirements

    Software Requirements

In order to use SCSI persistent reservations as a fencing method, the following software requirements must be met:
  • Red Hat Cluster Suite 6.0 or greater
In addition, the sg3_utils package must be installed. This package provides the tools required to manage SCSI persistent reservations.
Information about using SCSI persistent reservations with older versions of Red Hat Cluster Suite can be found here:

 Storage Requirements

In order to use SCSI persistent reservations as a fencing method, all shared storage must be SPC-3 compliant. In addition, all devices must support the "preempt and abort" service action. SCSI-2 devices are not supported.

  Limitations

  • Multipath devices can be used with fence_scsi, but they must be used with device-mapper-multipath. No other types of multipath devices are currently supported.
  • All nodes in the cluster must have a consistent view of storage. In other words, all nodes in the cluster must register with the same devices. This limitation exists for the simple reason that each node must be able to remove another node's registration key from all the devices that it registered with. In order to do this, the node performing the fencing operation must be aware of all devices that other nodes are registered with.
  • If fence_scsi is used in conjunction with qdisk, the qdisk device must not be controlled by fence_scsi. If fence_scsi is configured to do automatic device detection, then the qdisk device must not be under clvmd control. If fence_scsi uses manually defined devices, the qdisk device must not be listed. See the "Devices" section of this document for information about automatic and manual device configuration.
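
The qdisk limitation above can be sketched for the manually defined case as follows. This is only an illustration under stated assumptions: it assumes qdisk is configured through a quorumd element with a "device" attribute, and /dev/sdd is a hypothetical quorum disk. The point it shows is that the quorum disk does not appear in the fence_scsi "devices" list.

```xml
<!-- Hypothetical quorum disk; the device path is an assumption for illustration. -->
<quorumd device="/dev/sdd" votes="1"/>
<fencedevices>
 <!-- /dev/sdd is deliberately absent from the devices list. -->
 <fencedevice agent="fence_scsi" name="scsi_dev"
  devices="/dev/sda, /dev/sdb, /dev/sdc"/>
</fencedevices>
```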

   Configuration

 Unfencing

In addition to the standard fence section, fence_scsi requires an unfence section in each clusternode entry. This unfence section is used at cluster startup to create registrations on all devices. The unfence section must contain a device entry that is identical to the device entry in the fence section, except that the unfence device entry must also specify action="on".
Below is an example of the fence and unfence sections:
<clusternode name="node-01" votes="1" nodeid="1">
 <fence>
  <method name="scsi">
   <device name="scsi_dev" key="1"/>
  </method>
 </fence>
 <unfence>
  <device name="scsi_dev" key="1" action="on"/>
 </unfence>
</clusternode>
In this example the node's key value is manually defined. If keys are manually defined, they must be defined in both the fence and unfence elements and the key values must be equivalent. For more information about manually defined key values, see section 5.2.

Keys

SCSI persistent reservations use unique key values for registrations and reservations. These key values can be manually defined in the cluster.conf file or they can be generated automatically.
To manually define key values, use the "key" parameter within the device sections for each clusternode entry. This key value must be given in both the fence and unfence sections of the configuration file and the value must be the same within each clusternode.
Below is an example where keys are defined manually for a node.
<clusternode name="node-01" votes="1" nodeid="1">
 <fence>
  <method name="scsi">
   <device name="scsi_dev" key="1"/>
  </method>
 </fence>
 <unfence>
  <device name="scsi_dev" key="1" action="on"/>
 </unfence>
</clusternode>
In this example, the node will use a key value of "1". Note that the key value is given in the fence and unfence sections. All other node configurations should use the same format, except that they must use different key values. The key value can be any hexadecimal value, up to 64 bits.
To have the cluster automatically define key values, simply do not use the "key" parameter. When the "key" parameter is not defined, key values will be generated by combining the cluster_id and the nodeid. This value is guaranteed to be unique and consistent for all nodes in the cluster.
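
As a hedged illustration of the automatic case, the clusternode entry below is the manual example with the "key" parameter removed from both device entries. The node and device names are carried over from that example. The generated key itself is not shown, since its exact format is not described here.

```xml
<clusternode name="node-01" votes="1" nodeid="1">
 <fence>
  <method name="scsi">
   <!-- No key parameter: the key is derived from cluster_id and nodeid. -->
   <device name="scsi_dev"/>
  </method>
 </fence>
 <unfence>
  <device name="scsi_dev" action="on"/>
 </unfence>
</clusternode>
```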

Devices

The devices to be used with fence_scsi can be manually configured or discovered automatically.
To manually define the devices to be used with fence_scsi, use the "devices" parameter within the fencedevice section. The devices parameter should be a comma-separated list of block devices. The devices listed here will receive SCSI persistent reservation commands (registrations and reservations), so each device listed must be SPC-3 compliant.
Below is an example where devices are defined manually.
<fencedevices>
 <fencedevice agent="fence_scsi" name="scsi_dev"
  devices="/dev/sda, /dev/sdb, /dev/sdc"/>
</fencedevices>
In this example, there are three devices defined to be used with fence_scsi (/dev/sda, /dev/sdb, /dev/sdc).
It is important to note that specifying devices by device name (e.g. /dev/sda) can be problematic and should generally be avoided. The reason is that a specific device may be named differently on other nodes. For example, the device named "/dev/sdb" on one node may be named "/dev/sdc" on another node. To avoid this problem, it is recommended that devices be listed by SCSI ID. This can be accomplished by specifying a device by way of its "/dev/disk/by-id/..." path.
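
A minimal sketch of the by-id form is shown below. The two identifiers are placeholders invented for illustration, not real device IDs. The actual names are the entries found under /dev/disk/by-id/ on the cluster nodes.

```xml
<fencedevices>
 <!-- EXAMPLE-ID-1 and EXAMPLE-ID-2 are placeholders, not real SCSI IDs. -->
 <fencedevice agent="fence_scsi" name="scsi_dev"
  devices="/dev/disk/by-id/scsi-EXAMPLE-ID-1, /dev/disk/by-id/scsi-EXAMPLE-ID-2"/>
</fencedevices>
```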
To have the cluster automatically discover the devices to be used with fence_scsi, simply do not use the "devices" parameter. In the absence of this parameter, the fence_scsi agent will attempt to automatically discover cluster storage devices by querying clvmd for a list of devices that belong to cluster volumes. All devices within cluster volumes will be used, and each must be SPC-3 compliant.
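
For comparison, a fencedevice entry that relies on automatic discovery might look like the sketch below. It is the manual example with the "devices" parameter omitted, and the device name "scsi_dev" is carried over from that example.

```xml
<fencedevices>
 <!-- No devices parameter: devices are discovered from cluster volumes via clvmd. -->
 <fencedevice agent="fence_scsi" name="scsi_dev"/>
</fencedevices>
```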

 APTPL

The fence_scsi agent can optionally use the APTPL (Activate Persist Through Power Loss) flag. This feature requires that the storage supports APTPL. To enable it, set the "aptpl" parameter in the fencedevice section of the cluster.conf file.
Below is an example where the APTPL option is enabled.
<fencedevices>
 <fencedevice agent="fence_scsi" name="scsi_dev" aptpl="1"/>
</fencedevices>

 Logging

The fence_scsi agent can be configured to optionally write detailed logging information to a specific file. To enable logging to file, simply set the "logfile" parameter in the fencedevice section of the cluster.conf file.
Below is an example where logging to a file is enabled.
<fencedevices>
 <fencedevice agent="fence_scsi" name="scsi_dev" logfile="/tmp/fence_scsi.log"/>
</fencedevices>

5.6 Example

Below is a sample configuration (cluster.conf) for a cluster that uses SCSI persistent reservations as its fence method.
<?xml version="1.0"?>
<cluster config_version="1" name="my_cluster">
 <cman expected_votes="1" cluster_id="1"/>
 <fence_daemon post_fail_delay="0" post_join_delay="30"/>
 <clusternodes>
  <clusternode name="node-01" votes="1" nodeid="1">
   <fence>
    <method name="scsi">
     <device name="scsi_dev" key="1"/>
    </method>
   </fence>
   <unfence>
    <device name="scsi_dev" key="1" action="on"/>
   </unfence>
  </clusternode>
  <clusternode name="node-02" votes="1" nodeid="2">
   <fence>
    <method name="scsi">
     <device name="scsi_dev" key="2"/>
    </method>
   </fence>
   <unfence>
    <device name="scsi_dev" key="2" action="on"/>
   </unfence>
  </clusternode>
  <clusternode name="node-03" votes="1" nodeid="3">
   <fence>
    <method name="scsi">
     <device name="scsi_dev" key="3"/>
    </method>
   </fence>
   <unfence>
    <device name="scsi_dev" key="3" action="on"/>
   </unfence>
  </clusternode>
 </clusternodes>
 <fencedevices>
  <fencedevice agent="fence_scsi" name="scsi_dev" aptpl="1"
   devices="/dev/sda, /dev/sdb, /dev/sdc"
   logfile="/tmp/fence_scsi.log"/>
 </fencedevices>
</cluster>