Copyright © 2010 Caringo, Inc.
All rights reserved
Appendix B. Using SNMP with DX Storage
This appendix explains how to integrate a DX Storage cluster into an enterprise SNMP monitoring
infrastructure. The DX Storage SNMP agent implementation provides a mechanism for monitoring
the health of cluster nodes, collecting usage data, and controlling node actions.
B.1. SNMP Management Information Base (MIB) Reference
For documentation of the DX Storage Object Identifiers (OIDs) referred to in this appendix, see the
• If you boot from a CSN, an aggregate MIB for the entire cluster is available in /usr/share/
• If you do not boot from a CSN, the MIB is located in the root directory of the DX Storage software
B.2. Managing DX Storage Nodes
DX Storage cluster nodes are controlled through the SNMP action commands. These commands
provide a mechanism through which nodes and volumes within nodes can be taken down for service
or retired from a DX Storage cluster.
B.2.1. Shutdown Action for Nodes
To shut down a DX Storage node gracefully, write the string “shutdown” to the
castorShutdownAction OID. Similarly, writing the string “reboot” to this OID causes the DX Storage
node to reboot.
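The write described above can be sketched with the net-snmp `snmpset` command-line tool driven from Python. This is a minimal sketch under several assumptions that are not stated in this appendix: SNMP v2c is in use, the DX Storage MIB has been loaded so that the symbolic name resolves, and castorShutdownAction is a scalar addressed with a `.0` instance suffix. The host address and community string are placeholders, and the function names are hypothetical.

```python
import subprocess

# Assumption: symbolic OID name with a ".0" scalar instance suffix. Confirm
# against the MIB shipped with your DX Storage release, or substitute the
# numeric OID listed there.
SHUTDOWN_OID = "castorShutdownAction.0"
VALID_ACTIONS = ("shutdown", "reboot")


def build_action_command(host, community, action, oid=SHUTDOWN_OID):
    """Return the snmpset argument list that writes an action string to a node."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # -v 2c selects the SNMP version (assumed), -c supplies the write
    # community, and "s" tells snmpset that the value is a string.
    return ["snmpset", "-v", "2c", "-c", community, host, oid, "s", action]


def send_action(host, community, action):
    """Run snmpset; raises CalledProcessError if the command fails."""
    cmd = build_action_command(host, community, action)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, `send_action("192.0.2.10", "private", "reboot")` would request a reboot of the node at that placeholder address. Separating command construction from execution lets the argument list be inspected or logged before anything is sent to a node.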
Upon receipt of a shutdown or reboot value, the node initiates a graceful stop by unmounting all
of its volumes and removing itself from the cluster. For a shutdown, the node is then powered off if
the hardware supports this. For a reboot, the node restarts the machine, re-reads the node and/or
cluster configuration files, and starts up DX Storage.
A graceful node stop is necessary in order to reboot quickly. If a node stops ungracefully, it must
perform consistency checks on all its volumes before it can rejoin the cluster.
Before shutting down or rebooting, a node’s status page or the SNMP castorErrTable OID should be
checked for critical error messages. Any critical messages logged there will be cleared upon reboot.
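The pre-shutdown check described above could be scripted by walking castorErrTable with the net-snmp `snmpwalk` tool and saving the result before the node restarts. This is a sketch only: it assumes SNMP v2c and a loaded DX Storage MIB, the helper names are hypothetical, and it makes no attempt to classify entries, because the layout of the table rows is not described in this appendix.

```python
import subprocess

# Assumption: the symbolic table name resolves through the loaded DX Storage MIB.
ERR_TABLE_OID = "castorErrTable"


def build_walk_command(host, community, oid=ERR_TABLE_OID):
    """Return the snmpwalk argument list that reads the error table from a node."""
    return ["snmpwalk", "-v", "2c", "-c", community, host, oid]


def nonempty_lines(output):
    """Return the non-blank lines of snmpwalk output, stripped of whitespace."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def read_error_table(host, community):
    """Walk the error table and return its raw output lines for operator review."""
    result = subprocess.run(
        build_walk_command(host, community),
        check=True,
        capture_output=True,
        text=True,
    )
    return nonempty_lines(result.stdout)
```

Saving the returned lines to a file before sending the shutdown or reboot action preserves a record of messages that the reboot will clear.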
B.2.2. Retire Action for Nodes and Volumes
The retire action permanently removes a node, or a volume within a node, from the cluster.
Retire is intended for retiring old hardware or pre-emptively pushing content away from a volume
that has seen an I/O error. Retired volumes and nodes remain visible in the Admin Console until
the cluster has been rebooted.
Retire is not tuned for fast completion. Completing a retire action requires at least three
health processor cycles.