Copyright © 2010 Caringo, Inc.
All rights reserved
Chapter 7. Managing Volumes
In normal operation, the administrator does not need to take any action to manage DX Storage
volumes. However, special cases arise when a volume or a node has a problem, or when the
administrator wants to perform hardware maintenance on a node.
7.1. Volume Expiration
The DX Storage cluster is designed to automatically adapt in the event of a failed volume (hard disk)
or a failed node. Every volume in the DX Storage cluster is checked during the startup of a node. If
a volume has been disconnected from the cluster for more than 14 days, it is considered "stale" and
its contents are not used unless an administrator specifically overrides this behavior.
The 14-day time limit applies to individual volumes, so if a node is shut down for more than 14 days,
all of its volumes are considered stale and are not used. After 14 days, an administrator can force
a volume to be remounted by adding the :k (keep) policy option to its volume specification.
See Section 6.5, “Managing Volumes” for details about how this is done.
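The staleness rule described above can be sketched as a small decision function. This is only an illustration of the 14-day check and the keep override, not DX Storage code: the function names and the "device:k" string layout are assumptions made for the example, and the actual volume specification syntax is the one documented in Section 6.5.

```python
from datetime import datetime, timedelta

# The 14-day limit comes from the text; everything else here is hypothetical.
STALE_AFTER = timedelta(days=14)


def parse_volume_spec(spec):
    """Split an assumed 'device[:option]' string into (device, keep flag)."""
    device, _, options = spec.partition(":")
    return device, "k" in options.split(":")


def should_mount(last_seen, now, keep=False):
    """Return True if the volume's contents would be used at node startup."""
    stale = (now - last_seen) > STALE_AFTER  # "more than 14 days"
    return keep or not stale


now = datetime(2010, 6, 30)
device, keep = parse_volume_spec("/dev/sdb:k")
print(should_mount(datetime(2010, 6, 1), now))        # stale, not mounted
print(should_mount(datetime(2010, 6, 1), now, keep))  # stale, but kept
```

In this sketch a volume last seen exactly 14 days ago is still mounted, because the text specifies "more than 14 days".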
When a volume that has been disconnected for more than 14 days is forced to return to service, take
care, because you might resurrect content that clients had explicitly deleted. This is not a problem
for content that was deleted by automatic lifepoint policies, because DX Storage’s continuous health
processor discovers that content and deletes it again.
7.2. Movement Between Nodes
Physical volumes can be moved between nodes if this becomes necessary due to hardware failures
or other constraints as determined by an administrator.
When a volume goes off-line due to a failure of the volume, the failure of the node, or the shutdown
of a node, the cluster will immediately begin the process of ensuring that the correct number of
replicas exists for all the streams in the cluster. If a volume or node returns to the cluster during this
operation and prior to the 14-day time limit, the checks will continue, but the replicas on the returned
volumes will be considered when validating the stream constraints.
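As a rough illustration of the replica accounting described above, the following sketch counts how many replicas each stream is missing given the set of volumes currently online. The data layout (dictionaries keyed by stream and volume identifiers) and the function name are assumptions made for the example; the cluster's internal bookkeeping is not described in this guide.

```python
def replica_deficits(replicas, online, required):
    """Map each stream to the number of replicas it still lacks.

    replicas: dict of stream id -> set of volume ids holding a replica
    online:   set of volume ids currently in the cluster
    required: dict of stream id -> required replica count
    """
    deficits = {}
    for stream, volumes in replicas.items():
        live = len(volumes & online)  # only replicas on online volumes count
        missing = required[stream] - live
        if missing > 0:
            deficits[stream] = missing
    return deficits


replicas = {"a": {"v1", "v2"}, "b": {"v2", "v3"}}
required = {"a": 2, "b": 2}

# Volume v2 goes off-line: both streams are one replica short.
print(replica_deficits(replicas, {"v1", "v3"}, required))

# v2 returns before any new replica was made: its replicas count again.
print(replica_deficits(replicas, {"v1", "v2", "v3"}, required))
```

The point of the sketch is that a returning volume does not restart the check; it only changes the set of replicas that count toward each stream's constraint.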
When adding volumes, either new or those from another machine, to a node, care should
be taken to ensure that the node has sufficient RAM to handle the additional storage. If
the RAM is not sufficient, the node might be unable to mount some of the volumes.
Volumes may also be moved to nodes that are in a different cluster. When this is done, the streams
on that volume become part of the new cluster and they will be checked for the correct constraints
within the context of the new cluster.
7.3. Physical Errors
In order to provide for autonomous operations, a DX Storage node watches for physical errors
when reading and writing to its volumes. If the node receives any physical errors from a volume, the
volume is immediately retired and the node will avoid any further requests to the failed device.
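The retire-on-first-error policy can be sketched as follows. This is a minimal model under stated assumptions, not the DX Storage implementation: the Node and volume classes are hypothetical, and Python's OSError stands in for whatever physical error the real device layer reports.

```python
class HealthyVolume:
    """Hypothetical in-memory volume used only for this example."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data[key]


class FailingVolume:
    """Stub volume whose reads always fail, standing in for a bad disk."""

    def read(self, key):
        raise OSError("simulated physical read error")


class Node:
    def __init__(self, volumes):
        self.active = dict(volumes)  # volume id -> volume object
        self.retired = set()         # volume ids no longer used

    def retire(self, volume_id):
        self.active.pop(volume_id, None)
        self.retired.add(volume_id)

    def read(self, volume_id, key):
        if volume_id in self.retired:
            raise LookupError(f"volume {volume_id} is retired")
        try:
            return self.active[volume_id].read(key)
        except OSError:
            # No retries here: the first physical error retires the volume.
            self.retire(volume_id)
            raise
```

After the first failed read, any further request to the same volume is refused without touching the device, which mirrors the behavior described above.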
Modern disk storage devices and interfaces are sophisticated: the underlying disk system already
performs many error detection steps, bad-sector remapping, and retry attempts. If a physical error
still propagates up to the DX Storage software level, there is little chance that
a deterministic set of steps can be performed to work around the failure. Additionally, there is no