Storwize V7000 Systems Running V184.108.40.206 – V220.127.116.11 Code May Shut Down Unexpectedly During Normal Operation, Resulting in a Loss of Host Access and Potential Loss of Fast-Write Cache Data
Storwize V7000 units running code levels between V18.104.22.168 and V22.214.171.124 are exposed to an issue that can result in both node canisters shutting down simultaneously and unexpectedly during normal operation.
An issue exists in the V126.96.36.199 – V188.8.131.52 code that can cause both node canisters to shut down abruptly during normal operation. When this happens, the hardened configuration metadata on both nodes is lost, and a manual cluster recovery process must be performed to restore the configuration. This recovery process may take up to several hours to complete.
Additionally, any host I/O data that was resident in the fast-write cache at the time of failure will be unrecoverable.
A workaround was introduced in the V184.108.40.206 PTF release. It does not eliminate the underlying issue, but it prevents a shutdown event on one node canister from propagating to the other. This is intended to prevent a double-node shutdown, and with it any loss of host access to data, and to avoid the need for a cluster recovery process. The workaround was further improved in the V220.127.116.11 PTF release so that affected nodes recover automatically.
If a single-node shutdown event does occur on a system running V18.104.22.168, the affected node will recover automatically and resume normal operation without any manual intervention.
IBM Development is continuing to work on a complete fix for this issue, to be released in a future PTF. In the meantime, customers should upgrade to V22.214.171.124 to avoid an outage.
Please visit the following URL to download the V126.96.36.199 code: