Storwize V7000 Systems Running V188.8.131.52 – V184.108.40.206 Code May Shut Down Unexpectedly During Normal Operation, Resulting in a Loss of Host Access and Potential Loss of Fast-Write Cache Data
Storwize V7000 systems running code levels between V220.127.116.11 and V18.104.22.168 are exposed to an issue that can cause both node canisters to shut down simultaneously and unexpectedly during normal operation.
An issue in the V22.214.171.124 – V126.96.36.199 code can cause both node canisters to shut down abruptly during normal operation. When this happens, the hardened configuration metadata on these nodes is lost, and a manual cluster recovery process must be performed to restore the configuration. This recovery process may take up to several hours to complete.
Additionally, any host I/O data that was resident in the fast-write cache at the time of the failure will be unrecoverable.
A workaround introduced in the V188.8.131.52 PTF release does not eliminate this issue, but it prevents a shutdown event on one node canister from propagating to the other node canister. This is intended to prevent a double-node shutdown, thereby avoiding any loss of host access to data and the need to perform a cluster recovery process. The workaround was further improved in the V184.108.40.206 PTF release so that affected nodes recover automatically.
If a single-node shutdown event does occur on a system running V220.127.116.11, the affected node will recover automatically and resume normal operation without any manual intervention.
IBM Development is continuing to work on a complete fix for this issue, to be released in a future PTF. In the meantime, customers should upgrade to V18.104.22.168 to avoid an outage.
Please visit the following URL to download the V22.214.171.124 code: