Potential problem on XIV Storage System microcode versions 10.2.2 through 10.2.4.a that can be triggered by changing the system time, either via Network Time Protocol (NTP) or by changing the clock via XCLI.
When the system time is set more than approximately 500 years into the future, the Manager Node gets stuck. It stops handling XCLI operations and, more severely, can no longer detect any failure in the system. If such a failure then occurs, all hosts will lose access to the system and IBM support must be contacted immediately.
If the Manager Node is affected, the system continues to serve I/O. However, if a data component (either a disk or a module) subsequently fails, the system might not identify the failure properly. In that case it will not initiate a rebuild process, and hosts may lose access to data. Other symptoms may include, but are not limited to, the following: the XIV will not be able to properly detect loss of input AC (building power outage) and therefore will not be able to shut down while ensuring that all writes are committed to disk; the customer may not be able to perform any operations on the XIV, including GUI updates or inquiries.
If the machine reports to the XIV service center, the service center will receive the events that flag this issue, and we will be able to contact the customer to verify and fix this state.
Affected versions Environment
Resolving the problem
- Remove NTP server configuration from the XIV to avoid getting into this situation.
- Do not manually change the machine time to a date 500 years or more in the future (this would happen only by mistake).
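As a guard against the accidental manual change described above, an operator-side sanity check can be run on a proposed date before it is submitted. The following is only a minimal sketch: the function name `is_safe_target_time` and the `MAX_YEARS_AHEAD` margin are hypothetical and are not part of XCLI or the XIV microcode. The margin is deliberately far stricter than the ~500-year threshold mentioned in this notice.

```python
from datetime import datetime, timezone

# Hypothetical operator policy (an assumption, not an XIV setting): refuse any
# manual clock change that lands more than this many calendar years ahead.
# This is far below the ~500-year threshold that triggers the problem.
MAX_YEARS_AHEAD = 1


def is_safe_target_time(target, now=None, max_years_ahead=MAX_YEARS_AHEAD):
    """Return True if `target` is not suspiciously far in the future.

    Only calendar years are compared, which is enough to catch a mistyped
    year such as 2511 instead of 2011.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return target.year - now.year <= max_years_ahead
```

For example, with a current date in 2011, a target of 2011-03-01 passes the check, while a target of 2511-03-01 is rejected before any command is sent to the system.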
The fix is included in version 10.2.4.b, which is planned for release in Q2 2011.
- Version 10.2.4.b will disallow setting invalid dates: the year must be between 2000 and 2030.
- 10.2.4.b will have error handling messages and more debug information to better manage this situation.
An event is raised when an attempt to set the time is blocked because the time is invalid (year not between 2000 and 2030).
An event is raised every time an attempt is made to set a 'complete' time rather than a delta from the last time setting (that is, when the delta is greater than TIME_UPDATE_MINIMUM_DIFF).
- Each of these two events is raised at most once per hour.
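The behavior described for 10.2.4.b (year validation, an event for blocked attempts, an event for 'complete' time changes, and a limit of one event per hour) can be illustrated with a small model. This is a sketch under stated assumptions, not the microcode's implementation: the class `TimeSetGuard`, the event names `TIME_SET_BLOCKED` and `COMPLETE_TIME_SET`, the inclusive reading of the 2000 to 2030 range, and the 60-second value used for `TIME_UPDATE_MINIMUM_DIFF` are all hypothetical, since this notice does not give them.

```python
import time
from datetime import datetime

MIN_VALID_YEAR = 2000            # from the notice; inclusive bound is assumed
MAX_VALID_YEAR = 2030            # from the notice; inclusive bound is assumed
TIME_UPDATE_MINIMUM_DIFF = 60.0  # seconds; assumed value, not given in the notice
EVENT_MIN_INTERVAL = 3600.0      # each event type at most once per hour


class TimeSetGuard:
    """Toy model of the time-setting checks; all names are hypothetical."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable clock, in seconds
        self._last_raised = {}     # event name -> time it was last raised
        self.events = []           # (name, detail) tuples actually raised

    def _raise_event(self, name, detail):
        """Record the event unless the same event fired within the last hour."""
        now = self._clock()
        last = self._last_raised.get(name)
        if last is not None and now - last < EVENT_MIN_INTERVAL:
            return False           # suppressed by the rate limit
        self._last_raised[name] = now
        self.events.append((name, detail))
        return True

    def request(self, current: datetime, target: datetime) -> bool:
        """Return True if the time change is accepted, False if blocked."""
        if not (MIN_VALID_YEAR <= target.year <= MAX_VALID_YEAR):
            self._raise_event("TIME_SET_BLOCKED", f"invalid year {target.year}")
            return False
        delta = abs((target - current).total_seconds())
        if delta > TIME_UPDATE_MINIMUM_DIFF:
            # A large jump is treated as a 'complete' time set, not a delta.
            self._raise_event("COMPLETE_TIME_SET", f"delta {delta:.0f}s")
        return True
```

In this model a request for the year 2511 is rejected and raises the blocking event once; repeated bad requests within the same hour are still rejected but raise no further events. The two event types are rate-limited independently, which is one possible reading of the one-per-hour limit.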