Potential Problem on XIV Storage System microcode versions 10.2.2 through 10.2.4.a

A potential problem on XIV Storage System microcode versions 10.2.2 through 10.2.4.a can be triggered by changing the system time via Network Time Protocol (NTP) or by changing the clock via XCLI.

When the system time is changed to more than approximately 500 years in the future, the Manager Node gets stuck. It stops handling XCLI operations and, more seriously, it can no longer detect any failure in the system. Once such a failure occurs, all hosts will lose access to the system, and IBM support must be contacted immediately.

If the Manager Node is affected, the system will continue to serve I/O, but after a subsequent data component failure (either a disk or a module) the system might not properly identify the failure. It will therefore not initiate a rebuild process, which can cause hosts to lose access to data. Other symptoms may include, but are not limited to: the XIV cannot properly detect a loss of input AC power (a building power outage) and therefore cannot shut down while ensuring that all writes are committed to disk; the customer may be unable to perform any operations on the XIV, including GUI updates or inquiries.

If the machine reports to the XIV service center, XIV will receive the relevant events notifying us of this issue, and we will be able to contact the customer to verify and correct this state.


Environment

Affected versions

  • 10.2.2
  • 10.2.2.a
  • 10.2.4
  • 10.2.4.a

Resolving the problem

Mitigation

  • Remove the NTP server configuration from the XIV to avoid getting into this situation.
  • Do not manually change the machine time to a date 500 or more years in the future (this would only happen by mistake).
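If you set the clock from your own scripts on an affected system, a simple guard can catch a mistyped date (for example, 2511 instead of 2011) before it is ever applied. The sketch below is a hypothetical, client-side check, not an XIV feature: it does not call XCLI, and the function name `is_safe_time_change` and the one-day tolerance `MAX_SKEW` are illustrative assumptions you would adapt to your own tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical tolerance: the largest difference between the proposed time and
# the current clock that this guard accepts. Pick a value suited to your site;
# it only has to be far below the ~500-year jump that triggers the problem.
MAX_SKEW = timedelta(days=1)


def is_safe_time_change(proposed: datetime,
                        now: Optional[datetime] = None,
                        max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if `proposed` lies within `max_skew` of `now`.

    Both datetimes must be timezone-aware; `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(proposed - now) <= max_skew


if __name__ == "__main__":
    reference = datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc)
    # A small correction passes the check.
    print(is_safe_time_change(datetime(2011, 3, 1, 12, 5, tzinfo=timezone.utc), reference))  # True
    # A mistyped year (2511 instead of 2011) is rejected.
    print(is_safe_time_change(datetime(2511, 3, 1, 12, 0, tzinfo=timezone.utc), reference))  # False
```

Only a time change that passes this check would then be handed to whatever mechanism you normally use to set the clock.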

Fix
The fix is included in version 10.2.4.b, which is planned for release in Q2 2011.

  • Version 10.2.4.b will disallow setting invalid dates: the year must be between 2000 and 2030.
  • Version 10.2.4.b will have error-handling messages and more debug information to better manage this situation, including the following events:
  • NEW_TIME_CHANGE_IS_INAVLID
    This event is raised when an attempt to set the time is blocked because the time is invalid (the year is not between 2000 and 2030).
  • SETTING_NEW_TIME
    This event is raised every time there is an attempt to set a ‘complete’ time rather than a small delta from the last time setting (that is, when the delta exceeds TIME_UPDATE_MINIMUM_DIFF).
  • Both events are limited to one occurrence per hour.
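The way the 10.2.4.b rules interact (year validation, the delta threshold, and the hourly event limit) can be modeled in a few lines. This is a hypothetical sketch, not IBM's implementation: the class `TimeSetter`, the 60-second value for `TIME_UPDATE_MINIMUM_DIFF` (the real value is not given in this note), comparing the absolute delta, and reading "one in an hour" as a separate limit per event name are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MIN_YEAR, MAX_YEAR = 2000, 2030                    # valid year range from the fix notes
TIME_UPDATE_MINIMUM_DIFF = timedelta(seconds=60)   # hypothetical value, not documented here
EVENT_INTERVAL = timedelta(hours=1)                # assumed: each event at most once per hour


class TimeSetter:
    """Hypothetical model of the 10.2.4.b time-setting rules (not IBM's code)."""

    def __init__(self, clock: datetime) -> None:
        self.clock = clock                     # current system time
        self.events: List[str] = []            # events actually raised
        self._last_raised: Dict[str, datetime] = {}

    def _raise_event(self, name: str, at: datetime) -> None:
        # Suppress the event if the same one was raised less than an hour ago.
        last = self._last_raised.get(name)
        if last is None or at - last >= EVENT_INTERVAL:
            self._last_raised[name] = at
            self.events.append(name)

    def set_time(self, new_time: datetime, at: datetime) -> bool:
        """Try to set the clock; `at` is when the request arrives (for rate limiting)."""
        if not MIN_YEAR <= new_time.year <= MAX_YEAR:
            self._raise_event("NEW_TIME_CHANGE_IS_INAVLID", at)
            return False                       # the change is blocked
        if abs(new_time - self.clock) > TIME_UPDATE_MINIMUM_DIFF:
            # A 'complete' time setting rather than a small drift correction.
            self._raise_event("SETTING_NEW_TIME", at)
        self.clock = new_time
        return True


if __name__ == "__main__":
    t0 = datetime(2011, 3, 1, 12, 0)
    ts = TimeSetter(t0)
    ts.set_time(datetime(2511, 3, 1), at=t0)                             # blocked, event raised
    ts.set_time(datetime(2511, 3, 1), at=t0 + timedelta(minutes=5))      # blocked, event suppressed
    ts.set_time(t0 + timedelta(hours=2), at=t0 + timedelta(minutes=6))   # accepted, SETTING_NEW_TIME
    ts.set_time(t0 + timedelta(hours=2, seconds=30),
                at=t0 + timedelta(minutes=7))                            # small delta, no event
    print(ts.events)  # ['NEW_TIME_CHANGE_IS_INAVLID', 'SETTING_NEW_TIME']
```

If the hourly limit were instead shared by both events, a single timestamp would replace the per-name dictionary.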


https://www-304.ibm.com/support/docview.wss?mynp=OCSTJTAG&mync=E&uid=ssg1S1003838&myns=s028
