2012-05-21 05:12 PM - last edited on 2014-09-26 12:15 PM by allison
I have a customer with a large number of AIX servers who just experienced timeouts on 3 servers during an NDU ONTAP upgrade, and again during a non-disruptive Flash Cache card install (the same 3 servers both times).
Both times the takeover/giveback took over 30 seconds. I wouldn't expect this to be a problem, seeing as NetApp openly advertises that it won't take more than 180 seconds and these servers had the Host Utilities for AIX installed and configured.
It turns out that IBM identified a few MPIO patches missing on these servers, which is more than likely the root cause, but it got me thinking: if a takeover/giveback can take up to 180 seconds, why do the Host Utilities set the timeout value to 30 seconds?
I've attached a screenshot of the lsattr output, which highlights the rw_timeout value:
The Host Utilities manual also states that this is the correct value: http://support.netapp.com/knowledge/docs/hba/aix/relaixhu50/pdfs/host_set.pdf (page 7)
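For anyone who wants to check the same attribute on their own hosts, here is a rough Python sketch of the comparison I'm asking about. It assumes you have captured the text output of lsattr for a disk; the SAMPLE text below is illustrative only (it is not the customer's actual output), and the function names parse_lsattr and timeout_covers_failover are my own. It only compares the two numbers and does not model MPIO retries or path failover.

```python
def parse_lsattr(output):
    """Parse lsattr-style text into {attribute: value}.

    Assumes each line starts with the attribute name followed by its
    value, separated by whitespace (illustrative layout).
    """
    attrs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            attrs[parts[0]] = parts[1]
    return attrs


def timeout_covers_failover(rw_timeout_s, failover_s):
    """Return True if a single I/O timeout is at least as long as the failover window."""
    return rw_timeout_s >= failover_s


# Illustrative sample only; real output will list more attributes.
SAMPLE = """\
queue_depth   64   Queue DEPTH                True
rw_timeout    30   READ/WRITE time out value  True
"""

attrs = parse_lsattr(SAMPLE)
rw_timeout = int(attrs["rw_timeout"])

# 30 s is the value in question; 180 s is the advertised worst case.
for failover in (30, 60, 180):
    covered = timeout_covers_failover(rw_timeout, failover)
    print(f"rw_timeout={rw_timeout}s, failover={failover}s, covered={covered}")
```

With a value of 30, only a failover of 30 seconds or less is covered by a single timeout period, which is exactly the gap I'm trying to understand.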
All of the other servers continue to run normally during the takeover/giveback, but I'd now like to know whether anyone can explain why the timeout is set to 30 seconds when a takeover can take 180 seconds.
2012-08-01 10:47 AM
I'm curious about this as well. Most other FC array providers suggest at least 60 seconds, and as John points out above, a failover can take as long as 180 seconds to complete. So, why the 30-second setting? BTW, 30 seconds is also the default in AIX these days.