Takeover Issues with 3240 HA Pair

Hi, the last two times we have performed a takeover on one of the controllers of our busiest filers, we have had issues with VMware Linux guests.

Here is the high-level system configuration:

3240 controllers in HA Pair

4 SAS disk shelves (600GB)

2 SATA disk shelves (2TB)

2 stacks (1 for SAS and 1 for SATA)

2 SAS aggregates (1 per controller)

2 SATA aggregates (1 per controller)

The issues are usually with VMware datastores during the takeover. These datastores are on 8Gb FC. When we perform a takeover, the Linux guests in the datastores put their root file systems into read-only mode. I assume this is due to latency, which does get rather high, and Linux attempts to protect itself. The Windows guests recover on giveback.

I am not sure why this happens, or whether it is purely due to latency.

Do any of the experts here have ideas about what the potential issue or issues could be?

Thanks

Re: Takeover Issues with 3240 HA Pair

Did you set the disk I/O timeout to 190s in the VMs? We find Linux is a lot more sensitive to this.

Re: Takeover Issues with 3240 HA Pair

OK, I see this KB article now. For ESX 4 and higher, 180 seconds is recommended. I thought VMware Tools did this already. I know it sets Windows to 60 seconds, but I guess Linux needs longer to respond. I will look into this further. For more information, here is the link to the VMware KB article:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1009465&sliceId=1&docTypeID=DT_KB_1_1&dialogID=122102300&sta...
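
For anyone wanting to check what their Linux guests are currently set to, here is a minimal sketch. It assumes the guest exposes its SCSI disks as `sd*` devices and that the per-disk timeout is readable at `/sys/block/<dev>/device/timeout`, which is the usual sysfs location. The function names and the 180-second threshold constant are only for illustration (180 is the figure mentioned above; 190 was also suggested in this thread), so check the KB for the value that applies to your version.

```python
import glob
import os

# Threshold taken from this thread (assumption: confirm against the KB article).
RECOMMENDED_TIMEOUT_S = 180


def read_disk_timeouts(sys_block="/sys/block"):
    """Return {device: timeout_seconds} for each sd* disk found under sys_block."""
    timeouts = {}
    pattern = os.path.join(sys_block, "sd*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/sda/device/timeout, so the device name
        # is the third component from the end.
        device = path.split(os.sep)[-3]
        with open(path) as f:
            timeouts[device] = int(f.read().strip())
    return timeouts


def disks_below(timeouts, minimum=RECOMMENDED_TIMEOUT_S):
    """Return the sorted device names whose timeout is lower than minimum."""
    return sorted(dev for dev, value in timeouts.items() if value < minimum)


if __name__ == "__main__":
    current = read_disk_timeouts()
    for dev, value in current.items():
        print(f"{dev}: {value}s")
    low = disks_below(current)
    if low:
        print(f"Below {RECOMMENDED_TIMEOUT_S}s: {', '.join(low)}")
```

This only reports the values. Writing a new value into the same sysfs file changes it for the running system but does not survive a reboot, so a permanent fix would normally go through a udev rule or whatever the KB recommends for your distribution.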

Re: Takeover Issues with 3240 HA Pair

This is a late reply but I just wanted to share a KB article that documents this:

https://kb.netapp.com/support/index?page=content&id=2010823

If your issue persists, please contact NetApp Technical Support.

Regards,

Christine