Hi, the last two times we performed a takeover on one of the controllers of our busiest filer, we had issues with VMware Linux guests.
Here is a high-level system configuration:
3240 controllers in HA Pair
4 SAS disk shelves (600GB)
2 SATA disk shelves (2TB)
2 stacks (1 for SAS and 1 for SATA)
2 SAS aggregates (1 per controller)
2 SATA aggregates (1 per controller)
The issues are usually with the VMware datastores during the takeover. These datastores are attached over 8Gb FC. When we perform a takeover, the Linux guests in those datastores remount their root file systems read-only. I assume this is due to latency, which does get rather high, and Linux is attempting to protect itself. The Windows guests recover on giveback.
I am not sure why this happens, or whether it is purely due to latency.
Do any of the experts here have ideas about what the potential issue or issues could be?
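To confirm which guests were actually hit after a takeover, something like the rough sketch below could be run inside each Linux guest. It lists file systems that are currently mounted read-only according to /proc/mounts. The function name and the list of file system types are my own choices for illustration, not anything from NetApp or VMware.

```python
import os

# Assumption: only on-disk file system types are of interest here.
DISK_FS_TYPES = ("ext2", "ext3", "ext4", "xfs")


def read_only_mounts(mounts_text, fs_types=DISK_FS_TYPES):
    """Return mount points whose mount options include the 'ro' flag."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Split on commas so 'errors=remount-ro' is not mistaken for 'ro'.
        if fs_type in fs_types and "ro" in options.split(","):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for mount_point in read_only_mounts(f.read()):
                print("read-only:", mount_point)
```

If the root file system shows up in that output on a guest, it was one of the ones that went read-only during the takeover.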
OK, I see this KB article now: for ESX 4 and higher, a disk timeout of 180 seconds is recommended. I thought VMware Tools set this already. I know it sets Windows to 60 seconds, but I guess Linux needs a longer timeout to ride through the takeover. I will look into this further. For more information on the ESX KB, here is the link.
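To see what the guests are currently using, a minimal sketch along these lines could be run inside a Linux guest. It reads the per-disk SCSI command timeout that the kernel exposes at /sys/block/sdX/device/timeout and reports any disk below 180 seconds. The function name is hypothetical, and the 180-second threshold is simply the value mentioned above.

```python
import glob
import os

RECOMMENDED_TIMEOUT = 180  # seconds; the value mentioned in the post above


def short_timeouts(sys_block="/sys/block", minimum=RECOMMENDED_TIMEOUT):
    """Return {disk: timeout} for sd* disks whose timeout is below minimum."""
    found = {}
    pattern = os.path.join(sys_block, "sd*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        disk = path.split(os.sep)[-3]  # .../sda/device/timeout -> 'sda'
        try:
            with open(path) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected content; skip this disk
        if value < minimum:
            found[disk] = value
    return found


if __name__ == "__main__":
    for disk, value in short_timeouts().items():
        print(f"{disk}: timeout {value}s is below {RECOMMENDED_TIMEOUT}s")
```

Writing a new value to that sysfs file (as root) changes the timeout immediately, but it does not survive a reboot, so a persistent setting would have to come from something like a udev rule or whatever the KB article recommends.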