I've been digging around trying to find an answer but haven't really found anything definitive. I think I know what the answer is, but wanted to get real world feedback and possible best practices.
I'm looking at a scenario with a single filer (dual head) that is cabled fully redundant across two separate FC switches. Head one 0a goes to switch 1, 0b goes to switch 2. Head two 0a goes to switch 2, 0b goes to switch 1. There are a few VMware hosts, each with a vmhba into each switch. I'm doing quite a bit of cleanup in the back of the rack and need to perform the following:
1. Replace the power cable on FC switch 2.
2. Replace all the FC cables going into the filer heads.
I need to do all this without downtime. Since it is fully redundant, I should be able to do one thing at a time without any interruptions.
I'm hoping that the answer is going to be that FC is an intelligent enough protocol that it can handle this without any provisions and no risk of data corruption.
I'm expecting the answer to be something like: remove the paths in VMware, shut down the FC adapters in the filer, then perform the maintenance. While not difficult, that is a huge pain in the backside when looking at multiple datastores, each with 4 paths, presented on multiple hosts.
Let me know if anyone has any experience with this they wouldn't mind sharing.
Based on feedback I received on the VMware forums, I went ahead and just did it. I waited until after hours, made sure nothing critical was happening (for my own comfort), and pulled the plug on one of my Fibre Channel switches. VMware immediately detected the failed paths and nothing seemed to skip a beat. The biggest thing, I guess, is ensuring that zoning is set up correctly so that you do not take down the only path to a storage device; we saw no issues with this. Hopefully this helps calm your nerves if you need to do this in the future.
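For anyone who wants to sanity-check their paths before pulling a cable, here is a rough Python sketch of the "don't take down the only path" check. It is only an illustration: the inventory is typed in by hand, the host, datastore, and vmhba names (esx1, esx2, ds1, ds2, vmhba1, vmhba2) are made up, and the function `at_risk` is my own helper, not part of any VMware or filer tooling. The esx1 entry mirrors the cabling described above (4 paths per datastore); the esx2 entry is a deliberately broken example with both paths on switch 2.

```python
from typing import Dict, List, Tuple

# One path = (vmhba, switch, filer port). All names here are hypothetical.
Path = Tuple[str, str, str]
# (host, datastore) -> every path that host has to that datastore
Inventory = Dict[Tuple[str, str], List[Path]]


def at_risk(inventory: Inventory, component: str) -> List[Tuple[str, str]]:
    """Return the (host, datastore) pairs left with no path at all
    if `component` (a vmhba, switch, or filer port) goes offline."""
    return sorted(
        key
        for key, paths in inventory.items()
        # A path survives only if it does not traverse the component.
        if not any(component not in path for path in paths)
    )


inventory: Inventory = {
    # Cabled as in the original post: each vmhba goes to a different switch,
    # and each switch reaches both filer heads.
    ("esx1", "ds1"): [
        ("vmhba1", "switch1", "head1:0a"),
        ("vmhba1", "switch1", "head2:0b"),
        ("vmhba2", "switch2", "head1:0b"),
        ("vmhba2", "switch2", "head2:0a"),
    ],
    # Deliberately misconfigured: every path goes through switch 2.
    ("esx2", "ds2"): [
        ("vmhba2", "switch2", "head1:0b"),
        ("vmhba2", "switch2", "head2:0a"),
    ],
}

print(at_risk(inventory, "switch1"))   # []
print(at_risk(inventory, "switch2"))   # [('esx2', 'ds2')]
print(at_risk(inventory, "head1:0a"))  # []
```

The same function covers both jobs from the original question: pass a switch name before touching its power cable, or a filer port name before swapping that one FC cable. An empty list means every host still has at least one path to every datastore while that component is down.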
If only there was a software product that could talk to VMware, your FC switches, and storage, correlating all the mapping/masking/zoning/port connectivity/ESX RDM and datasource configuration to help you understand if you were fully redundant, and alert you to that misconfiguration before a change event 😉
If YouTube is permitted, this sub-3-minute video quickly covers analyzing a path violation thrown by OCI because of a lack of redundancy.