I am working with Arista (the vendor of the network switches that connect our NetApp A300 to our servers) on a case where VMware lost connectivity to the A300 storage during a failover/giveback. Arista is asking for detailed information on what actually happens during a failover/giveback so that they can troubleshoot from their end.
But I was wondering whether there is more detailed information available than that on the networking side of a failover/giveback: what is actually happening "under the covers" when a LIF fails over, such as the timing of network requests.
The way I understand it from a networking perspective, ONTAP sends Gratuitous ARP (GARP) packets to the upstream switches to notify them that the LIF has migrated, so that they update their ARP tables with the LIF's correct MAC address (and layer-2 switches relearn which port that MAC now sits behind).
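To make that concrete, here is a minimal Python sketch of what a gratuitous ARP looks like on the wire, using the generic ARP layout from RFC 826 (the "announcement" form is described in RFC 5227). It is not taken from ONTAP: whether ONTAP sends ARP requests or replies, how many it sends per LIF, and at what interval are not covered here. The function name `build_garp`, the MAC, and the IP are hypothetical, chosen for illustration only.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = b"\xff" * 6


def build_garp(mac: bytes, ip: str, opcode: int = 1) -> bytes:
    """Return an untagged Ethernet frame carrying a gratuitous ARP for `ip`.

    A gratuitous ARP has sender IP == target IP, so every receiver learns
    "ip is-at mac" without having asked. opcode 1 = request, 2 = reply;
    implementations differ in which one they send (assumption: request).
    """
    if len(mac) != 6:
        raise ValueError("mac must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: dst = broadcast, src = the LIF's new MAC, type = ARP.
    # The source MAC is also what lets L2 switches relearn the MAC's port.
    eth = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: htype 1 (Ethernet), ptype 0x0800 (IPv4), hlen 6, plen 4.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)
    # Sender MAC/IP, target MAC (zeroed, ignored), target IP (== sender IP).
    arp += mac + ip_bytes + b"\x00" * 6 + ip_bytes
    # 42 bytes; the NIC pads it to the 60-byte Ethernet minimum on send.
    return eth + arp


if __name__ == "__main__":
    # Hypothetical LIF MAC and a documentation-range IP (RFC 5737).
    frame = build_garp(bytes.fromhex("02a0980000aa"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Because the frame is a broadcast, every switch in the VLAN floods it; if it is lost (for example to control-plane policing or a storm-control limit), hosts and routers keep using the stale MAC until their ARP entries age out.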
Depending on the number of LIFs that are migrating and/or the upstream switch configuration, these GARP packets could be dropped, in which case the switch's ARP table would not be updated. I'm not sure that's actually what happened in your case, but a packet capture on both sides of the conversation (ONTAP and your Arista switches) while you do a takeover/giveback would provide some useful data.
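Once you have such a capture, a small script can turn it into the kind of timeline you are after. The sketch below assumes the capture was saved in classic libpcap format (what `tcpdump -w` writes by default; Wireshark's default pcapng format is not handled) on an Ethernet link. It lists every gratuitous ARP with its offset from the first captured packet, so you can line it up against the takeover/giveback start time. The function names and output format are hypothetical.

```python
import socket
import struct
import sys
from typing import Iterator, List, Optional, Tuple

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond, nanosecond


def read_pcap(path: str) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_seconds, frame) from a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        # The magic number's byte order tells us the file's endianness.
        for endian in ("<", ">"):
            magic = struct.unpack(endian + "I", header[:4])[0]
            if magic in PCAP_MAGICS:
                break
        else:
            raise ValueError("not a classic pcap file (pcapng?)")
        divisor = 1e9 if magic == 0xA1B23C4D else 1e6
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype != 1:
            raise ValueError(f"expected Ethernet (linktype 1), got {linktype}")
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                return
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            yield ts_sec + ts_frac / divisor, f.read(incl_len)


def garp_sender(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (ip, mac) if the frame is a gratuitous ARP, else None.

    Handles untagged and single 802.1Q-tagged frames.
    """
    off = 12
    if len(frame) < off + 2:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, off)
    if ethertype == 0x8100:  # skip one VLAN tag
        off += 4
        if len(frame) < off + 2:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, off)
    off += 2  # start of the 28-byte ARP payload
    if ethertype != 0x0806 or len(frame) < off + 28:
        return None
    htype, ptype, hlen, plen, op = struct.unpack_from("!HHBBH", frame, off)
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or op not in (1, 2):
        return None
    sha = frame[off + 8 : off + 14]   # sender MAC
    spa = frame[off + 14 : off + 18]  # sender IP
    tpa = frame[off + 24 : off + 28]  # target IP
    if spa != tpa:  # not gratuitous
        return None
    return socket.inet_ntoa(spa), sha.hex(":")


def garp_timeline(path: str) -> List[Tuple[float, str, str]]:
    """List (seconds since first packet, ip, mac) for every GARP seen."""
    events = []
    start = None
    for ts, frame in read_pcap(path):
        if start is None:
            start = ts
        hit = garp_sender(frame)
        if hit:
            events.append((ts - start, *hit))
    return events


if __name__ == "__main__":
    for offset, ip, mac in garp_timeline(sys.argv[1]):
        print(f"+{offset:10.6f}s  GARP  {ip:15}  is-at {mac}")
```

Running it against captures taken on both the ONTAP side and the Arista side during the same event would show whether each LIF's GARP left the node, whether it reached the switch, and how long after the takeover started it appeared.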
I was looking more for the timing of the network process during a failover/giveback, for example: at 00:00, send command x; after 10 seconds, send command yyy and wait for a response; if there is no response, resend after xxx seconds; and so on. I figure someone at NetApp must have a process flow like that for this procedure?