I'm testing failover on a new Windows Server 2008 R2 Active/Passive failover cluster. Part of my testing involves simulating a loss of SAN storage connectivity. Each of my cluster nodes has 2x FC SAN connections, and they are using MPIO with NetApp's DSM 3.5. When I pull both fibre connections on the currently-active node in the cluster, it takes ~2 minutes 15 seconds for Windows to see that the disk I/O path is down and trigger a failover of the resources. In the System event log I can see that the DSM detects the connectivity loss immediately, but Windows (event source "Disk") doesn't report the loss until a full 2 minutes later. It doesn't seem to matter whether there is active disk I/O going on or not.
I've been able to work around the issue by adjusting the PDORemovePeriod registry parameter (HKLM\SYSTEM\CurrentControlSet\Services\ontapdsm\Parameters\PDORemovePeriod). I set the value to 30 seconds, and failover occurred ~33 seconds after disconnecting the storage paths. Per NetApp's documentation, the ONTAP DSM sets this value to 130 seconds, which explains the ~2-minute failover time I experienced originally. Why is this value set so high, and what are the ramifications of changing this setting to something more reasonable?
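For reference, this is roughly the change I made (the key path and value are as described above; run from an elevated prompt, and note that a reboot or path re-enumeration may be needed for the DSM to pick it up):

```shell
:: Inspect the current PDORemovePeriod value under the ONTAP DSM parameters key
reg query "HKLM\SYSTEM\CurrentControlSet\Services\ontapdsm\Parameters" /v PDORemovePeriod

:: Lower it from the DSM default of 130 seconds to 30 seconds (REG_DWORD, in seconds)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\ontapdsm\Parameters" /v PDORemovePeriod /t REG_DWORD /d 30 /f
```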