ONTAP Discussions

single-path iSCSI setup incurs downtime during HA node upgrade?

acna

Environment:

ONTAP 9.1 on an HA pair

Windows Server 2008 R2 (no MPIO installed)

 

Hi,

I have an iSCSI LUN that's accessible via an ifgrp on one node. When I upgrade that node in the HA pair, doesn't the interface fail over to the other node, so that there is no downtime? In other words, even though I don't have MPIO set up to access both nodes, shouldn't the iSCSI connectivity stay up during the failover and giveback?

 

Thanks,

1 ACCEPTED SOLUTION

GidonMarcus

Hi

 

The following output should show you the LIF failover policy:

network interface show -fields failover-group,failover-policy

If the policy allows the LIF to move to another port in the group, and the group has other eligible members, the LIF should become available on the other node.

 

The move is pretty fast, but it also requires the Ethernet switches and routers to update their MAC tables (this takes 1-2 seconds on modern ones).

After that, it's up to the host to keep trying to reconnect and to wait before it times out completely, and up to the software not to give up during that time.

 

Without MPIO, Windows will try 8 times, each for the following value in seconds, before it retires the disk: "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\timeoutvalue"

https://blogs.msdn.microsoft.com/san/2011/09/01/the-windows-disk-timeout-value-less-is-better/
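As a rough worked example of that arithmetic: the sketch below takes the retry count of 8 mentioned above at face value and assumes every attempt waits for the full timeout, which gives an upper bound rather than a measured figure. The function name and the 60-second sample value are only illustrative.

```python
def worst_case_disk_wait(timeout_value_s, attempts=8):
    """Upper bound, in seconds, on how long Windows keeps trying a disk.

    Assumes each of the `attempts` tries waits the full timeout value,
    as described in the post above. Real behaviour may be shorter.
    """
    if timeout_value_s <= 0 or attempts <= 0:
        raise ValueError("timeout and attempts must be positive")
    return timeout_value_s * attempts


# Illustrative sample value only: a 60-second timeout value.
print(worst_case_disk_wait(60))  # 480
```

With a smaller timeout value the upper bound shrinks proportionally, for example 8 x 20 = 160 seconds.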

 

How long will it take on yours? Hard to know, and it also depends on the type of failover. But I'd guess you should just reduce this registry value to less than half the timeout your app will tolerate; that will allow it to try at least twice.
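That rule of thumb can be sketched as a small calculation. This is only an illustration of the "less than half the app's tolerance" guess, assuming the timeout is set in whole seconds; the function name is hypothetical and the result is not a tested recommendation for any particular application.

```python
import math


def suggested_timeout_value(app_tolerance_s):
    """Largest whole number of seconds strictly below half the tolerance.

    Follows the heuristic above: keeping the disk timeout under half of
    what the application tolerates leaves room for at least two attempts.
    """
    if app_tolerance_s <= 2:
        raise ValueError("tolerance too small for a whole-second timeout")
    return math.ceil(app_tolerance_s / 2) - 1


# Illustrative sample value only: an application that tolerates 60 seconds.
print(suggested_timeout_value(60))  # 29
```

Two attempts at 29 seconds each come to 58 seconds, which stays inside the 60 seconds assumed for the application.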

 

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK
