ONTAP Discussions

Does a single-path iSCSI setup incur downtime during an HA node upgrade?

acna

Environment:

ONTAP 9.1 on an HA pair

Windows Server 2008 R2 (no MPIO installed)

 

Hi,

I have an iSCSI LUN that's accessible via an ifgrp on a node. When I upgrade the node in the HA pair, doesn't the interface fail over to the other node, so that there is no downtime? In other words, even though I don't have MPIO set up to access both nodes, the iSCSI connectivity should not go down during the failover and giveback?

 

Thanks,

1 ACCEPTED SOLUTION

GidonMarcus

Hi

 

You can see the LIF's failover policy in the output of the following command:

network interface show -fields failover-group,failover-policy

If the policy allows the LIF to move to another port in the failover group, and the group has other eligible members, the LIF should become available on the other node.

 

The move is pretty fast, but it also requires the Ethernet switches and routers to update their MAC tables (this takes 1-2 seconds on modern ones).

Now it's up to the host to try to reconnect and to wait before it times out completely, and up to the software not to give up during that time.

 

Without MPIO, Windows will try 8 times, each for the following value in seconds, before it retires the disk: "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\timeoutvalue"

https://blogs.msdn.microsoft.com/san/2011/09/01/the-windows-disk-timeout-value-less-is-better/
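
To put a rough number on that, here is a minimal Python sketch that reads the value and multiplies it by the 8 attempts mentioned above. The retry count is taken from this post, not verified independently. The function names and the 60-second example are only illustrative assumptions, and the registry read only does anything on Windows.

```python
import sys

# Registry location quoted in the post (relative to HKEY_LOCAL_MACHINE).
DISK_KEY = r"System\CurrentControlSet\Services\Disk"
# Assumption: 8 attempts, as stated in the post above.
RETRY_COUNT = 8


def worst_case_wait_seconds(timeout_value, retries=RETRY_COUNT):
    """Rough upper bound on how long Windows keeps trying before giving up."""
    return timeout_value * retries


def read_timeout_value():
    """Return TimeoutValue from the registry, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the winreg module only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TimeoutValue")
            return int(value)
    except OSError:
        return None  # key or value missing, or access denied


if __name__ == "__main__":
    current = read_timeout_value()
    if current is None:
        # Illustrative figure only, used when the registry cannot be read.
        current = 60
        print("Could not read TimeoutValue; using an example value of 60 s")
    print("TimeoutValue:", current, "s")
    print("Approximate worst case:", worst_case_wait_seconds(current), "s")
```

With a value of 60, for example, the host could keep trying for roughly 480 seconds before the disk is given up on, which is usually far longer than an application will wait.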

 

How long will it take on yours? Hard to know, and it also depends on the type of failover. But I'd suggest reducing this registry value to less than half the timeout your app will tolerate; that will allow it to try at least twice.
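
The "less than half" rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up, and the application tolerance is a number you would have to find out for your own software.

```python
import math


def suggest_timeout_value(app_tolerance_seconds):
    """Largest whole number of seconds strictly below half the app's tolerance."""
    if app_tolerance_seconds <= 2:
        raise ValueError("tolerance too small to fit two attempts of at least 1 s")
    return math.ceil(app_tolerance_seconds / 2) - 1


def attempts_within_tolerance(timeout_value, app_tolerance_seconds):
    """How many full attempts of timeout_value fit inside the app's tolerance."""
    return int(app_tolerance_seconds // timeout_value)


if __name__ == "__main__":
    tolerance = 30  # hypothetical: the app gives up after 30 seconds
    suggested = suggest_timeout_value(tolerance)
    print("Suggested TimeoutValue:", suggested)
    print("Attempts that fit:", attempts_within_tolerance(suggested, tolerance))
```

For an app that tolerates 30 seconds, this suggests 14, which leaves room for two attempts (28 seconds) before the app itself gives up.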

 

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

