Windows iSCSI IO downtime comparison between LIF down and SFO

Dong_Chen · ‎2016-03-25

I have a question about Windows iSCSI IO downtime.

Host: Windows Server 2008

Filer: cDOT 831

When I take the LIF down, the IO downtime on the Windows host is 15s, which is defined by the LinkDownTime registry setting.

When I take over the node where the LUN is located, the IO downtime is less than 1s. Is this normal behavior? Why is the SFO downtime so short?

Thanks for the help!

aborzenkov · ‎2016-03-25

Taking a LIF down is not related to LinkDownTime: the interface on the host is still up, so there is no link problem. The timeout most likely corresponds to the IO request timeout, after which MPIO on the host retries on a different path.

In the case of SFO there is a short pause while access to the aggregate is switched to the partner; both (all) paths are alive and function correctly.

Dong_Chen · ‎2016-03-27

Thanks for the kind help!

When the customer runs the LIF down test, that means taking down the active LIF. The client IO was interrupted for 15s, and the IO downtime changes when the client's LinkDownTime option is modified. I am not sure what the expected behavior is when taking down the active LIFs.

And when SFO happens on the local node, the iSCSI LIFs go down and the aggregate relocates to the partner. In the test, the client observed almost no IO downtime (less than 1s). Is this expected behavior?

aborzenkov · ‎2016-03-27

Ah, OK, it is just a misleading parameter name. The Windows iSCSI initiator's LinkDownTime in reality defines the IO request timeout, so it is exactly as I explained: when a path becomes unavailable, the host waits for the configured timeout before switching over to another path. When you perform SFO all paths continue to be available, so it is just the time required to flip disk ownership between controllers.

Dong_Chen · ‎2016-03-28

So when the node fails, the iSCSI LIFs will be taken down. Why don't the clients wait for the configured timeout before switching to another path?

aborzenkov · ‎2016-03-28

Node takeover also moves the LIFs from one node to the other, so the LIF is still available.

Dong_Chen · ‎2016-03-28

iSCSI LIFs do not migrate during SFO. So consider the two test scenarios (assuming a 2-node cDOT 8.3 cluster where every node has 2 iSCSI LIFs):

1) take down the optimized LIFs; -> client IO is interrupted for 15s;

2) take down the local node; -> client IO is interrupted for less than 1s;

Both operations make the optimized LIFs unavailable, so client IO has to switch to the partner path. Why is the behavior different in the two scenarios?

aborzenkov · ‎2016-03-28

Yes, you are right, sorry for mixing up 7-Mode and C-Mode. Well, I think the filer may notify the host that the preferred path has changed, so the host simply continues IO over the remaining LIF. But yes, I would be interested in the final answer too.
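To make the hypothesis above concrete, here is a toy timing model in Python. It is only a sketch of the guess made in this thread, not confirmed ONTAP or Windows initiator behavior: it assumes the host either learns about the path change right away (and resumes IO as soon as the storage side is ready) or learns nothing and holds the request until the configured timeout expires. All names (`FailoverEvent`, `host_notified`, `observed_io_pause`) and the 0.5s switch time are made up for illustration; only the 15s and "less than 1s" figures come from the tests reported here.

```python
from dataclasses import dataclass


@dataclass
class FailoverEvent:
    """One test scenario in the toy model (all fields are illustrative)."""
    name: str
    host_notified: bool   # assumption: host is told the path changed
    switch_time_s: float  # assumption: time until the surviving path can serve IO


def observed_io_pause(event: FailoverEvent, request_timeout_s: float = 15.0) -> float:
    """Return the IO pause the host would see under this toy model."""
    if event.host_notified:
        # Host retries on the remaining path as soon as storage is ready.
        return event.switch_time_s
    # Host holds the request until the timeout expires, then switches paths.
    return max(request_timeout_s, event.switch_time_s)


# Scenario 1: optimized LIFs taken down, host not notified (hypothesis).
lif_down = FailoverEvent("LIF down", host_notified=False, switch_time_s=0.0)
# Scenario 2: node takeover (SFO), host notified (hypothesis), 0.5s is a placeholder.
takeover = FailoverEvent("SFO", host_notified=True, switch_time_s=0.5)

for ev in (lif_down, takeover):
    print(f"{ev.name}: {observed_io_pause(ev):.1f}s")
```

Under these assumptions the model reproduces both observations: the LIF down pause tracks the configured timeout (changing `request_timeout_s` changes the result), while the SFO pause does not depend on it at all.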

Dong_Chen · ‎2016-03-28

Yes, I guess so... But I am not sure how the filer notifies the host. Thanks for the kind answers to my question.