
I need to reduce the time period before takeover, as much as possible

I need to reduce the time period before takeover as much as possible.

I changed the option cf.takeover.detection.seconds to 10 seconds.

But it is still taking 180 seconds to fail over, which is not acceptable to the customer.

The system is FAS3140A.

Any suggestions?

Thanks,

Re: I need to reduce the time period before takeover, as much as possible

Hi and welcome to the Communities!

180 sec for the cluster failover sounds way too long. What ONTAP version are they on? Have you seen this thread: http://communities.netapp.com/message/11677#11677?

Regards,

Radek

Re: I need to reduce the time period before takeover, as much as possible

Hi,

The Data ONTAP version is 7.3.4. Yes, I did look at that thread, but it does not help.

I am still waiting for help; we need to reduce these 180 seconds.

Thanks,

Amin Abu-Dosh

Re: I need to reduce the time period before takeover, as much as possible

Where are you seeing that it's taking 180 seconds? That is, are you seeing it in the logs? If so, please provide the logs. Failover to the cf partner should be near instantaneous. Failback is another story, since the controller has to boot before it can take over services.
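One way to answer the "where are you seeing it" question from the client side is to poll the filer's CIFS port during a test takeover and record how long it stays unreachable. The following is only a minimal sketch under assumptions: the function names, the one-second probe interval, and the choice of TCP port 445 (139 may be the relevant port on some setups) are illustrative and do not come from any NetApp tool.

```python
import socket
import time


def probe(host, port=445, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest gap, in seconds, between the first failed probe
    of an outage and the next successful probe.

    samples is a list of (timestamp, reachable) pairs in time order.
    An outage that is still ongoing at the end of the list is not counted.
    """
    longest = 0.0
    down_since = None
    for stamp, ok in samples:
        if not ok and down_since is None:
            down_since = stamp          # outage begins
        elif ok and down_since is not None:
            longest = max(longest, stamp - down_since)
            down_since = None           # outage ends
    return longest


def watch(host, duration=300.0, interval=1.0):
    """Probe host for duration seconds and return the longest outage seen."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host)))
        time.sleep(interval)
    return longest_outage(samples)
```

Running watch() against the filer's address (a hypothetical name such as "filer-a.example.com") while issuing the takeover gives a rough client-side number to compare with the 180 seconds in the console message. The resolution is limited by the probe interval and the connect timeout.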

I'm also curious to hear what transport protocols you're using. With FC you should see no downtime whatsoever, since both boxes should be configured in "single_image" mode. NFS/CIFS and iSCSI will be impacted and require use of the NetApp Host Utilities kit, which will update the timeout settings to 120 seconds for physical hosts. For virtual hosts, a separate host utilities kit is bundled with the ESX Host Utilities kit, which updates each host's disk timeout settings to 180 seconds.

Re: I need to reduce the time period before takeover, as much as possible

Hi,

Can you ask the customer what the impact of 180 seconds is, other than it might seem long? Are they experiencing loss of writes or service?

Also, if you want a shorter failover time, you could turn off SnapMirror, NDMP, CIFS, NFS, and all other protocols to lessen the workload so that failover is faster. This should be part of a controlled failover anyway to ensure no loss of data.

Have you enabled hardware assisted failover?

https://kb.netapp.com/support/index?page=content&id=1010145&actp=search&viewlocale=en_US&searchid=1299526213692

Read the note on the bottom to make sure you can do it.

Eric

Re: I need to reduce the time period before takeover, as much as possible

Hi,

Almost all of the storage is used for CIFS shares.

The customer's applications hang because of the long failover period.

Are there any options to reduce the failover period when CIFS is in use?

Yes, I did configure hardware-assisted takeover, but it has had no impact on the failover period.

Thanks,

Re: I need to reduce the time period before takeover, as much as possible

Hi,

See the log below.

alm3140b> Sun Mar 6 16:51:02 AST [alm3140b: cf.fsm.nfo.acceptTakeoverReq:warning]: Negotiated failover: accepting takeover request by partner, reason: operator initiated cf takeover. Asking partner to shutdown gracefully; will takeover in at most 180 seconds.

The protocol used is CIFS; all the storage is used for shares only.

Thanks,

Re: I need to reduce the time period before takeover, as much as possible

"Almost all of the storage is used for CIFS shares."

Okay, so are the apps in question relying on CIFS?

CIFS is session-based, so regardless of the failover time, all sessions are terminated during failover and nothing can be done about it (as far as the current version of CIFS is concerned).

Re: I need to reduce the time period before takeover, as much as possible

I'm wondering whether the questions Eric asked were answered. Where are you seeing 180 seconds? When you initiate a takeover, there is a standard, hard-coded message that says takeover will happen within 180 seconds. It does not reflect the values that you've set. Also, how the host reacts depends on a few factors: whether the right host utilities are used, whether there are any specific corporate system settings in your environment, and, more importantly, whether your cluster has been correctly configured to assume the partner's IP addresses in case of a failover.

The system logs on the primary and partner systems would identify how long the takeover actually took. Out of curiosity, have you opened a case for this issue? That would be the best way to get a more detailed response to your questions. The Global Support folks could assess your environment settings from the most recent AutoSupport, or you could generate a user-initiated ASUP when a new case is opened.
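To illustrate reading the actual duration out of the logs, here is a small sketch that subtracts the timestamps of two console log lines. The first line is the one posted earlier in this thread. The second line, including its event name, is a made-up placeholder, because no takeover-completion message has been posted here. The year is also an assumption, since the console timestamp does not carry one.

```python
import re
from datetime import datetime

# Matches the "Mar 6 16:51:02" part of a console line such as
# "Sun Mar 6 16:51:02 AST [alm3140b: ...". The weekday and the
# time zone abbreviation are ignored.
STAMP = re.compile(r"([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    """Extract the timestamp from a log line; year must be supplied."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in: " + line)
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def elapsed_seconds(start_line, end_line, year):
    """Seconds between the timestamps of two log lines."""
    delta = parse_stamp(end_line, year) - parse_stamp(start_line, year)
    return delta.total_seconds()


# Line taken from this thread (truncated).
start = ("alm3140b> Sun Mar 6 16:51:02 AST [alm3140b: "
         "cf.fsm.nfo.acceptTakeoverReq:warning]: Negotiated failover")
# Placeholder line: the event name is hypothetical, not a real ONTAP message.
end = ("alm3140b> Sun Mar 6 16:51:40 AST [alm3140b: "
       "hypothetical.takeover.done:info]: placeholder")

print(elapsed_seconds(start, end, year=2011))  # 38.0 for these sample lines
```

The same subtraction, applied to the real first and last takeover messages from both controllers' logs, would show how long the takeover actually took, independent of the "at most 180 seconds" wording.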

Right now I'm not certain there is enough information to look at (on this discussion) to point you in the right direction.

Re: I need to reduce the time period before takeover, as much as possible

In addition to the session drop, also remember the network infrastructure.

With a completed takeover, the IP and MAC addresses now appear on a different switch port.

The network infrastructure needs to deal with that as well.

regards, Niels