ONTAP Discussions

options cf.takeover.change_fsid

crousseaux

I would like some clarification on the option cf.takeover.change_fsid and its impact.

In a MetroCluster environment it seems appropriate to set it to off, but the warning about possible data loss is worrying.


Below is an extract from the documentation.
This information is also included in TR 3788 (ESX best practices), available from NetApp and VMware.

Is it possible to leave it on
and switch it off just before running "cf forcetakeover -d"?

1) What is the risk in a fabric MetroCluster?
2) In which cases should this option be disabled, and in which should it be left enabled?
3) Is it just a precaution that forces the client to check the file system before remounting? For VMware in particular, the FSID change can be dealt with, but it is a significant constraint.

Thank you for your advice.

--------------------------------------------

From the NetApp documentation:

--------------------------------------------

Disabling the change_fsid option in MetroCluster configurations

In a MetroCluster configuration, you can take advantage of the change_fsid option in Data ONTAP to simplify site takeover when the cf forcetakeover -d command is used.

About this task

In a MetroCluster configuration, if a site takeover initiated by the cf forcetakeover -d command occurs, the following happens:

- Data ONTAP changes the file system IDs (FSIDs) of volumes and aggregates because ownership changes.
- Because of the FSID change, clients must remount their volumes if a takeover occurs.
- If using Logical Units (LUNs), the LUNs must also be brought back online after the takeover.

To avoid the FSID change in the case of a site takeover, you can set the change_fsid option to off (the default is on). Setting this option to off has the following results if a site takeover is initiated by the cf forcetakeover -d command:

- Data ONTAP refrains from changing the FSIDs of volumes and aggregates.
- Users can continue to access their volumes after site takeover without remounting.
- LUNs remain online.

CAUTION:
If the option is set to off, any data written to the failed node that did not get written to the surviving node's NVRAM is lost. Disable the change_fsid option with great care.
Step

1. Enter the following command to disable the change_fsid option:

options cf.takeover.change_fsid off
By default, the change_fsid option is enabled (set to on ).

Clarification of when data loss can occur when the change_fsid option is enabled

Ensure that you have a good understanding of when data loss can occur before you disable the change_fsid option. Disabling this option can create a seamless takeover for clients in the event of a disaster, but there is potential for data loss.

If both ISLs between the two sites in a fabric MetroCluster go down, both systems remain operational. However, in that scenario, client data is written only to the local plex and the plexes become unsynchronized.

If, subsequently, a disaster occurs at one site, and the cf forcetakeover -d command is issued, the remote plex which survived the disaster is not current. With the change_fsid option set to off, clients switch to the stale remote plex without interruption.

If the change_fsid option is set to on, the system changes the FSIDs when cf forcetakeover -d is issued, so clients are forced to remount their volumes and can then check the integrity of the data before proceeding.
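To make the fabric MetroCluster scenario above concrete, here is a toy Python model. It is entirely hypothetical: the class and method names (Plex, FabricMetroCluster, forcetakeover_d) are illustrative only and do not correspond to any ONTAP API or to how ONTAP works internally. It assumes writes are simple list entries and ignores plex resynchronization after the ISLs come back. What it shows is that change_fsid does not decide whether the surviving plex is stale; it only decides whether clients are interrupted (and so get a chance to check their data) or silently switch to the stale plex.

```python
from dataclasses import dataclass, field


@dataclass
class Plex:
    """Toy stand-in for one plex of a mirrored aggregate: just a list of writes."""
    site: str
    writes: list = field(default_factory=list)


@dataclass
class FabricMetroCluster:
    """Hypothetical two-site model for illustration; not how ONTAP is implemented."""
    local: Plex          # plex at the site that will be lost
    remote: Plex         # plex at the surviving site
    isl_up: bool = True  # False once both ISLs are down

    def client_write(self, data: str) -> None:
        # The local plex always receives the write.
        self.local.writes.append(data)
        # The remote plex only receives it while the ISLs are up; otherwise
        # the plexes drift out of sync while both sites keep running.
        if self.isl_up:
            self.remote.writes.append(data)

    def forcetakeover_d(self, change_fsid: bool) -> dict:
        # Models the local site being destroyed and the survivor serving
        # the remote plex after "cf forcetakeover -d".
        missing = [w for w in self.local.writes if w not in self.remote.writes]
        return {
            # change_fsid on: new FSIDs, so clients remount and LUNs go offline.
            "clients_must_remount": change_fsid,
            "luns_online": not change_fsid,
            # The remote plex is stale regardless of the option's value.
            "stale_plex": bool(missing),
            "lost_writes": missing,
        }


mc = FabricMetroCluster(local=Plex("A"), remote=Plex("B"))
mc.client_write("w1")   # ISLs up: mirrored to both plexes
mc.isl_up = False       # both ISLs fail; both sites keep running
mc.client_write("w2")   # reaches only the local plex at site A
print(mc.forcetakeover_d(change_fsid=False))
# {'clients_must_remount': False, 'luns_online': True, 'stale_plex': True, 'lost_writes': ['w2']}
```

Calling forcetakeover_d(change_fsid=True) on the same state still reports ['w2'] as lost; the only difference is that clients must remount, which is the point at which they can notice the stale data.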


joostvandrenth

Good question. ONTAP always mirrors writes into the NVRAM of both cluster nodes before the data is written to disk. When a cluster node goes down, the assumption is that its data is already in the partner's NVRAM and can be written to disk by the partner node. In a MetroCluster failover I cannot imagine a disaster so sudden that it wipes out the entire system before the NVRAM contents have been mirrored. I always thought a client write is only acknowledged once it has reached the partner's NVRAM, which would make setting change_fsid to off safe.

I do not see a problem with setting the option to off...

joostvandrenth

Anyone have some input on this?

eric_barlier

Hi,

I've read the first part, the description of what happens when it's ON or OFF. From a data-loss point of view I agree with you; it should not be a problem.

From an operational point of view there is a lot to be gained by setting this option to OFF: services remain online and failover is automatic, so there is no downtime.

I'm not sure how it works when you need to restore the cluster. Is there any impact at that point? Probably not.

I guess what setting this option to OFF offers is a fully automated failover that is transparent to clients, at the cost of a VERY low risk of data loss. I would turn it off myself, depending on the application and its latency, of course.

Eric

bernd_wolters

Hi,

We had that discussion on the MetroCluster recert course. If you have to break the mirror with "cf forcetakeover -d", you have a DISASTER!

If an entire data center is lost, your main problem is probably not the gap left by non-flushed NVRAM.

If you leave it ON, the LUNs are taken offline and a new FSID is created. VMware shows the LUNs as "snapshot" LUNs because of the serial number, and Windows 2008 has no problems at all.

From that perspective you have your surviving "Pool1" and a reduced RTO and RPO.

Regards Bernd
