Mostly I will be talking about NDU (non-disruptive update), so keep that in mind.
One of our competitors told a customer that in an HA system, when one controller has failed, the surviving controller will not switch into "Write Through" mode (which is not the best for performance) but will continue to cache data, leaving a single point of failure: the remaining controller or its NVRAM. As far as I know, FAS systems do not support a "Write Through" mode, and it cannot be enabled manually either.
They argue that if the second controller then dies as well, the data in its NVRAM will (or can) die with it. So theoretically we can end up with inconsistent data (off by a little, by the size of the unwritten portion) after we replace (restore) the controllers. That can be very dangerous for some applications, especially in a SAN environment. From that point of view, NetApp should not recommend non-disruptive update (NDU) for 2-node systems, which involves taking down and updating each controller in turn.
>Some other vendors will switch to write-through if partner fails. This eliminates risk of lost acknowledged writes due to remaining controller failure. Stress point here being “acknowledged”.
In my opinion, a NAS environment is not so sensitive to such an event (at least we lose consistency only for some files, not the whole FS). So I'm interested mostly in how SAN behaves under such a failure.
It seems reasonable, so the question is simple: how do we reply to that?
Sounds like they made that up. NVRAM mirrors between controllers, so no writes are lost in an HA event; a partner instance of the failed controller is brought up. I guess they could say that when running on one node it is at risk. But so is the competitor, and it is not likely they mirror their NVRAM the way NetApp does.
You miss the point. Some other vendors will switch to write-through if the partner fails. This eliminates the risk of losing acknowledged writes due to a failure of the remaining controller. The stress here is on "acknowledged". I'm not aware that NetApp has the ability to do this (maybe an undocumented option?). And yes, those vendors do of course mirror the write cache across the controller pair to eliminate the SPOF. Some use an intent log on external SSD disks which is taken over by the partner.
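To make the distinction concrete, here is a toy sketch of the two acknowledgement policies being argued about. All names are invented; this is not any vendor's actual code, just a model of when a write may be acknowledged to the host under each policy.

```python
# Hypothetical model: when is it safe to acknowledge a write to the host?
class Controller:
    def __init__(self, partner_alive=True, degrade_to_write_through=True):
        self.partner_alive = partner_alive
        self.degrade_to_write_through = degrade_to_write_through
        self.nvram = []   # local write log (battery-backed in real hardware)
        self.mirror = []  # partner's copy of our log
        self.disk = []    # persistent backing store

    def write(self, block):
        """Return True once the write may be acknowledged to the host."""
        self.nvram.append(block)
        if self.partner_alive:
            self.mirror.append(block)  # mirrored write-back: safe to ack
            return True
        if self.degrade_to_write_through:
            self.disk.append(block)    # partner down: commit to disk first
            return True
        # Partner down, still caching: the ack is backed only by local NVRAM.
        return True

    def crash(self):
        """Simulate loss of this controller's NVRAM: only disk survives."""
        survived = list(self.disk)
        self.nvram.clear()
        return survived
```

With `degrade_to_write_through=False` and the partner already dead, an acknowledged write exists only in the local NVRAM and vanishes if that controller then fails; with write-through it has already reached disk. That is exactly the "acknowledged writes" risk the competitor is pointing at.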
I know of at least one system that claims to redistribute cache mirroring across the remaining nodes if more than two are present.
SAN or NAS does not really matter. The on-disk state is always crash-consistent up to the latest CP (consistency point). So in case of an NVRAM failure on a single filer, you lose some amount of uncommitted data. It is true that in a file-server environment the impact is probably less severe than in a database environment.
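The CP behaviour described above can be sketched as follows (a toy model with invented names, not ONTAP internals): writes are acknowledged into the NVRAM log, a consistency point periodically flushes the log to disk atomically, and losing NVRAM discards only the writes acknowledged since the last CP while the disk image stays consistent.

```python
# Toy model of consistency-point semantics: disk is always a consistent
# image as of the last CP; NVRAM loss drops only the post-CP writes.
class Filer:
    def __init__(self):
        self.nvram_log = []   # acknowledged but not yet committed writes
        self.disk_state = []  # consistent on-disk image as of the last CP

    def write(self, record):
        self.nvram_log.append(record)  # the host gets its ack here

    def consistency_point(self):
        """Flush the log to disk atomically, as a CP does."""
        self.disk_state.extend(self.nvram_log)
        self.nvram_log.clear()

    def nvram_failure(self):
        """Return what was lost; disk keeps the last consistent image."""
        lost = list(self.nvram_log)
        self.nvram_log.clear()
        return lost
```

The point of the model: after `nvram_failure()` the disk is not corrupt, it is simply older, missing whatever was acknowledged after the last `consistency_point()`. Whether that window matters more for a database LUN than for a file share is the real SAN-vs-NAS question here.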