Mostly I will be talking about NDU (non-disruptive update), so keep that in mind.
One of our competitors told a customer that in an HA system, when one controller has failed, the surviving controller will not switch into "Write Through" mode (which is not the best for performance) but will continue to cache data, leaving a single point of failure: the remaining controller or its NVRAM. As far as I know, FAS systems do not support a "Write Through" mode, and it cannot be enabled manually either.
They argue that if the second controller then dies as well, the data in its NVRAM will (or can) die with it. So theoretically we can end up with inconsistent data (off by a little, by the size of the unwritten portion) after we replace (restore) the controllers. That can be very dangerous for some applications, especially in a SAN environment. From that point of view, NetApp should not recommend non-disruptive update (NDU) for 2-node systems, which involves taking down and updating each controller in turn.
>Some other vendors will switch to write-through if partner fails. This eliminates risk of lost acknowledged writes due to remaining controller failure. Stress point here being “acknowledged”.
In my opinion, a NAS environment is not so sensitive to such an event (at least we lose consistency only for some files, not the whole FS). So I'm interested mostly in how SAN behaves under such a failure.
It seems reasonable, so the question is simple: how do we reply to that?
Sounds like they made that up. NVRAM mirrors between controllers, so no writes are lost in an HA event; a partner instance of the failed controller is brought up. I guess they could say that when running on one node it is at risk. But so is the competitor, and it is not likely they mirror their NVRAM the way NetApp does.
You miss the point. Some other vendors will switch to write-through if the partner fails. This eliminates the risk of losing acknowledged writes due to a failure of the remaining controller. The stress here is on "acknowledged". I'm not aware that NetApp has the ability to do this (maybe an undocumented option?). And yes, those vendors do of course mirror the write cache across the controller pair to eliminate the SPOF. Some use an intent log on external SSD disks which is taken over by the partner.
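To make the distinction concrete, here is a toy sketch of the two acknowledgement policies being argued about. All names are invented; this is not any vendor's actual code, just a model of when a write may be acknowledged to the host under each policy.

```python
# Hypothetical model: when is it safe to acknowledge a write to the host?
class Controller:
    def __init__(self, partner_alive=True, degrade_to_write_through=True):
        self.partner_alive = partner_alive
        self.degrade_to_write_through = degrade_to_write_through
        self.nvram = []   # local write log (battery-backed in real hardware)
        self.mirror = []  # partner's copy of our log
        self.disk = []    # persistent backing store

    def write(self, block):
        """Return True once the write may be acknowledged to the host."""
        self.nvram.append(block)
        if self.partner_alive:
            self.mirror.append(block)  # mirrored write-back: safe to ack
            return True
        if self.degrade_to_write_through:
            self.disk.append(block)    # partner down: commit to disk first
            return True
        # Partner down, still caching: the ack is backed only by local NVRAM.
        return True

    def crash(self):
        """Simulate loss of this controller's NVRAM: only disk survives."""
        survived = list(self.disk)
        self.nvram.clear()
        return survived
```

With `degrade_to_write_through=False` and the partner already dead, an acknowledged write exists only in the local NVRAM and vanishes if that controller then fails; with write-through it has already reached disk. That is exactly the "acknowledged writes" risk the competitor is pointing at.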
I know of at least one system that claims to redistribute cache mirroring across the remaining nodes if more than two are present.
SAN or NAS does not really matter. The on-disk state is always crash-consistent up to the latest CP (consistency point). So in case of an NVRAM failure on a single filer, you lose some amount of uncommitted data. It is true that in a file-server environment the impact is probably less severe than in a database environment.
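The CP behaviour described above can be sketched as follows (a toy model with invented names, not ONTAP internals): writes are acknowledged into the NVRAM log, a consistency point periodically flushes the log to disk atomically, and losing NVRAM discards only the writes acknowledged since the last CP while the disk image stays consistent.

```python
# Toy model of consistency-point semantics: disk is always a consistent
# image as of the last CP; NVRAM loss drops only the post-CP writes.
class Filer:
    def __init__(self):
        self.nvram_log = []   # acknowledged but not yet committed writes
        self.disk_state = []  # consistent on-disk image as of the last CP

    def write(self, record):
        self.nvram_log.append(record)  # the host gets its ack here

    def consistency_point(self):
        """Flush the log to disk atomically, as a CP does."""
        self.disk_state.extend(self.nvram_log)
        self.nvram_log.clear()

    def nvram_failure(self):
        """Return what was lost; disk keeps the last consistent image."""
        lost = list(self.nvram_log)
        self.nvram_log.clear()
        return lost
```

The point of the model: after `nvram_failure()` the disk is not corrupt, it is simply older, missing whatever was acknowledged after the last `consistency_point()`. Whether that window matters more for a database LUN than for a file share is the real SAN-vs-NAS question here.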