During periods of heavy activity, we are seeing LogFull CPs intermixed with HighWaterOps CPs at a rate of 1 or 2 every 100 secs. A coworker says the LogFull CPs will cause the filer to return an I/O error and drop writes. I claim they will merely slow the filer down (perhaps a lot), since the write will just block (perhaps even for a second or two).
All our activity is NFS3 over TCP and all mounts are hard, so all writes should hang unless the app interrupts the I/O. (Will the filer interrupt the I/O? I am a NetApp newbie; I expect a hard write to hang forever unless the app or the operator interrupts it.)
How can LogFull CPs be eliminated? (Buying more hardware is my guess.)
Is there a document somewhere that explains what exactly is happening when a LogFull CP happens?
I've attached a pretty Cacti graph that shows the CP rates at a time of heavy activity. I think the rates are "per second", so "100m" (100 milli, i.e. 0.1 CPs per second) means "one every 10 sec", etc.
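To make my reading of the graph explicit, here is a rough sketch of the conversion I am assuming. It treats the axis labels as per-second rates with RRDtool-style SI suffixes ("m" = milli), which is my guess about how Cacti draws them and not something I have confirmed. The function names `parse_rate` and `seconds_between` are just made up for illustration.

```python
# Assumption: graph labels are per-second rates with SI suffixes,
# where "m" means milli (1/1000), not minutes or milliseconds.
SI_PREFIXES = {"u": 1e-6, "m": 1e-3, "k": 1e3, "M": 1e6}


def parse_rate(label):
    """Convert a label such as '100m' into events per second."""
    label = label.strip()
    if label and label[-1] in SI_PREFIXES:
        return float(label[:-1]) * SI_PREFIXES[label[-1]]
    return float(label)


def seconds_between(label):
    """Average number of seconds between events for a per-second rate."""
    rate = parse_rate(label)
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate


if __name__ == "__main__":
    # 10m and 20m correspond to the "1 or 2 every 100 secs" mentioned above.
    for label in ("100m", "20m", "10m"):
        print(label, "->", round(seconds_between(label), 3), "sec between CPs")
```

Under that assumption, "100m" works out to roughly one CP every 10 seconds, and the 1 or 2 LogFull CPs per 100 secs I mentioned would show up as "10m" to "20m" on the graph.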
Thanks in advance, -w