ONTAP 8.3.1P1 - MetroCluster - RAID scrub and super high latency

mclawler75 · ‎2016-03-08

I have two C-Dot filers at my location: a 4-node MetroCluster (FAS8040s) and a standard two-node C-Dot filer (8060). We don't see issues with the RAID scrub on the standard filer, but it beats the crap out of our 4-node MetroCluster. Is anyone else seeing this? The standard window for the RAID scrub is 1am - 5am, so if you look in Performance Manager you should see a spike during those hours.

Hoping I'm not the only one getting hit with this. To work around the problem while Support digs into it, we have simply disabled the RAID scrub options. Not ideal, I know, but we were leaving trucks waiting at the shipping dock for the filer to serve up data.

ttran · ‎2016-03-09

Hi,

Do you happen to have "thorough scrub" enabled for any aggregates on your 4-node MetroCluster? If so, it will definitely increase both the duration and the system utilization of a scrub. In addition, a typical scrub uses more system resources in a MetroCluster configuration than in a non-MetroCluster configuration because of the remote mirror.

From the cluster shell, you can run the following command to check whether thorough scrub is enabled for any aggregates:

> storage aggregate show -aggregate * -thorough-scrub on

If the feature is not enabled, you will see the following output:

"There are no entries matching your query"

Team NetApp

mclawler75 · ‎2016-03-09

No, that option is not set.

It's odd that the MC would see such a huge spike. I go from roughly .05 latency to 10.x latency during this RAID scrub.

On a standard C-Dot filer I don't even notice the job running, other than that it's listed in the logs.
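
For anyone wanting to check whether their own spike lines up with the scrub window, here is a rough Python sketch that compares average latency inside and outside 1am - 5am. It is only an illustration: it assumes the latency samples have already been exported from whatever monitoring tool is in use as (timestamp, latency) pairs, the function names are made up for this example, and the sample values simply mirror the rough figures quoted above (no units are implied).

```python
from datetime import datetime, time
from statistics import mean

# Scrub window mentioned in this thread: 1am to 5am local time.
SCRUB_START = time(1, 0)
SCRUB_END = time(5, 0)


def in_scrub_window(ts: datetime) -> bool:
    """Return True if the timestamp falls inside the scrub window."""
    return SCRUB_START <= ts.time() < SCRUB_END


def compare_latency(samples):
    """Compare mean latency inside vs. outside the scrub window.

    samples: iterable of (datetime, latency) pairs (hypothetical export format).
    Returns (mean_inside, mean_outside, ratio).
    """
    samples = list(samples)
    inside = [lat for ts, lat in samples if in_scrub_window(ts)]
    outside = [lat for ts, lat in samples if not in_scrub_window(ts)]
    if not inside or not outside:
        raise ValueError("need samples both inside and outside the scrub window")
    m_in, m_out = mean(inside), mean(outside)
    if m_out == 0:
        raise ValueError("mean latency outside the window is zero")
    return m_in, m_out, m_in / m_out


if __name__ == "__main__":
    # Made-up samples for illustration only.
    demo = [
        (datetime(2016, 3, 8, 0, 30), 0.05),
        (datetime(2016, 3, 8, 2, 0), 10.0),
        (datetime(2016, 3, 8, 3, 30), 10.0),
        (datetime(2016, 3, 8, 9, 0), 0.05),
    ]
    m_in, m_out, ratio = compare_latency(demo)
    print(f"inside window: {m_in}, outside window: {m_out}, ratio: {ratio:.1f}")
```

A large ratio on the MetroCluster and a ratio near 1 on the standard filer would match what is described here.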

mclawler75 · ‎2016-03-22

So, after working with the critical care team, our issue has been traced back to us oversaturating the ATTO bridges. The solution is to add four more ATTO bridges to the mix.