
Performance advantages from options raid.mirror_read_plex_pref alternate


I had a short discussion this morning with a colleague who had started using this setting on a stretch MetroCluster at another location but hadn't measured the performance difference. He more or less assumed it should improve things, because not all reads come from the same plex and the plexes are located very close to each other.

I have a fabric-attached MetroCluster with roughly 800 m of fibre between the sites and wondered whether anyone had documented experience or recommendations on "alternate" vs. "local" with regard to read performance on fabric-attached MetroClusters. There are times when I could use somewhat better read performance...


We are actually using alternate in MetroClusters with up to 10 km of fibre, though we definitely test for *negative* impact before going live. So far it has always been a win. I personally had one system that was set to "local" and was serving a spindle-bound, read-intensive workload. Switching from "local" to "alternate" made the disk read values in sysstat -x instantly jump from 280 MB/s to 490 MB/s...
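The numbers above can be put in perspective with a quick calculation. Assuming (as a later reply describes) that "alternate" roughly doubles the spindles serving reads, a purely spindle-bound workload would at best see a 2x gain; this sketch computes how close the reported jump comes to that ceiling. The 2x ideal is an assumption, not a measured figure.

```python
# Disk read throughput reported in the post (sysstat -x values, MB/s).
before_mb_s = 280  # raid.mirror_read_plex_pref = local
after_mb_s = 490   # after switching to alternate

speedup = after_mb_s / before_mb_s   # ratio of new to old throughput
gain_pct = (speedup - 1) * 100       # relative improvement in percent

# Assumed ceiling: reads spread over twice as many spindles, nothing else limiting.
ideal_speedup = 2.0
fraction_of_ideal = (speedup - 1) / (ideal_speedup - 1)

print(f"speedup {speedup:.2f}x, gain {gain_pct:.0f}%, {fraction_of_ideal:.0%} of ideal")
```

The gap to the ideal is plausible: the remote plex adds some latency, and not every read in a real workload is spindle-bound.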

So I'd just go for it...



We had it set to "local" on our MetroCluster (about 100 m of fibre between the nodes). The mirror disks hardly saw any activity while the primary disks were constantly busy.

Changing to "alternate" helped especially with my SATA disks, which had been working full time; it essentially doubled the number of spindles serving reads.
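To make the "doubling the read spindles" point concrete, here is a toy Python model. It is a simplification under stated assumptions, not ONTAP's actual plex-selection logic: under "local" every read lands on one plex, under "alternate" reads are handed to the two plexes in turn. The plex names `plex0`/`plex1`, both helper functions and the spindle count are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Hypothetical plex names: plex0 stands for the local plex, plex1 for the mirror.
PLEXES = ("plex0", "plex1")

def dispatch_reads(n_reads: int, pref: str) -> Counter:
    """Toy model of which plex serves each read (not ONTAP's real algorithm).

    'local'     -> every read goes to the local plex.
    'alternate' -> reads are handed to the two plexes in turn (round robin).
    """
    if pref == "local":
        return Counter({PLEXES[0]: n_reads})
    if pref == "alternate":
        rr = cycle(PLEXES)
        return Counter(next(rr) for _ in range(n_reads))
    raise ValueError(f"unsupported preference: {pref!r}")

def reads_per_busy_spindle(counts: Counter, spindles_per_plex: int) -> float:
    """Average reads per spindle, counting only plexes that actually serve reads."""
    total = sum(counts.values())
    busy_plexes = sum(1 for c in counts.values() if c > 0)
    if total == 0:
        return 0.0
    return total / (busy_plexes * spindles_per_plex)

if __name__ == "__main__":
    for pref in ("local", "alternate"):
        counts = dispatch_reads(10_000, pref)
        print(pref, dict(counts), reads_per_busy_spindle(counts, spindles_per_plex=14))
```

For the same load, the per-spindle read load halves under "alternate", which is consistent with the mirror disks sitting idle under "local". Real gains are smaller when the remote plex adds latency or the workload is not spindle-bound.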

I suppose the gain from switching to alternate depends on the latency added by the distance to the mirror disks.
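To get a feel for the distance factor, this back-of-the-envelope sketch computes pure fibre propagation delay for the distances mentioned in this thread. It assumes a refractive index of about 1.47 (roughly 4.9 µs per km one way) and ignores switch hops, protocol round trips and disk service time, so it is only a lower bound. It comes to roughly 8 µs round trip at 800 m, about 0.1 ms at 10 km and about 0.5 ms at 50 km, which is small next to the several milliseconds a random read on a spinning SATA disk typically takes.

```python
# Assumption: light in single-mode fibre travels at roughly c / 1.47.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.47  # typical value, assumed

def round_trip_us(distance_km: float) -> float:
    """Propagation-only round-trip time in microseconds (no switch or disk time)."""
    speed_in_fibre_km_s = SPEED_OF_LIGHT_KM_S / FIBRE_REFRACTIVE_INDEX
    one_way_s = distance_km / speed_in_fibre_km_s
    return 2 * one_way_s * 1e6

# Fibre distances mentioned in this thread.
for label, km in [("100 m", 0.1), ("800 m", 0.8), ("10 km", 10), ("50 km", 50)]:
    print(f"{label:>6}: {round_trip_us(km):7.1f} us round trip")
```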

I think you can change the setting on the fly and measure the effect on your system to get a clear picture of whether you win or lose with it.
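A minimal sketch for the before/after comparison, assuming you have copied per-interval disk read values from `sysstat -x` (same interval, comparable workload) into two lists. The function name `compare_read_rates` and its returned keys are hypothetical.

```python
from statistics import mean, stdev

def compare_read_rates(before: list[float], after: list[float]) -> dict:
    """Compare per-interval disk read rates sampled before and after the change.

    Both lists are assumed to hold values from the same sysstat -x column,
    in the same units, taken under a comparable workload.
    """
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two samples on each side")
    m_before, m_after = mean(before), mean(after)
    if m_before == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return {
        "mean_before": m_before,
        "mean_after": m_after,
        "stdev_before": stdev(before),
        "stdev_after": stdev(after),
        "change_pct": (m_after - m_before) / m_before * 100,
    }
```

If the change in the mean is smaller than the spread of the samples, treat the result as noise rather than a win or a loss.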


We use it very carefully on FMC (fabric MetroCluster) configs, although in a stretch MC it's almost always a win, because it can cause very heavy performance impacts (or even panic the filer) if one of your backend fabrics has connection problems (for example, links going down briefly from time to time).

Since a few of our customers have large distances between the sites (>50 km) and some of them have already had these kinds of problems, we set it to "local" by default on FMC and only switch to "alternate" if the fabrics/links turn out to be very stable (i.e. no port errors on the backend switches, etc.).
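The rule of thumb above can be written down as a small decision helper. This is a hypothetical sketch of the poster's policy, not a NetApp recommendation; `BackendHealth`, its fields and `suggest_read_pref` are invented for illustration, and treating "very stable" as zero port errors and zero link flaps over your observation window is an assumption.

```python
from dataclasses import dataclass

@dataclass
class BackendHealth:
    # Hypothetical inputs: counters gathered from the backend switches
    # over an observation window of your choosing.
    port_errors: int
    link_flaps: int

def suggest_read_pref(config: str, health: BackendHealth) -> str:
    """Encode the rule of thumb from this post (assumed policy, not vendor guidance).

    config: "stretch" or "fabric" MetroCluster.
    Stretch: "alternate" is almost always a win.
    Fabric: default to "local"; use "alternate" only on a clean backend.
    """
    if config == "stretch":
        return "alternate"
    if config == "fabric":
        clean = health.port_errors == 0 and health.link_flaps == 0
        return "alternate" if clean else "local"
    raise ValueError(f"unknown MetroCluster config: {config!r}")
```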




Well, I enabled this about a week ago and have noticed some higher peaks for outbound data, but nothing really miraculous. As Michael says, there are (unnecessary) risks involved if the remote pools get into trouble somehow. We actually had an incident the following day where a disk failure basically turned the filer into a brick for about a minute while things tried to sort themselves out. This is probably only tangential to some other problems with disk maintenance and such; the change may just have made it happen sooner, but we have seen this on disk failures before. Anyway, it's an ongoing case and we'll see what the great minds can discover...

I have running graphs for our ISLs (MRTG still thinks 8 Gbit/s is 532 MB/s, though) and we do spike quite a bit on reconstruction events. I have always wondered whether we should implement TI (Traffic Isolation) zoning on the SAN switches to separate disk and FC-VI traffic, but no one at NetApp has ever managed to give me a straight answer, and my local NetApp people don't seem keen to help with implementing it on a running system that has already had far too many other problems. We set this up in mid-2009, when TI was still more of a "cool thing to have", and the MetroCluster documentation was already such a mess that it was hard enough just to get things set up according to the revised settings I had to squeeze out of engineering.

We've had problems like this before as well, with remote plexes playing an inordinate role in the general health of the filers. I/O requests from secondary consumers (in our case NFS and iSCSI) just queue for as long as the situation with the remote plexes doesn't allow writes to complete, and you end up with a lot of disk time-outs on your SAN hosts and things really go south. The path selection algorithm for the backend fabric seems much more primitive than almost anything else on the market, except perhaps early VMware and Windows.

Thanks everyone for the advice, in any case. It is disheartening to have to fight with the dark sides of MetroCluster... and probably even more depressing to have to fight with NetApp support, but these are the conditions we are more or less forced to live with...