ONTAP Discussions

CDOT volume mapping to aggr raid groups

FRANK_KEOUGH

We have had some performance problems with a fairly new FAS8040 install; we think most of the issue is that it was fairly unbalanced in terms of workloads.

We are only running FC to ESXi 5.5. We have been moving workloads from one 7.2K aggregate to another 7.2K aggregate on the 2nd controller.

 

During this time, working with support, we think we had a hot drive, which was administratively failed. Looking at disk performance utilization in that aggregate,

we see a lot of disks running at 100%; we're guessing that is because of the ongoing rebuild. We have been testing out various monitoring tools, OnCommand Balance being one,

and it has flagged an Exchange server as having high IOPS along with some other volumes.

 

I have looked at the volume show commands and fields and am wondering if there is a way to see what volumes are mapped to RAID groups...

 

Thanks.


bobshouseofcards

Hi Frank -

 

Your question raises some interesting points about how NetApp FAS storage maps logical to physical space.

 

FlexVol volumes (the only type available under cDOT) are not mapped to RAID groups within an aggregate at all. A volume is free to use space on any and all of the aggregate's RAID groups (assuming that you have more than one) and will automatically use free space throughout the entire physical aggregate.
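As a quick way to see this for yourself (the aggregate name below is a placeholder), you can list the RAID group and disk layout of an aggregate, and separately list the volumes it hosts:

::> storage aggregate show-status -aggregate <aggr-name>

::> volume show -aggregate <aggr-name>

You'll notice there is no field that ties a volume to a particular RAID group - that mapping simply doesn't exist.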

 

This does mean, however, that you want to create "full" aggregates up front if possible.  For example, if you create an aggregate with two RAID groups, fill it to the point where it needs to be expanded, and then expand it with two more RAID groups' worth of disk, you still have "hot" disks: the original RAID groups contain all the data, and only as data is changed does it slowly move to the new space over time.  There are reallocation settings and processes you can run to force a rebalance of the aggregate, but it isn't free and it isn't fast, since it runs as a background process.  Hence it is important, to the degree possible, to model both capacity and IOPS up front so the aggregate is prepared to take that load from the start (as much as you can, of course - not always easy).
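If you do end up expanding and need to spread existing data across the new disks, the volume-level reallocation commands are the usual tool (vserver and volume names below are placeholders, and exact syntax can vary a bit by ONTAP release, so check the man pages on your cluster):

::> volume reallocation measure -vserver <vserver> -path /vol/<volname>

::> volume reallocation start -vserver <vserver> -path /vol/<volname>

::> volume reallocation show

Measure first - it tells you how badly the layout is skewed before you commit to the background work of an actual reallocation pass.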

 

The same holds true for adding RAID groups to an aggregate - it's best to always add a full RAID group at a time (half a group if necessary, but never odd amounts of just a couple of disks at a time) to avoid artificially creating hot spots.  Even so, adding disk for IOPS capability on a read-heavy aggregate, for instance, doesn't help until the aggregate is manually reallocated (rebalanced).  Remember that in WAFL, data is moved to new space only as it is written.  Then also, adding space to an essentially full aggregate means that the newest writes tend to land on the newest space, which could create yet another artificial hot spot if that data is also heavily written.  It's always an ongoing effort to maintain physical layout balance over time.
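When you do add disks, something along these lines keeps the additions in whole RAID group increments (names and counts are placeholders - check the add-disks options on your release before running it):

::> storage aggregate show -aggregate <aggr-name> -fields raidsize

::> storage aggregate add-disks -aggregate <aggr-name> -diskcount <raidsize> -raidgroup new

Matching the disk count to the raidsize (or adding into a new RAID group explicitly) avoids leaving a tiny, permanently hot RAID group at the end of the aggregate.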

 

You mention working with NetApp support - interesting that they would suggest administratively failing a potentially hot disk.  After all, that doesn't change the workload: once the disk is reconstructed, the new disk will run just as hot, and the other disks will run high during the rebuild.  Moving workloads, reallocating space throughout the aggregate, and/or adding IOPS capability (more disks), singly or in combination, are ultimately the only mechanisms that reduce disk workload.
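Since you're already shuffling workloads between aggregates, the non-disruptive way to do that in cDOT is a volume move (names below are placeholders):

::> volume move start -vserver <vserver> -volume <volname> -destination-aggregate <aggr-name>

::> volume move show

That lets you rebalance load between your two 7.2K aggregates without touching the hosts.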

 

Granted, I have seen bad aggregate setups that by design create hot disks and limit I/O.  For instance, I once worked with a system that had 4 RAID groups in an aggregate, but because of how many disks the customer bought, they only had 4 disks left over for the last RAID group - they just wanted to "use them all".  That RAID group, unsurprisingly, was always running hot and limited the total throughput of the entire aggregate.  Remember that volumes are spread everywhere, so every volume they had likely touched the 2 data disks in that small RAID group, and since these were "SATA" type disks, they pulled down total system performance.  A perfstat clearly showed it too - that RAID group's disks were constantly near 100% utilization while all other disks were more like 35-45%.

 

Since you are working with support, I assume you've sent in perfstats.  If they haven't sent you back the relevant sections on disk I/O, you should ask for them or take the time to extract them from the archive yourself.  Sorry, but off the top of my head I don't remember the section name.  The complete disk performance data is in there, though, and it will clearly show whether you are simply driving the disks too hard over the perfstat collection interval, as well as why - consistency points, reads, writes, everything.  It's very helpful for understanding how everything is moving through the system.

 

You can get statistics by disk and by RAID group from the CLI, but it's harder to get the complete summary; the same data is better organized in the perfstat collection.  But to look at individual RAID groups, for example, you can first list the RAID groups on your system:

 

::> statistics catalog instance show -object disk:raid_group

 

Then display the performance data over time for any RAID groups that might be of interest:

 

::> statistics show-periodic -object disk:raid_group -instance <raid-group-name>

 

This will display a lot, so I suggest capturing it into a log so you can spread the data out - but it lets you check individual RAID groups for hot spots as you suspect they are happening.  OnCommand Performance Manager can collect a lot of this statistical data as well, and it does further analysis to break it down into which volumes are causing the I/O, or suffering from it, at the time of various events.  I'm still working with the latest OPM myself to really understand its usefulness, but at the storage level it really helps to deliver the underlying information, and of course it's a freebie.  It does not do so well at mapping a volume's I/O load directly to the VM causing the I/O, as tools like Balance or Insight do.  But with OPM you can at least get total I/O for volumes and then use that information at a gross level to help inform your choices of what to move where.
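If you want a rough per-volume view from the CLI in the meantime, the same statistics commands work against the volume object (the volume name is a placeholder; counter names can vary by release, so list what's available on yours first):

::> statistics catalog counter show -object volume

::> statistics show-periodic -object volume -instance <volname> -counter total_ops|read_ops|write_ops

That's nowhere near what Balance or Insight gives you on the VM side, but it's enough to rank volumes by I/O before deciding what to move where.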

 

I hope this helps you understand more about your performance issues and options available to determine the best courses of action.  I welcome any response or questions you have.

 

 

Bob Greenwald

NCIE-SAN, Clustered Data ONTAP

 
