I'm currently dealing with a customer who has a FAS2220 HA pair running ONTAP 7-Mode 8.2.2. Unfortunately, it was installed incorrectly (he contacted my company to try to fix it): all data is stored in the root aggregate, ONTAP needs to be upgraded, shares and exports need to be redone, snapshots are disabled (and ineffective because everything is in aggr0), and the vifs need to be redone. The list of problems goes on. They are in the process of purchasing a new unit, so we're waiting to do most of the work until it arrives; then they are going to move this unit to DR. Until then, he's worried about the read/write delays the system is having now.
Cabling looks ok. It's run out to 2 Cisco 2960s for redundancy. Only issue with the switches is that they are showing certain ports flapping (our network guy has the information on that).
The ifgrps seem to really need some work (same setup on both controllers): vif1 - e0a, e0c. vif2 - e0b, e0d. e0M has been disabled (even though it's cabled). vif1 and vif2 then seem to be teamed up as svif; that is, the two vifs are bonded as a separate vif. This svif is what's doing all the sending/receiving/management. I have no idea how someone did this, but let me know if you've seen it before.
So the customer wants me to solve the latency issue. Is it worth recreating all the ifgrps in the hope that it solves the problem? He can't afford unscheduled downtime (it's a 24-7 company). Should we just wait another couple of weeks until his new equipment is in and we can reset everything?
It is very common to NOT have a dedicated root aggregate on smaller systems. A dedicated root aggregate is not a requirement in 7-mode. Not having a dedicated root aggregate does not limit functionality in any way.
The multi-tier VIF/IFGRP that you see is also very common in smaller environments that do not have stacked switches.
Ports e0a and e0c are part of an LACP bond with both connections going to the same switch (vif1).
Ports e0b and e0d are part of an LACP bond with both connections going to the other switch (vif2).
vif1 and vif2 are then placed into an active/passive (single-mode) ifgrp called "svif".
So at any given time, traffic is on one switch or the other.
Without stackable switches, this is the only way to provide link aggregation AND switch redundancy. The flapping could be the result of a misconfiguration on the switch or the NetApp. We would need to see the /etc/rc files from the controllers to determine exactly what is going on.
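For reference, the layout described above would typically appear in /etc/rc roughly as follows. This is only a sketch of what I would expect to find, not your actual file: the hostname, IP address, and netmask are placeholders, and the `-b ip` load-balancing method and the `ifgrp favor` line are assumptions on my part.

```
# Hypothetical /etc/rc excerpt for one controller (placeholders in <>)
hostname <controller-name>

# First-level LACP ifgrps, one per switch
ifgrp create lacp vif1 -b ip e0a e0c
ifgrp create lacp vif2 -b ip e0b e0d

# Second-level single-mode (active/passive) ifgrp on top of the two
ifgrp create single svif vif1 vif2

# Optional: prefer vif1 as the active link (assumption)
ifgrp favor vif1

# The IP lives on svif; "partner svif" lets the HA partner take it over
ifconfig svif <ip-address> netmask <netmask> partner svif
```

Each LACP ifgrp needs a matching LACP port channel on its own 2960, and the two switches must not put all four ports into a single channel. A mismatch there is one possible cause of flapping.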
Regarding performance:
What does 'sysstat -x 1' show? Are CPU or Disk Util % high? I would consider anything above 75% to be cause for concern or at least a place to start looking.
If the network is flapping, I suspect this also may have something to do with the performance issues.
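A minimal set of commands to gather that information from each controller's console might look like this. The ifgrp name svif comes from your description; nothing here changes any configuration.

```
# Print the boot-time network configuration
rdfile /etc/rc

# Show which underlying links of the ifgrp are up and which one is active
ifgrp status svif

# Per-interface counters (errors, link up/down transitions)
ifstat -a

# One-second samples of CPU, disk utilization, and throughput (Ctrl-C to stop)
sysstat -x 1
```

Comparing the link transition counters in the ifstat output with the flapping ports seen on the 2960s should show whether both sides agree on which links are bouncing.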
Controller 2 only has 3 data drives. I am not sure what kind of workload you are trying to run, but that is not very many spindles, and there are probably not enough IOPS to support the workload.
You have 2 spares on controller 2, so you could give up one of those spares to the aggregate and get a few more IOPS (this would require restriping your volumes). With systems this small, I usually do an "active/passive" configuration to provide a larger single pool of disks.
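If you do decide to add a spare to the existing aggregate, the steps would be roughly as below. Treat this as a sketch: I am assuming the aggregate is aggr0 as you mentioned, and the volume name is a placeholder. Adding a disk to an aggregate cannot be undone, and the reallocation adds disk load while it runs, so schedule it for a quiet period.

```
# List the available spare disks
aggr status -s

# Add one spare disk to the aggregate (irreversible)
aggr add aggr0 1

# Enable reallocation, then run a one-time full reallocation per volume
reallocate on
reallocate start -f /vol/<volname>

# Check progress
reallocate status
```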
So instead of splitting the disks evenly, I would do the following:
Controller 1 (RAIDDP):
Controller 2 (RAID4):
Controller 1 gets all of the workload, and controller 2 is "passive" and will take over in case controller 1 fails.