Solved: Re: FAS2220 latency issues

bhenson · ‎2015-01-20

I'm currently dealing with a customer who has a FAS2220 HA pair running Data ONTAP 8.2.2 7-Mode. Unfortunately, it was installed incorrectly (he contacted my company to try to fix it): all data is stored in the root aggregate, ONTAP needs to be upgraded, shares and exports need to be redone, snapshots are disabled (ineffective because of being in aggr0), and the vifs need to be redone. The list of problems goes on. They are in the process of purchasing a new unit, so we're waiting to do most of the work until it arrives; then they're going to move this unit to DR. Until then, he's worried about the read/write delays the system is having now.

Cabling looks ok. It's run out to 2 Cisco 2960s for redundancy. Only issue with the switches is that they are showing certain ports flapping (our network guy has the information on that).

The ifgrps seem to really need some work (same setup on both controllers): vif1 - e0a, e0c. vif2 - e0b, e0d. e0M has been disabled (even though it's cabled). vif1 and vif2 then seem to be teamed up as svif; that is, the two vifs are bonded as a separate vif. This svif is what's doing all the sending/receiving/management. I have no idea how someone did this, but let me know if you've seen it before.

So the customer is wanting me to solve the latency issue. Is it worth it to recreate all the ifgrps in hopes that solves the problem? He can't afford to have unscheduled downtime (it's a 24-7 company). Should we just wait another couple weeks until his new equipment is in and we can just reset everything?

Looking for any advice/guidance you have. Thanks.

SeanLuce · ‎2015-01-21

A couple of things..

It is very common to NOT have a dedicated root aggregate on smaller systems. A dedicated root aggregate is not a requirement in 7-mode. Not having a dedicated root aggregate does not limit functionality in any way.

The multi-tier VIF/IFGRP that you see is also very common in smaller environments that do not have stacked switches.

For example:

Ports e0a and e0c are part of an LACP bond with both connections going to the same switch (vif1)

Ports e0b and e0d are part of an LACP bond with both connections going to the other switch (vif2)

vif1 and vif2 are then placed into an active/passive (single-mode) ifgrp called "svif".

So at any given time traffic is on one switch or the other.

Without stackable switches, this is the only way to provide link aggregation AND switch redundancy. The flapping could be the result of a misconfiguration on the switch or the NetApp. We would need to see the /etc/rc files from the controllers to determine exactly what is going on.

In regards to the performance..

What does 'sysstat -x 1' show? Are CPU or Disk Util % high? I would consider anything above 75% to be cause for concern or at least a place to start looking.

If the network is flapping, I suspect this also may have something to do with the performance issues.
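
As a rough way to apply the 75% rule of thumb to a captured 'sysstat -x 1' log, here is a minimal Python sketch. The function names and the parsing approach are my own assumptions, not a NetApp tool: it treats the first percentage on each data row as CPU and the last percentage as Disk util, which matches the column order of the output in this thread.

```python
import re

THRESHOLD = 75  # percent; the "cause for concern" level suggested above

# A header line plus three data rows from a 'sysstat -x 1' capture.
SAMPLE = """\
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
 14%      0    214      0     214   14156    485    1912 105296       0      0     3s    93%  100%  :f  100%       0      0      0       0      0       0      0
 26%      0   1169      0    1172   77341   1980    4668  24440       0      0     3s    73%   39%  Hf   68%       3      0      0       0      0       0      0
 17%      0    990      0     990   64805   1409    3924   7792       0      0     3s    53%   29%  :    74%       0      0      0       0      0       0      0
"""

PERCENT = re.compile(r"\d+%")


def parse_sysstat(text):
    """Yield (cpu_pct, disk_util_pct) for each data row of sysstat -x output."""
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with the CPU percentage; header rows do not.
        if not tokens or not PERCENT.fullmatch(tokens[0]):
            continue
        percents = [int(t[:-1]) for t in tokens if PERCENT.fullmatch(t)]
        # Assumption: CPU is the first percentage on the row, Disk util the last.
        yield percents[0], percents[-1]


def busy_rows(text, threshold=THRESHOLD):
    """Return the rows where CPU or Disk util exceeds the threshold."""
    return [(cpu, disk) for cpu, disk in parse_sysstat(text)
            if cpu > threshold or disk > threshold]


if __name__ == "__main__":
    for cpu, disk in busy_rows(SAMPLE):
        print(f"CPU {cpu}%  Disk util {disk}%")
```

Only the first sample row is flagged, because its Disk util is above 75% while the other two rows stay below it.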

bhenson · ‎2015-01-21

Thanks for the help. I'll post both controllers' sysstat and /etc/rc. Controller NetApp1 doesn't seem to be having any problems, but disk utilization on NetApp2 is hitting 100%.

Controller 1 sysstat

NetApp1> sysstat -x 1
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
  1%      0      0      0       0       5      1       0      0       0      0    47    100%    0%  -     0%       0      0      0       0      0       0      0
  1%      0     11      0      11       6     21      52     24       0      0    47     95%    0%  -     3%       0      0      0       0      0       0      0
  2%      0    239      0     244      73    358     204      0       0      0    47     92%    0%  T0    3%       5      0      0       0      0       0      0
  1%      0     48      0      48      16     72     796   1412       0      0    47     99%   19%  :    15%       0      0      0       0      0       0      0
  1%      0     10      0      10       5      2      28     24       0      0    47    100%    0%  -     9%       0      0      0       0      0       0      0
  1%      0      5      0       5       5      2      20      0       0      0    47    100%    0%  -    27%       0      0      0       0      0       0      0
  1%      0     15      0      15       5      3       0      0       0      0    47      -     0%  -     0%       0      0      0       0      0       0      0
  1%      0     55      0      59      15      9       8     24       0      0    47     99%    0%  -     3%       4      0      0       0      0       0      0
  1%      0     20      0      20       6      3      16      8       0      0    47    100%    0%  -     2%       0      0      0       0      0       0      0
  1%      0     16      0      16       7      3       8      0       0      0    47    100%    0%  -     4%       0      0      0       0      0       0      0
  1%      0      0      0       0       4      0       8     24       0      0    47    100%    0%  -     3%       0      0      0       0      0       0      0
  1%      0     58      0      58      16      9      16      0       0      0    47    100%    0%  -    17%       0      0      0       0      0       0      0
  1%      0      2      0      17       4      1       0      0       0      0    47    100%    0%  -     0%      15      0      0       0      0       0      0
  1%      0      0      0       0       7      0     780   1420       0      0    47    100%   18%  T    20%       0      0      0       0      0       0      0
  1%      0     37      0      37      11      6      28      0       0      0    47     99%    0%  -     9%       0      0      0       0      0       0      0
  1%      0      6      0       6       3      1       0      0       0      0    47    100%    0%  -     0%       0      0      0       0      0       0      0
  1%      0      4      0       4       7      1       8     24       0      0    47    100%    0%  -     3%       0      0      0       0      0       0      0
  2%      0      0      0       5       8      1      16      0       0      0    47    100%    0%  -    25%       5      0      0       0      0       0      0
  1%      0      0      0       0       5      0       0      8       0      0    47    100%    0%  -     1%       0      0      0       0      0       0      0
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
  1%      0      0      0       0       4      0      16     24       0      0    47     88%    0%  -     3%       0      0      0       0      0       0      0
  1%      0      0      0       0       4      1      16      0       0      0    47    100%    0%  -     4%       0      0      0       0      0       0      0

Controller 2 sysstat

NetApp2> sysstat -x 1
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
 14%      0    214      0     214   14156    485    1912 105296       0      0     3s    93%  100%  :f  100%       0      0      0       0      0       0      0
 26%      0   1169      0    1172   77341   1980    4668  24440       0      0     3s    73%   39%  Hf   68%       3      0      0       0      0       0      0
 11%      0    179      0     183   11923    763    3248  86436       0      0     3s    93%  100%  :f  100%       4      0      0       0      0       0      0
 17%      0    990      0     990   64805   1409    3924   7792       0      0     3s    53%   29%  :    74%       0      0      0       0      0       0      0
 17%      0    382      0     382   24533   1431    4000 107284       0      0     3s    79%   92%  Hf   95%       0      0      0       0      0       0      0
 25%      0   1172      0    1172   76275   2525    5720  12864       0      0     3s    82%   44%  Hn   74%       0      0      0       0      0       0      0
 14%      0    164      0     164   10882    536    2572 104812       0      0     3s    94%  100%  :f  100%       0      0      0       0      0       0      0
 12%      0    614      0     618   40395   1350    2420  11180       0      0     3s    77%   40%  :    63%       4      0      0       0      0       0      0
 22%      0    662      0     662   43514   1198    3456  35056       0      0     3s    90%   32%  Hf   50%       0      0      0       0      0       0      0
  7%      0     33      0      33    2191     51    1404  82424       0      0     3s    98%   93%  :    89%       0      0      0       0      0       0      0
  1%      0      0      0       0       6      2       0      0       0      0     3s   100%    0%  -     0%       0      0      0       0      0       0      0
  1%      0      0      0       0       4      0       0      0       0      0     3s   100%    0%  -     0%       0      0      0       0      0       0      0
  1%      0      0      0       4       3     30      16     32       0      0     3s    98%    0%  -     5%       4      0      0       0      0       0      0
  5%      0    252      0     252   16464    817     928      0       0      0     3s    71%    0%  -    16%       0      0      0       0      0       0      0
 26%      0   1088      0    1088   71821   1937    5300  42096       0      0     3s    78%   42%  Hf   70%       0      0      0       0      0       0      0
  8%      0    263      0     263   17460   1119    2236  69128       0      0     3s    51%  100%  :f   99%       0      0      0       0      0       0      0
 25%      0   1049      0    1049   68698   2569    4556  18928       0      0     9s    76%   35%  Hn   70%       0      0      0       0      0       0      0
 13%      0    147      0     152    9746   1132    2436  91668       0      0     9s    94%  100%  :f  100%       5      0      0       0      0       0      0
 14%      0    784      0     784   52038   1880    3056  10964       0      0     9s    64%   36%  :    73%       0      0      0       0      0       0      0
 CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
 22%      0    497      0     497   32938   1624    4052  74340       0      0     9s    95%   79%  Hf   96%       0      0      0       0      0       0      0
 10%      0    595      0     595   28364   1648    2788  42392       0      0     9s    59%   71%  :   100%       0      0      0       0      0       0      0
 17%      0    900      0     900   50004   2389    3956  31589       0      0     9s    77%   26%  Hn   84%       0      0      0       0      0       0      0

Controller 1 /etc/rc

NetApp1> rdfile /etc/rc
#Auto-generated by setup Tue Jan 28 09:31:14 EST 2014
hostname RJ-NetApp1
ifgrp create multi vif1 -b ip e0a e0c
ifgrp create multi vif2 -b ip e0b e0d
ifgrp create single svif vif1 vif2
ifconfig svif `hostname`-svif mediatype auto partner svif mtusize 1500
route add default 192.168.1.1 1
routed on
options dns.domainname ronjon.corp
options dns.enable on
options nis.enable off
savecore

Controller 2 /etc/rc

NetApp2> rdfile /etc/rc
#Auto-generated by setup Tue Jan 28 09:33:21 EST 2014
hostname RJ-NetApp2
ifgrp create multi vif1 -b ip e0a e0c
ifgrp create multi vif2 -b ip e0b e0d
ifgrp create single svif vif1 vif2
ifconfig svif `hostname`-svif mediatype auto partner svif mtusize 1500
route add default 192.168.1.1 1
routed on
options dns.domainname ronjon.corp
options dns.enable on
options nis.enable off
savecore

SeanLuce · ‎2015-01-21

Disk utilization at 100% is the issue.

aggr status -r (from both controllers) will show us how the raid groups are configured. From there we can see why you are having disk util issues.

bhenson · ‎2015-01-21

aggr status

Controller 1

NetApp1> aggr status -r
Aggregate aggr0 (online, raid4) (block checksums)
  Plex /aggr0/plex0 (online, normal, active)
    RAID group /aggr0/plex0/rg0 (normal, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      parity    0a.00.2         0a    0   2   SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.4         0a    0   4   SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.6         0a    0   6   SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.8         0a    0   8   SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.10        0a    0   10  SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816


Pool1 spare disks (empty)

Pool0 spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           0a.00.0         0a    0   0   SA:A   0  BSAS  7200 2538546/5198943744 2543634/5209362816

Partner disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
partner         0b.00.11        0b    0   11  SA:B   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.5         0b    0   5   SA:B   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.3         0b    0   3   SA:B   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.9         0b    0   9   SA:B   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.7         0b    0   7   SA:B   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.1         0b    0   1   SA:B   0  BSAS  7200 0/0               2543634/5209362816

Controller 2

NetApp2> aggr status -r
Aggregate aggr0 (online, raid4) (block checksums)
  Plex /aggr0/plex0 (online, normal, active)
    RAID group /aggr0/plex0/rg0 (normal, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      parity    0a.00.3         0a    0   3   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.5         0a    0   5   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.9         0a    0   9   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.7         0a    0   7   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816


Pool1 spare disks (empty)

Pool0 spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           0a.00.1         0a    0   1   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
spare           0a.00.11        0a    0   11  SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816

Partner disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
partner         0b.00.10        0b    0   10  SA:A   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.2         0b    0   2   SA:A   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.6         0b    0   6   SA:A   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.4         0b    0   4   SA:A   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.8         0b    0   8   SA:A   0  BSAS  7200 0/0               2543634/5209362816
partner         0b.00.0         0b    0   0   SA:A   0  BSAS  7200 0/0               2543634/5209362816

SeanLuce · ‎2015-01-23

Controller 2 only has 3 data drives. I am not sure what kind of workload you are trying to run, but that is not very many. There are not enough IOPS to support the workload.
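
The disk counts can be tallied straight from the 'aggr status -r' output. This is a minimal Python sketch, not a NetApp utility: the function name is hypothetical, and it assumes every disk row starts with a lowercase role followed by a device name in the shelf.bay style shown in this thread.

```python
from collections import Counter

ROLES = {"parity", "dparity", "data", "spare", "partner"}

# Disk rows for Controller 2, taken from the aggr status -r output above.
SAMPLE = """\
      parity    0a.00.3         0a    0   3   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.5         0a    0   5   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.9         0a    0   9   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
      data      0a.00.7         0a    0   7   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
Spare disks for block checksum
spare           0a.00.1         0a    0   1   SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
spare           0a.00.11        0a    0   11  SA:B   0  BSAS  7200 2538546/5198943744 2543634/5209362816
"""


def count_disk_roles(text):
    """Count disks per RAID role in aggr status -r output."""
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        # A disk row is a known role followed by a device such as 0a.00.5.
        # Section titles like "Spare disks ..." are capitalized and skipped.
        if len(tokens) >= 2 and tokens[0] in ROLES and tokens[1].count(".") == 2:
            counts[tokens[0]] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_disk_roles(SAMPLE)))
```

For Controller 2 this gives one parity disk, three data disks, and two spares, so only three 7200 RPM spindles are serving the whole workload.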

You have 2 spares on controller 2, so you could give up one of those spares to the aggregate and get a few more IOPS (this would require restriping your volumes). With systems this small, I usually do an "active/passive" configuration to provide a larger single pool of disks.

So instead of splitting the disks evenly, I would do the following:

Controller 1 (RAID-DP): parity, dparity, data, spare

Controller 2 (RAID4): parity, data, spare

Controller 1 gets all of the workload, and controller 2 is "passive" and will take over in case controller 1 fails.
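
To put rough numbers on the active/passive idea, here is a small Python sketch of the arithmetic. The current counts come from the aggr status -r output in this thread (12 disks in bays 0-11). The proposed split is my assumption, since no per-controller counts are given: controller 2 keeps the minimum of three disks (one parity, one data, one spare) and controller 1 takes the remaining nine with RAID-DP and one spare.

```python
TOTAL_DISKS = 12  # bays 0-11 in the aggr status -r output


def data_disks(total, parity, spares):
    """Data disks left on a controller after parity and spare disks."""
    return total - parity - spares


# Current layout: 6 disks per controller, RAID4 on both.
current_c1 = data_disks(6, parity=1, spares=1)
current_c2 = data_disks(6, parity=1, spares=2)

# Assumed active/passive layout (hypothetical counts, see lead-in):
# controller 2 keeps parity + data + spare, controller 1 gets the rest.
PROPOSED_C2_TOTAL = 3
proposed_c1 = data_disks(TOTAL_DISKS - PROPOSED_C2_TOTAL, parity=2, spares=1)

if __name__ == "__main__":
    print("current data disks:", current_c1, "and", current_c2)
    print("proposed data disks on controller 1:", proposed_c1)
```

Under those assumptions the active controller would have six data spindles in one pool instead of the four and three that the two controllers have separately today.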

SeanLuce · ‎2015-01-23

The /etc/rc files look good. I have implemented this same configuration several times.

You just have to make sure the switch is configured for static EtherChannel (NetApp's "multi"), and not dynamic EtherChannel (NetApp's "lacp").
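
A quick way to see which mode each ifgrp uses, and therefore what the switch side should look like, is to read it out of /etc/rc. This Python sketch is only illustrative: the function name and the mapping text are my own, and it assumes the 'ifgrp create <mode> <name> ...' line format shown in the rc files above.

```python
# Switch-side setting that matches each 7-Mode ifgrp mode.
SWITCH_SIDE = {
    "multi": "static EtherChannel (no negotiation protocol)",
    "lacp": "dynamic EtherChannel (LACP)",
    "single": "no port channel (active/passive failover)",
}

# ifgrp lines from the /etc/rc files posted in this thread.
RC = """\
hostname RJ-NetApp1
ifgrp create multi vif1 -b ip e0a e0c
ifgrp create multi vif2 -b ip e0b e0d
ifgrp create single svif vif1 vif2
"""


def ifgrp_modes(rc_text):
    """Map each ifgrp name in an /etc/rc file to its mode."""
    modes = {}
    for line in rc_text.splitlines():
        tokens = line.split()
        # Expected form: ifgrp create <mode> <name> [options] <members...>
        if tokens[:2] == ["ifgrp", "create"] and len(tokens) >= 4:
            modes[tokens[3]] = tokens[2]
    return modes


if __name__ == "__main__":
    for name, mode in ifgrp_modes(RC).items():
        print(f"{name}: {mode} -> {SWITCH_SIDE.get(mode, 'unknown mode')}")
```

With the rc files from this thread, vif1 and vif2 come out as "multi", so the matching switch ports need a static channel, while svif is "single" and needs nothing on the switch.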

bhenson · ‎2015-01-23

Makes sense. Thanks a lot for the help.
