How do you track cluster performance?

Hi

How do you track cluster performance?

Can you please recommend handy "statistics show"/"show-periodic" invocations, or other commands, to track cluster performance?

I need to understand what disks and interfaces are overutilized and what hosts generate that load.

I mean something like sysstat -x 2 in 7-mode, but with more options. I know that I can use sysstat from the node shell, but it is not very handy to switch to each node just to see whether utilization is at 100% or not.

There are a lot of possibilities with "statistics show", and there is also the qos command, which can return useful statistics. What commands do you use?

The question is: what is the best command (or commands) to understand where the bottleneck in a cluster is, and what generates that bottleneck.

Thank you.

Nick

Re: How do you track cluster performance?

"statistics show-periodic" would be what you are looking for, I imagine. You just need to specify the objects and counters you want to track.

Re: How do you track cluster performance?

Hi

Thank you.

statistics show-periodic has a lot of objects, and frankly I don't know which objects/fields/counters to specify to diagnose performance problems, or how to combine different fields, like disk_util + cpu util + iops + latency + net_out, in one output.

There was sysstat in 7-mode, with all the output fields explained in the documentation and KBs. It is more complex with Cmode because of the many nodes and the many more objects available.

So the question is the same: how do you diagnose performance in Cmode? I need to answer whether my cluster is overutilized, where, and why.

Nick

Re: How do you track cluster performance?

In 7-mode, this is what sysstat looked like:

cm6080-rtp2::> node run local "sysstat -x 1"
CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
  2%      0      0      0       6       3      5      24      8       0      0   >60    100%    0%  -     5%       6      0      0       0      0       0      0
  2%      0      0      0       0       7      9       0      0       0      0   >60    100%    0%  -     0%       0      0      0       0      0       0      0
  2%      0      0      0       0       2      2       8     24       0      0   >60    100%    0%  -    10%       0      0      0       0      0       0      0
  2%      0      0      0      12       7     11      16      0       0      0   >60    100%    0%  -     5%      12      0      0       0      0       0      0
  2%      0      0      0       0       7      8       0      0       0      0   >60    100%    0%  -     0%       0      0      0       0      0       0      0

In cDOT, you can run a similar command, statistics show-periodic (which defaults to the cluster object), to get a cluster-wide view:

cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:20
  cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
   2%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    150KB    946KB
   3%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB    536KB   1.09MB
   3%        0        0        0   0%   7.29KB     450B      0%   17.1KB   16.4KB   7.92KB   3.96KB
cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:27
  cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
   2%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB   7.92KB   3.96KB
Averages for 3 samples:
   2%        0        0        0   0%   10.9KB     460B      0%   31.9KB   31.6KB    231KB    688KB
Maximums:
   3%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    536KB   1.09MB
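If you capture show-periodic output to a file, the per-sample rows are easy to post-process. A minimal Python sketch (the embedded sample rows are copied from the output above; the assumption is only that each data row starts with the cpu-busy percentage) that recomputes the min/avg/max summary the command prints at the end:

```python
# Recompute the Minimums / Averages / Maximums summary for the
# "cpu busy" column from captured show-periodic output.
# Sample rows copied from the output above; the regex assumes
# each data row begins with the cpu-busy percentage.
import re

sample = """\
   2%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    150KB    946KB
   3%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB    536KB   1.09MB
   3%        0        0        0   0%   7.29KB     450B      0%   17.1KB   16.4KB   7.92KB   3.96KB
"""

def cpu_busy(text):
    """Extract the leading cpu-busy percentage from each data row."""
    vals = []
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)%", line)
        if m:
            vals.append(int(m.group(1)))
    return vals

vals = cpu_busy(sample)
print("min %d%%  avg %.1f%%  max %d%%" % (min(vals), sum(vals) / len(vals), max(vals)))
```

The same approach extends to any other column once you know its position in the layout.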

You can also do this per node:

cm6080-rtp2::> statistics show-periodic -object node -instance cm6080-rtp-01

cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:26
  cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
   7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   3.98KB
   6%        0        0        0   0%   5.92KB   4.92KB      0%   7.47KB   8.91KB   11.9KB   11.9KB
   6%        2        0        2   0%   9.20KB   1.33KB      0%   3.36KB   10.7KB   8.00KB   16.0KB
cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:33
  cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
   6%        0        0        0   0%   5.92KB   1.33KB      0%   3.36KB   8.91KB   8.00KB   3.98KB
Averages for 3 samples:
   6%        5        0        5   0%   21.9KB   22.5KB      0%   33.1KB   24.0KB   10.6KB   10.6KB
Maximums:
   7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   16.0KB
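Note that the throughput columns mix units (831B, 24.4KB, 1.09MB), which makes comparing rows by eye error-prone. A small helper to normalize them, assuming the suffix set follows the B/KB/MB/GB pattern shown in the output above:

```python
# Normalize the mixed-unit values (831B, 24.4KB, 1.09MB) that
# show-periodic prints, so throughput columns can be compared
# numerically. The suffix set is an assumption based on the
# output shown above.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def to_bytes(text):
    """Convert a value like '24.4KB' to a byte count."""
    for suffix in ("GB", "MB", "KB", "B"):  # check longest suffixes first
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError("unrecognized unit in %r" % text)

print(to_bytes("831B"))    # 831.0
print(to_bytes("24.4KB"))  # 24985.6
print(to_bytes("1.09MB"))  # 1142947.84
```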

You can also drill down into different counters, objects, and so on. These are the objects available at the admin privilege level:

cm6080-rtp2::> statistics show-periodic -object
    aggregate                       audit_ng
    audit_ng:vserver                avoa
    avs                             cifs
    cifs:node                       cifs:vserver
    cluster                         cluster_peer
    cpx                             disk
    disk:raid_group                 ext_cache
    ext_cache_obj                   fcache
    fcp_lif                         fcp_lif:node
    fcp_lif:port                    fcp_lif:vserver
    hashd                           hostadapter
    ifnet                           iscsi_conn
    iscsi_conn:session              iscsi_lif
    iscsi_lif:node                  iscsi_lif:vserver
    lif                             lif:vserver
    logical_replication_destination logical_replication_source
    lun                             lun:constituent
    nblade_cifs                     nfsv3
    nfsv3:constituent               nfsv3:cpu
    nfsv3:node                      nfsv4
    nfsv4:constituent               nfsv4:cpu
    nfsv4:node                      nfsv4_1
    nfsv4_1:constituent             nfsv4_1:cpu
    nfsv4_1:node                    node
    path                            port
    processor                       processor:node
    qtree                           quota
    raid                            rquota
    smb1                            smb1:node
    smb1:vserver                    smb2
    smb2:node                       smb2:vserver
    smtape                          spinhi
    system                          target_port
    target_port:array               volume
    volume:node                     volume:vserver
    volume_move_summary             wafl_hya_per_aggr
    workload                        workload:constituent
    workload:policy_group           zapi

Counters depend on the object. Once you specify an object, you can find the counters using tab completion:

cm6080-rtp2::> statistics show-periodic -object cifs -instance cm6080-rtp2-01 -counter
    active_searches                  auth_reject_too_many
    change_notifications_outstanding cifs_latency
    cifs_latency_base                cifs_ops
    cifs_read_ops                    cifs_write_ops
    commands_outstanding             connected_shares
    connections                      established_sessions
    instance_name                    instance_uuid
    node_name                        node_uuid
    open_files                       process_name
    signed_sessions                  vserver_id
    vserver_name

Objects like CIFS and NFS can also use the vserver as the instance:

cm6080-rtp2::> statistics show-periodic -object cifs -instance vserver -counter cifs_latency

cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:07
     cifs
  latency
--------
      0us
      0us
cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:12
     cifs
  latency
--------
Minimums:
      0us
Averages for 2 samples:
      0us
Maximums:
      0us
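One thing worth knowing about latency counters like cifs_latency: it is paired with cifs_latency_base (both appear in the counter list above). Average-type counters are derived between two samples as delta(latency) / delta(base), which show-periodic does for you. A sketch of the arithmetic, with made-up sample values:

```python
# cifs_latency is paired with cifs_latency_base (both appear in
# the counter list earlier in this thread). Average-type counters
# are derived between two samples as delta(latency) / delta(base).
# The sample numbers below are made up for illustration.
def avg_latency_us(lat1, base1, lat2, base2):
    """Per-operation latency in microseconds between two samples."""
    ops = base2 - base1
    if ops == 0:
        return 0.0  # no operations completed in the interval
    return (lat2 - lat1) / ops

# 5,000us of accumulated latency across 10 operations -> 500us/op
print(avg_latency_us(100_000, 500, 105_000, 510))  # 500.0
```

This is why a raw "statistics show" dump of cifs_latency alone looks meaningless: it needs its base counter to become a per-op figure.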

There are plenty of possibilities for performance monitoring. Leverage tab completion to discover the different variations.

Also, use the man pages:

cm6080-rtp2::> man statistics show-periodic

statistics show-periodic        Data ONTAP 8.2        statistics show-periodic

NAME

     statistics show-periodic -- Continuously display current performance data at regular intervals

AVAILABILITY

     This command is available to cluster and Vserver administrators at the admin privilege level.

DESCRIPTION

     This command continuously displays specified performance data at regular intervals. The command output displays data in the following columns:

     o   cpu busy: Overall system utilization based on CPU utilization and subsystem utilization. Examples of subsystems include the storage subsystem and RAID subsystem.

     o   total ops: The number of total operations per second.

     o   nfs-ops: The number of NFS operations per second.

     o   cifs-ops: The number of CIFS operations per second.

     o   data busy: The percentage of time that data ports sent or received data.

     o   data recv: Network traffic received on data ports (KBps).

     o   data sent: Network traffic sent on data ports (KBps).

     o   cluster busy: The percentage of time that cluster ports sent or received data.

     o   cluster recv: Network traffic received on cluster ports (KBps).

     o   cluster sent: Network traffic sent on cluster ports (KBps).

     o   disk read: Data read from disk (KBps).

     o   disk write: Data written to disk (KBps).

PARAMETERS

     -object <text> - Object

         Selects the object for which you want to display performance data. The default object is "cluster".

     -instance <text> - Instance

         Selects the instance for which you want to display performance data. This parameter is required if you specify the -object parameter and enter any object other than "cluster".

         For example, if you want to display disk object statistics, you can use this parameter to specify the name of a specific disk whose statistics you want to view.

(cont)

Re: How do you track cluster performance?

Hi

Heh, you've got a 6080; you probably don't have any problems with performance.

Is it possible to print different objects in one output, like disk util + iops + latency + cpu util?

Check out this great TR. It explains how to track rogue workloads, among other things:

http://www.netapp.com/us/media/tr-4211.pdf

Interestingly, if you switch to advanced privilege mode, statistics show-periodic shows more objects than in admin mode.

Nick

Re: How do you track cluster performance?

How do I get the counters for these objects, and more, using SNMP on a clustered ONTAP filer?

For example, in 7-mode we can get the Consistency Point counters like cpFromCpDeferredOps, cpFromCpOps, cpFromFlushOps, etc. via SNMP using the OID (1.3.6.1.4.1.789.1.2.6), as described here:

http://www.mibdepot.com/cgi-bin/getmib3.cgi?i=1&n=NETWORK-APPLIANCE-MIB&r=netapp&f=netapp_1_4.mib&v=v1&t=tree

How do I fetch the same information on a clustered ONTAP filer?

Is there a MIB tree for a clustered ONTAP filer?

Re: How do you track cluster performance?

I don't think there is currently a MIB for cDOT performance counters. I believe the perf data portion is moving toward a ZAPI model, so there may not be a way to capture this via SNMP in the future.