Re: How do you track cluster performance?

nicholas4704 · 2013-07-24

Hi

How do you track cluster performance?

Can you please recommend handy statistics show/show-periodic inputs or other commands to track cluster performance?

I need to understand which disks and interfaces are overutilized and which hosts generate that load.

I mean something like sysstat -x 2 in 7-mode, but with more options. I know that I can use sysstat from the node shell, but it is not very handy to switch to each node to see whether there is 100% utilization or not.

There are a lot of possibilities with "statistics show", and there is the qos command, which can return useful statistics. What commands do you use?

The question is: what are the best commands to understand where the bottleneck in a cluster is and what generates it?

Thank you.

Nick

parisi · 2013-07-25

"Statistics show-periodic" would be what you were looking for, I imagine. You just need to specify objects and counters you want to track.

nicholas4704 · 2013-07-29

Hi

Thank you.

statistics show-periodic has a lot of objects, and frankly I don't know which objects, fields, and counters to specify to diagnose performance problems, or how to combine different fields (like disk_util + cpu util + iops + latency + net_out) in one output.

In 7-mode there was sysstat, with all output fields explained in the documentation and KB. It is more complex in C-mode because there are many nodes and more objects available.

So the question is the same: how do you diagnose performance in C-mode? I need to determine whether my cluster is overutilized and, if so, where and why.

Nick

parisi · 2013-07-29

In 7-mode, this is what sysstat looked like:

cm6080-rtp2::> node run local "sysstat -x 1"

 CPU   NFS  CIFS  HTTP Total   Net  kB/s  Disk  kB/s  Tape  kB/s Cache Cache    CP    CP  Disk OTHER   FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                in   out  read write  read write   age   hit  time    ty  util                      in   out    in   out
  2%     0     0     0     6     3     5    24     8     0     0   >60  100%    0%     -    5%     6     0     0     0     0     0     0
  2%     0     0     0     0     7     9     0     0     0     0   >60  100%    0%     -    0%     0     0     0     0     0     0     0
  2%     0     0     0     0     2     2     8    24     0     0   >60  100%    0%     -   10%     0     0     0     0     0     0     0
  2%     0     0     0    12     7    11    16     0     0     0   >60  100%    0%     -    5%    12     0     0     0     0     0     0
  2%     0     0     0     0     7     8     0     0     0     0   >60  100%    0%     -    0%     0     0     0     0     0     0     0

In cDOT, you can run a similar command, statistics show-periodic, across the cluster:

cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:20
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
  2%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    150KB    946KB
  3%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB    536KB   1.09MB
  3%        0        0        0   0%   7.29KB     450B      0%   17.1KB   16.4KB   7.92KB   3.96KB
cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:27
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
  2%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB   7.92KB   3.96KB
Averages for 3 samples:
  2%        0        0        0   0%   10.9KB     460B      0%   31.9KB   31.6KB    231KB    688KB
Maximums:
  3%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    536KB   1.09MB
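If you capture that output to a file, the Minimums/Averages/Maximums block is easy to recompute yourself, which helps when you want a longer window than a single run. Below is a minimal Python sketch, not an official tool. It assumes every data row has the 12 columns shown above and that the size suffixes are powers of 1024 (my assumption; the sample averages are consistent with it). The column keys and function names are made up for illustration.

```python
import re

# Column keys are hypothetical names for the 12 columns in the output above.
COLUMNS = ["cpu_busy", "total_ops", "nfs_ops", "cifs_ops",
           "data_busy", "data_recv", "data_sent",
           "cluster_busy", "cluster_recv", "cluster_sent",
           "disk_read", "disk_write"]

# Assumption: B/KB/MB/GB suffixes are powers of 1024.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_number(token):
    """Convert '2%', '13', '24.4KB' or '1.09MB' to a float (percent, count, or bytes)."""
    if token.endswith("%"):
        return float(token[:-1])
    match = re.fullmatch(r"([\d.]+)([KMG]?B)", token)
    if match:
        return float(match.group(1)) * UNITS[match.group(2)]
    return float(token)


def parse_rows(text):
    """Return one dict per data row; header, separator and label lines are skipped."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != len(COLUMNS) or not tokens[0].endswith("%"):
            continue
        rows.append(dict(zip(COLUMNS, map(to_number, tokens))))
    return rows


def summarize(rows, column):
    """Return (minimum, average, maximum) of one column."""
    values = [row[column] for row in rows]
    return min(values), sum(values) / len(values), max(values)


# The three cluster-wide samples from the output above.
SAMPLE = """\
2% 0 0 0 0% 24.4KB 831B 0% 64.0KB 63.5KB 150KB 946KB
3% 0 0 0 0% 900B 100B 0% 14.6KB 14.9KB 536KB 1.09MB
3% 0 0 0 0% 7.29KB 450B 0% 17.1KB 16.4KB 7.92KB 3.96KB
"""

rows = parse_rows(SAMPLE)
low, avg, high = summarize(rows, "data_recv")
print(f"data recv: min {low:.0f}B, avg {avg / 1024:.1f}KB, max {high / 1024:.1f}KB")
```

For the data recv column this reproduces the 900B / 10.9KB / 24.4KB figures in the summary block.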

You can also do this per node:

cm6080-rtp2::> statistics show-periodic -object node -instance cm6080-rtp-01

cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:26
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
  7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   3.98KB
  6%        0        0        0   0%   5.92KB   4.92KB      0%   7.47KB   8.91KB   11.9KB   11.9KB
  6%        2        0        2   0%   9.20KB   1.33KB      0%   3.36KB   10.7KB   8.00KB   16.0KB
cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:33
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
  6%        0        0        0   0%   5.92KB   1.33KB      0%   3.36KB   8.91KB   8.00KB   3.98KB
Averages for 3 samples:
  6%        5        0        5   0%   21.9KB   22.5KB      0%   33.1KB   24.0KB   10.6KB   10.6KB
Maximums:
  7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   16.0KB
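To answer the "is any node near 100%?" question without reading each node's output by eye, you could collect the per-node samples and flag the busy columns that cross a limit. This is a rough Python sketch under stated assumptions: the 80% threshold is an arbitrary value I picked, the function names are hypothetical, and the samples are the per-node rows shown above.

```python
# Positions of the three percentage columns in a show-periodic data row.
BUSY_COLUMNS = {"cpu busy": 0, "data busy": 4, "cluster busy": 7}


def peak_busy(lines):
    """Return the highest value seen in each busy column, in percent."""
    peaks = {name: 0.0 for name in BUSY_COLUMNS}
    for line in lines:
        tokens = line.split()
        if len(tokens) != 12 or not tokens[0].endswith("%"):
            continue  # header, separator, or label line
        for name, index in BUSY_COLUMNS.items():
            peaks[name] = max(peaks[name], float(tokens[index].rstrip("%")))
    return peaks


def hot_spots(samples_by_node, threshold=80.0):
    """List (node, column, peak) for every busy column at or above the threshold."""
    findings = []
    for node, lines in samples_by_node.items():
        for name, peak in peak_busy(lines).items():
            if peak >= threshold:
                findings.append((node, name, peak))
    return findings


# Samples copied from the per-node output above; add one entry per node.
SAMPLES = {
    "cm6080-rtp-01": [
        "7% 13 0 13 0% 50.6KB 61.3KB 0% 88.5KB 52.5KB 11.9KB 3.98KB",
        "6% 0 0 0 0% 5.92KB 4.92KB 0% 7.47KB 8.91KB 11.9KB 11.9KB",
        "6% 2 0 2 0% 9.20KB 1.33KB 0% 3.36KB 10.7KB 8.00KB 16.0KB",
    ],
}

print(peak_busy(SAMPLES["cm6080-rtp-01"]))
print(hot_spots(SAMPLES))                 # nothing is near the 80% limit here
print(hot_spots(SAMPLES, threshold=5.0))  # a low limit just to show a hit
```

With these samples the node peaks at 7% CPU, so nothing is flagged at the default limit.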

You can also drill down into different objects, counters, etc. These are the objects available at the admin privilege level:

cm6080-rtp2::> statistics show-periodic -object

aggregate audit_ng

audit_ng:vserver avoa

avs cifs

cifs:node cifs:vserver

cluster cluster_peer

cpx disk

disk:raid_group ext_cache

ext_cache_obj fcache

fcp_lif fcp_lif:node

fcp_lif:port fcp_lif:vserver

hashd hostadapter

ifnet iscsi_conn

iscsi_conn:session iscsi_lif

iscsi_lif:node iscsi_lif:vserver

lif lif:vserver

logical_replication_destination logical_replication_source

lun lun:constituent

nblade_cifs nfsv3

nfsv3:constituent nfsv3:cpu

nfsv3:node nfsv4

nfsv4:constituent nfsv4:cpu

nfsv4:node nfsv4_1

nfsv4_1:constituent nfsv4_1:cpu

nfsv4_1:node node

path port

processor processor:node

qtree quota

raid rquota

smb1 smb1:node

smb1:vserver smb2

smb2:node smb2:vserver

smtape spinhi

system target_port

target_port:array volume

volume:node volume:vserver

volume_move_summary wafl_hya_per_aggr

workload workload:constituent

workload:policy_group zapi
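That list is easier to scan once the object:qualifier names are grouped under their base object. Here is a small Python sketch that does this for a pasted copy of the tab-completion listing. It only splits names on the colon; what each qualifier means is not something it knows. The function name and the excerpt are for illustration.

```python
def group_objects(listing):
    """Group names such as 'cifs:node' under their base object ('cifs')."""
    groups = {}
    for name in listing.split():
        base, _, qualifier = name.partition(":")
        variants = groups.setdefault(base, [])
        if qualifier:
            variants.append(qualifier)
    return groups


# A few lines excerpted from the listing above.
LISTING = """
avs cifs
cifs:node cifs:vserver
cluster cluster_peer
target_port:array volume
volume:node volume:vserver
"""

groups = group_objects(LISTING)
for base in sorted(groups):
    print(base, groups[base])
```

For the excerpt, cifs and volume each come out with the node and vserver variants, while avs and cluster have none.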

Counters depend on the object. Once you specify an object, you can find the counters using tab completion:

cm6080-rtp2::> statistics show-periodic -object cifs -instance cm6080-rtp2-01 -counter

active_searches auth_reject_too_many

change_notifications_outstanding cifs_latency

cifs_latency_base cifs_ops

cifs_read_ops cifs_write_ops

commands_outstanding connected_shares

connections established_sessions

instance_name instance_uuid

node_name node_uuid

open_files process_name

signed_sessions vserver_id

vserver_name
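If you end up scripting these, a tiny helper can assemble the command line from an object, an instance, and a counter list. The sketch below uses only the -object, -instance, and -counter parameters that appear in this thread, and it enforces the man page rule that -instance is required for any object other than "cluster". Joining several counters with "|" is my assumption about the list syntax, so check it against the man page on your release. The function name is hypothetical.

```python
def show_periodic_command(obj="cluster", instance=None, counters=()):
    """Build a 'statistics show-periodic' command string.

    Assumption: multiple counters are joined with '|'.
    """
    if obj != "cluster" and instance is None:
        raise ValueError("-instance is required for any object other than 'cluster'")
    parts = ["statistics", "show-periodic", "-object", obj]
    if instance is not None:
        parts += ["-instance", instance]
    if counters:
        parts += ["-counter", "|".join(counters)]
    return " ".join(parts)


print(show_periodic_command("cifs", "cm6080-rtp2-01", ["cifs_latency", "cifs_ops"]))
print(show_periodic_command())
```

The first call prints a command for the cifs object with two of the counters from the list above; the second prints the cluster-wide default.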

Objects like CIFS and NFS can use the Vserver as the instance:

cm6080-rtp2::> statistics show-periodic -object cifs -instance vserver -counter cifs_latency

cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:07

cifs

latency

--------

0us

cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:12

cifs

latency

--------

Minimums:

0us

Averages for 2 samples:

0us

Maximums:

0us
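The counter list above has both cifs_latency and cifs_latency_base. If you ever work from raw counter values instead of show-periodic's formatted output, my understanding is that an average latency comes from the change in the latency counter divided by the change in its base counter between two samples. Treat that as an assumption to verify. The Python sketch below shows the arithmetic with made-up sample values.

```python
def average_latency_us(first, second):
    """Average latency in microseconds between two raw samples.

    Each sample is (latency_total, base_count). Assumption: the latency
    counter is a running total and the base counter is the matching
    operation count.
    """
    delta_latency = second[0] - first[0]
    delta_ops = second[1] - first[1]
    if delta_ops <= 0:
        return 0.0  # no operations in the interval, so report 0
    return delta_latency / delta_ops


# Hypothetical raw samples: (cifs_latency, cifs_latency_base).
earlier = (1000, 10)
later = (4000, 20)
print(f"{average_latency_us(earlier, later):.0f}us")
print(f"{average_latency_us(later, later):.0f}us")
```

An idle interval, like the one in the output above, comes out as 0us.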

There are plenty of possibilities for performance monitoring. Use tab completion to explore the different objects, instances, and counters.

Also, use the man pages:

cm6080-rtp2::> man statistics show-periodic

statistics show-periodic Data ONTAP 8.2 statistics show-periodic

NAME

statistics show-periodic -- Continuously display current performance data at regular intervals

AVAILABILITY

This command is available to cluster and Vserver administrators at the admin privilege level.

DESCRIPTION

This command continuously displays specified performance data at regular intervals. The command output displays data in the following columns:

o cpu busy: Overall system utilization based on CPU utilization and subsystem utilization. Examples of subsystems include the storage subsystem and RAID subsystem.

o total ops: The number of total operations per second.

o nfs-ops: The number of NFS operations per second.

o cifs-ops: The number of CIFS operations per second.

o data busy: The percentage of time that data ports sent or received data.

o data recv: Network traffic received on data ports (KBps).

o data sent: Network traffic sent on data ports (KBps).

o cluster busy: The percentage of time that cluster ports sent or received data.

o cluster recv: Network traffic received on cluster ports (KBps).

o cluster sent: Network traffic sent on cluster ports (KBps).

o disk read: Data read from disk (KBps).

o disk write: Data written to disk (KBps).

PARAMETERS

-object <text> - Object

Selects the object for which you want to display performance data. The default object is "cluster".

-instance <text> - Instance

Selects the instance for which you want to display performance data. This parameter is required if you specify the -object parameter and enter any object other than "cluster".

For example, if you want to display disk object statistics, you can use this parameter to specify the name of a specific disk whose statistics you want to view.

(cont)

nicholas4704 · 2013-08-20

Hi

Heh, you've got 6080, you probably don't have any problems with performance

Is it possible to print different objects in one output like disk util+iops+latency+cpu util?

Check this great TR. It explains how to track rogue workloads, etc.

http://www.netapp.com/us/media/tr-4211.pdf

Interestingly, if you switch to advanced mode, statistics show-periodic shows more than it does in admin mode.

Nick

VIVEKNANDAVANAM · 2014-07-02

How do I get the counters for these objects, and more, using SNMP in a Cluster Mode Filer?

For example, in 7-Mode we can get the Consistency Point counters such as cpFromCpDeferredOps, cpFromCpOps, cpFromFlushOps, etc. over SNMP using the OID (1.3.6.1.4.1.789.1.2.6), as described here:

http://www.mibdepot.com/cgi-bin/getmib3.cgi?i=1&n=NETWORK-APPLIANCE-MIB&r=netapp&f=netapp_1_4.mib&v=v1&t=tree

How do I fetch the same information in a Clustered Mode filer?

Is there a MIB tree for a Cluster Mode Filer?

parisi · 2014-07-17

I don't think there is currently a MIB for cDOT for performance counters. I believe the perf data portion is moving toward a ZAPI model, so there may not be a way to capture this via SNMP in the future.