ONTAP Discussions
Hi
How do you track cluster performance?
Can you please recommend handy statistics show / show-periodic arguments, or other commands, for tracking cluster performance?
I need to understand which disks and interfaces are overutilized and which hosts generate that load.
I mean something like sysstat -x 2 in 7-mode, but with more options. I know I can use sysstat from the node shell, but it is not very handy to switch to each node to see whether it is at 100% utilization.
There are a lot of possibilities with "statistics show", and there is the qos command, which can return useful statistics. What commands do you use?
The question is: what are the best commands to understand where the bottleneck in a cluster is and what generates it?
Thank you.
Nick
"Statistics show-periodic" would be what you were looking for, I imagine. You just need to specify objects and counters you want to track.
Hi
Thank you.
statistics show-periodic has a lot of objects, and frankly I don't know which objects/fields/counters to specify to diagnose performance problems, or how to combine different fields, like disk_util + cpu util + iops + latency + net_out, in one output.
In 7-mode there was sysstat, with all output fields explained in the documentation and KB. It is more complex in cmode because there are many nodes and more objects available.
So the question is the same: how do you diagnose performance in cmode? I need to answer whether my cluster is overutilized and, if so, where and why.
Nick
In 7-mode, this is what sysstat looked like:
cm6080-rtp2::> node run local "sysstat -x 1"
   CPU   NFS  CIFS  HTTP Total    Net kB/s   Disk kB/s   Tape kB/s Cache Cache    CP    CP  Disk OTHER   FCP iSCSI    FCP kB/s  iSCSI kB/s
                                  in   out  read write  read write   age   hit  time    ty  util                      in   out    in   out
    2%     0     0     0     6     3     5    24     8     0     0   >60  100%    0%     -    5%     6     0     0     0     0     0     0
    2%     0     0     0     0     7     9     0     0     0     0   >60  100%    0%     -    0%     0     0     0     0     0     0     0
    2%     0     0     0     0     2     2     8    24     0     0   >60  100%    0%     -   10%     0     0     0     0     0     0     0
    2%     0     0     0    12     7    11    16     0     0     0   >60  100%    0%     -    5%    12     0     0     0     0     0     0
    2%     0     0     0     0     7     8     0     0     0     0   >60  100%    0%     -    0%     0     0     0     0     0     0     0
In cDOT, statistics show-periodic gives a similar view across the whole cluster ("cluster" is the default object):
cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:20
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
  2%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    150KB    946KB
  3%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB    536KB   1.09MB
  3%        0        0        0   0%   7.29KB     450B      0%   17.1KB   16.4KB   7.92KB   3.96KB
cm6080-rtp-01: cluster.cm6080-rtp2: 7/29/2013 07:53:27
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
  2%        0        0        0   0%     900B     100B      0%   14.6KB   14.9KB   7.92KB   3.96KB
Averages for 3 samples:
  2%        0        0        0   0%   10.9KB     460B      0%   31.9KB   31.6KB    231KB    688KB
Maximums:
  3%        0        0        0   0%   24.4KB     831B      0%   64.0KB   63.5KB    536KB   1.09MB
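If you capture this output to a file and want to recompute the summary rows yourself (for example over a longer window), a small script can convert the size fields to bytes first. This is only a sketch: to_bytes and summarize are hypothetical helper names, and it assumes 1 KB = 1024 bytes, which the output itself does not state.

```python
import re

# Assumed unit multipliers; the CLI output does not say whether KB means 1000 or 1024 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(field):
    """Convert a size field such as '831B', '24.4KB' or '1.09MB' to bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KB|MB|GB)", field)
    if match is None:
        raise ValueError(f"unrecognized size field: {field!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def summarize(fields):
    """Return (minimum, average, maximum) in bytes for one column of samples."""
    values = [to_bytes(f) for f in fields]
    return min(values), sum(values) / len(values), max(values)


# "cluster recv" column from the three cluster-wide samples above
low, avg, high = summarize(["64.0KB", "14.6KB", "17.1KB"])
print(f"min={low / 1024:.1f}KB avg={avg / 1024:.1f}KB max={high / 1024:.1f}KB")
# prints: min=14.6KB avg=31.9KB max=64.0KB
```

The result agrees with the Minimums/Averages/Maximums rows that show-periodic printed for that column.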
You can also do this per node:
cm6080-rtp2::> statistics show-periodic -object node -instance cm6080-rtp-01
cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:26
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
  7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   3.98KB
  6%        0        0        0   0%   5.92KB   4.92KB      0%   7.47KB   8.91KB   11.9KB   11.9KB
  6%        2        0        2   0%   9.20KB   1.33KB      0%   3.36KB   10.7KB   8.00KB   16.0KB
cm6080-rtp-01: node.cm6080-rtp-01: 7/29/2013 07:56:33
 cpu    total                   data     data     data cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops busy     recv     sent    busy     recv     sent     read    write
---- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- --------
Minimums:
  6%        0        0        0   0%   5.92KB   1.33KB      0%   3.36KB   8.91KB   8.00KB   3.98KB
Averages for 3 samples:
  6%        5        0        5   0%   21.9KB   22.5KB      0%   33.1KB   24.0KB   10.6KB   10.6KB
Maximums:
  7%       13        0       13   0%   50.6KB   61.3KB      0%   88.5KB   52.5KB   11.9KB   16.0KB
You can also drill down into different objects, counters, etc. These are the objects available at the admin privilege level:
cm6080-rtp2::> statistics show-periodic -object
aggregate audit_ng
audit_ng:vserver avoa
avs cifs
cifs:node cifs:vserver
cluster cluster_peer
cpx disk
disk:raid_group ext_cache
ext_cache_obj fcache
fcp_lif fcp_lif:node
fcp_lif:port fcp_lif:vserver
hashd hostadapter
ifnet iscsi_conn
iscsi_conn:session iscsi_lif
iscsi_lif:node iscsi_lif:vserver
lif lif:vserver
logical_replication_destination logical_replication_source
lun lun:constituent
nblade_cifs nfsv3
nfsv3:constituent nfsv3:cpu
nfsv3:node nfsv4
nfsv4:constituent nfsv4:cpu
nfsv4:node nfsv4_1
nfsv4_1:constituent nfsv4_1:cpu
nfsv4_1:node node
path port
processor processor:node
qtree quota
raid rquota
smb1 smb1:node
smb1:vserver smb2
smb2:node smb2:vserver
smtape spinhi
system target_port
target_port:array volume
volume:node volume:vserver
volume_move_summary wafl_hya_per_aggr
workload workload:constituent
workload:policy_group zapi
Counters depend on the object. Once you specify an object, you can find the counters using tab completion:
cm6080-rtp2::> statistics show-periodic -object cifs -instance cm6080-rtp2-01 -counter
active_searches auth_reject_too_many
change_notifications_outstanding cifs_latency
cifs_latency_base cifs_ops
cifs_read_ops cifs_write_ops
commands_outstanding connected_shares
connections established_sessions
instance_name instance_uuid
node_name node_uuid
open_files process_name
signed_sessions vserver_id
vserver_name
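Note the cifs_latency / cifs_latency_base pair in that list. My understanding (an assumption, not something the output above confirms) is that an average-type counter like cifs_latency is reported relative to its _base counter, so if you ever work with raw samples rather than show-periodic's computed values, the per-operation average over an interval would be the latency delta divided by the base delta. A minimal sketch with a hypothetical function name and made-up numbers:

```python
def average_latency_us(latency_start, latency_end, base_start, base_end):
    """Average latency per operation between two raw samples.

    Assumes the latency counter accumulates total microseconds and the
    base counter accumulates the number of operations (an assumption).
    """
    ops = base_end - base_start
    if ops <= 0:
        return 0.0  # no operations completed in the interval
    return (latency_end - latency_start) / ops


# Made-up raw values, for illustration only
print(average_latency_us(1_000_000, 1_450_000, 2_000, 2_300))  # prints 1500.0
```

This would also explain why an idle system shows 0us: with no operations in the interval there is nothing to average.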
For objects like CIFS and NFS, you can specify the Vserver as the instance:
cm6080-rtp2::> statistics show-periodic -object cifs -instance vserver -counter cifs_latency
cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:07
cifs
latency
--------
0us
0us
cm6080-rtp2: cifs.vserver: 7/29/2013 08:05:12
cifs
latency
--------
Minimums:
0us
Averages for 2 samples:
0us
Maximums:
0us
There are plenty of possibilities for performance monitoring. Use tab completion to explore the different variations.
Also, use the man pages:
cm6080-rtp2::> man statistics show-periodic
statistics show-periodic Data ONTAP 8.2 statistics show-periodic
NAME
statistics show-periodic -- Continuously display current performance data at regular intervals
AVAILABILITY
This command is available to cluster and Vserver administrators at the admin privilege level.
DESCRIPTION
This command continuously displays specified performance data at regular intervals. The command output displays data in the following columns:
o cpu busy: Overall system utilization based on CPU utilization and subsystem utilization. Examples of subsystems include the storage subsystem and RAID subsystem.
o total ops: The number of total operations per second.
o nfs-ops: The number of NFS operations per second.
o cifs-ops: The number of CIFS operations per second.
o data busy: The percentage of time that data ports sent or received data.
o data recv: Network traffic received on data ports (KBps).
o data sent: Network traffic sent on data ports (KBps).
o cluster busy: The percentage of time that cluster ports sent or received data.
o cluster recv: Network traffic received on cluster ports (KBps).
o cluster sent: Network traffic sent on cluster ports (KBps).
o disk read: Data read from disk (KBps).
o disk write: Data written to disk (KBps).
PARAMETERS
-object <text> - Object
Selects the object for which you want to display performance data. The default object is "cluster".
-instance <text> - Instance
Selects the instance for which you want to display performance data. This parameter is required if you specify the -object parameter and enter any object other than "cluster".
For example, if you want to display disk object statistics, you can use this parameter to specify the name of a specific disk whose statistics you want to view.
(cont)
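Using the column descriptions from the man page, you could label each data row of the default output and flag any "busy" column above a threshold. Treat this as a sketch only: parse_row and busy_over are hypothetical helpers, the threshold is arbitrary, and it assumes the default 12-column layout shown earlier in the thread.

```python
# Column names taken from the man page description of the default output.
COLUMNS = [
    "cpu busy", "total ops", "nfs-ops", "cifs-ops",
    "data busy", "data recv", "data sent",
    "cluster busy", "cluster recv", "cluster sent",
    "disk read", "disk write",
]


def parse_row(line):
    """Map one whitespace-separated data row onto the column names."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def busy_over(row, threshold_pct=80.0):
    """Return the names of the 'busy' columns whose percentage exceeds the threshold."""
    return [
        name for name, value in row.items()
        if name.endswith("busy") and float(value.rstrip("%")) > threshold_pct
    ]


# First per-node sample from the output earlier in the thread
row = parse_row("7% 13 0 13 0% 50.6KB 61.3KB 0% 88.5KB 52.5KB 11.9KB 3.98KB")
print(row["cifs-ops"], busy_over(row, threshold_pct=5.0))
# prints: 13 ['cpu busy']
```

Run against each node's output in turn, this gives a rough per-node answer to "is anything near 100%?" without reading every row by eye.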
Hi
Heh, you've got a 6080, so you probably don't have any problems with performance.
Is it possible to print different objects in one output like disk util+iops+latency+cpu util?
Check this great TR. It explains how to track rogue workloads, etc.
http://www.netapp.com/us/media/tr-4211.pdf
Interestingly, if you move to advanced mode, statistics show-periodic shows more than it does in admin mode.
Nick
How do I get the counters for these objects, and more, using SNMP on a Cluster Mode filer?
For example, in 7-Mode we can get the Consistency Point counters such as cpFromCpDeferredOps, cpFromCpOps, cpFromFlushOps, etc. via SNMP using the OID (1.3.6.1.4.1.789.1.2.6), as described here.
How do I fetch the same information on a Cluster Mode filer?
Is there a MIB tree for a Cluster Mode filer?
I don't think there is currently a MIB for cDOT for performance counters. I believe the perf data portion is moving toward a ZAPI model, so there may not be a way to capture this via SNMP in the future.