Matching System Manager Performance Metrics with statistics CLI Counters for NVMe Namespace

vemus

Hi everyone,

In ONTAP System Manager, we're monitoring IOPS, throughput, and latency for NVMe namespaces using the namespace performance report. We're trying to work out which counters from the statistics CLI (specifically under the namespace object) correspond to these metrics.

Here’s the command we’re using to view the available counters:

ontap-select-cluster::*> statistics catalog counter show -object namespace

Object: namespace
    Counter                     Description
    --------------------------- ----------------------------------------------
    abort_request               Number of Abort requests per second.
    ana_aen_queued              Number of ANA AEN Queued due to ANA
                                Transistion error.
    ana_inaccess_errs           Number of ANA Inaccessible errors returned
                                per second.
    ana_transition_errs         Number of ANA transition errors returned per
                                second.
    avg_latency                 Average latency in microseconds for all
                                operations on the Namespace
    bii_cache_updates           Number of Blocks Interblade Interface (BII)
                                cache updates triggered
    caw_data                    Compare and Write bytes
    caw_ops                     Number of compare and write operations
    compare_data                Compare bytes
    compare_errs                Number of Compare Errors returned per second.
    compare_ops                 Number of compare operations
    create_sge_internal_err     Number of non-fatal internal errors related
                                to creation of SGEs using extents
    dealloc_data                Total bytes deallocated
    dealloc_data_lt_1mb         Bytes Deallocated by Dealloc requests that
                                are upto 1MB large
    dealloc_data_lt_2mb         Bytes Deallocated by Dealloc requests that
                                are 1-2MB large
    dealloc_data_lt_32mb        Bytes Deallocated by Dealloc requests that
                                are 2-32MB large
    dealloc_data_lt_64mb        Bytes Deallocated by Dealloc requests that
                                are 32-64MB large
    dealloc_msg_alloc_failures  Number of DSM Dealloc commands that
                                encountered WAFL message allocation failures
    dealloc_ops                 Number of DSM Dealloc operations
    dealloc_ops_lt_1mb          Number of DSM Dealloc operations that are
                                uptp 1MB large
    dealloc_ops_lt_2mb          Number of DSM Dealloc operations that are
                                1-2MB large
    dealloc_ops_lt_32mb         Number of DSM Dealloc operations that are
                                2-32MB large
    dealloc_ops_lt_64mb         Number of DSM Dealloc operations that are
                                32-64MB large
    dealloc_range_validation_errors
                                Number of DSM Dealloc commands that had
                                invalid ranges
    dealloc_rcvd_ranges_crossing_wafl_stripes
                                Number of DSM Dealloc ranges sent by host
                                that crossed WAFL stripe boundary
    dealloc_rcvd_ranges_gt_max_range_size
                                Number of DSM Dealloc ranges sent by host
                                that are larger than the DSM range limit
                                published by ONTAP NVMf Target
    dealloc_rcvd_reqs_gt_max_dealloc_size
                                Number of DSM Dealloc requests sent by host
                                that are larger than the total DSM size limit
                                published by ONTAP NVMf Target
    dealloc_requested           Total bytes requested to be deallocated
    identify_ns_nuse_gt_nsze    Number of WAFL operations which report used
                                NS size as greater than NS size
    instance_name               Aggregated namespace path
    instance_uuid               Aggregated namespace vdisk ID
    internal_err                Number of non-fatal internal errors
    large_read_data             Read bytes from large IO
    large_read_ops              Number of read operations with a request size
                                greater than 64k
    large_write_data            Write bytes from large IO
    large_write_ops             Number of write operations with a request
                                size greater than 64k
    node_name                   System node name
    node_uuid                   System node id
    ns_resize_grow              Size of the Namespace is increased.
    ns_resize_shrink            Size of the Namespace is reduced.
    ns_resize_zero_err          Attempt is made to resize a namespace to 0
                                bytes
    other_errs                  Number of Other Errors returned per second.
    other_ops                   Number of other operations
    process_name                Ontap process that provided this instance
    read_data                   Read bytes
    read_ops                    Number of read operations
    read_size_hist              Histogram of read sizes
    remote_abort_request        Number of Remote Aborts requests per second.
    remote_ana_inaccess_errs    Number of Remote ANA Inaccessible Errors
                                returned per second.
    remote_ana_transition_errs  Number of Remote ANA transition Errors
                                returned per second.
    remote_avg_latency          Average latency in microseconds for all
                                remote operations on the Namespace
    remote_caw_data             Remote compare and Write bytes
    remote_caw_ops              Number of remote compare and write operations
    remote_compare_data         Remote compare bytes
    remote_compare_errs         Number of Remote Compare Errors returned per
                                second.
    remote_compare_ops          Number of remote compare operations
    remote_dealloc_data         Remote Dealloc bytes
    remote_dealloc_data_lt_1mb  Bytes remotely Deallocated by Dealloc
                                requests that are upto 1MB large
    remote_dealloc_data_lt_2mb  Bytes remotely Deallocated by Dealloc
                                requests that are 1-2MB large
    remote_dealloc_data_lt_32mb Bytes remotely Deallocated by Dealloc
                                requests that are 2-32MB large
    remote_dealloc_data_lt_64mb Bytes Remotely Deallocated by Dealloc
                                requests that are 32-64MB large
    remote_dealloc_ops          Number of Remote DSM Dealloc operations
    remote_dealloc_ops_lt_1mb   Number of remote DSM Dealloc operations that
                                are uptp 1MB large
    remote_dealloc_ops_lt_2mb   Number of remote DSM Dealloc operations that
                                are 1-2MB large
    remote_dealloc_ops_lt_32mb  Number of remote DSM Dealloc operations that
                                are 2-32MB large
    remote_dealloc_ops_lt_64mb  Number of remote DSM Dealloc operations that
                                are 32-64MB large
    remote_dealloc_requested    Total remote bytes requested to be deallocated
    remote_handling_latency     Average total latency in microseconds for all
                                remote operations
    remote_identify_ns_nuse_gt_nsze
                                Number of remote WAFL operations that report
                                used NS size as greater than NS size
    remote_large_read_data      Remote read bytes from large IO
    remote_large_read_ops       Number of remote read operations with a
                                request size greater than 64k
    remote_large_write_data     Remote write bytes from large IO
    remote_large_write_ops      Number of remote write operations with a
                                request size greater than 64k
    remote_other_errs           Number of Other Remote Errors returned per
                                second.
    remote_other_ops            Number of remote other operations
    remote_read_data            Remote read bytes
    remote_read_ops             Number of remote read operations
    remote_read_size_hist       Histogram of remote read sizes
    remote_response_ion_io_poller_latency
                                Average time spent in client IO poller queue
                                at ION by remote responses.
    remote_server_caw_ops_pending
                                Number of remote server caw operations
                                pending in wafl
    remote_server_caw_ops_responded
                                Number of remote server caw operations
                                responded to client
    remote_server_cmds_aborted  Number of remote server commands aborted by
                                client
    remote_server_cmds_not_found_on_abort
                                Number of remote server commands not found
                                while trying to abort
    remote_server_getattr_ops_pending
                                Number of remote getattr operations pending
                                in WAFL
    remote_server_r_ops_pending Number of remote server read operations
                                pending in wafl
    remote_server_r_ops_responded
                                Number of remote server read operations
                                responsded to client
    remote_server_w_ops_pending Number of remote server write operations
                                pending in wafl
    remote_server_w_ops_responded
                                Number of remote server write operations
                                responded to client
    remote_total_ops            Total number of remote operations on the
                                Namespace
    remote_wafl_abts_rcv        Number of remote WAFL operations that were
                                requested to abort
    remote_wafl_del_ops         Number of remote WAFL operations that failed
                                after namespace delete/unmap
    remote_wafl_getattr_err     Number of remote WAFL getattr operations
                                which returned error
    remote_wafl_getattr_ops     Number of remote WAFL getattr operations
    remote_wafl_latency         Average wafl latency for remote operations on
                                NSON.
    remote_write_data           Remote write bytes
    remote_write_ops            Number of remote write operations
    remote_write_size_hist      Histogram of remote write sizes
    sgl_internal_err            Number of non-fatal internal errors related
                                to scatter gather
    subsystem_name              Name of the subsystem owning this namespace
    subsystem_uuid              UUID of the subsystem owning this namespace
    total_ops                   Total number of operations on the Namespace
    vserver_name                Name of the vserver owning this namespace
    vserver_uuid                UUID of the vserver owning this namespace
    wafl_aborts_rcv             Number of WAFL operations that were aborted
    wafl_aborts_req             Number of WAFL operations that were requested
                                to abort
    wafl_delete_ops             Number of WAFL operations that failed after
                                namespace delete/unmap
    wafl_getattr_err            Number of WAFL getattr operations which
                                returned error
    wafl_getattr_ops            Number of local WAFL getattr operations
    wafl_rw_errs                Number of WAFL read/write/compare operations
                                that returned an error
    wafl_test_ops               Number of WAFL operations effected by a test
                                operation
    write_data                  Write bytes
    write_ops                   Number of write operations
    write_size_hist             Histogram of write sizes
113 entries were displayed.
 

From the output, it's clear that the namespace object provides a lot of detailed metrics. Could someone help map the System Manager metrics to their equivalent statistics counters?

So far, I'm assuming the following:

  • IOPS = read_ops + write_ops + other_ops (or total_ops)

  • Throughput = read_data + write_data

  • Latency = avg_latency

Are these correct? And are there any additional counters we should be considering, especially for remote operations or large I/O?
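To make the assumed mapping concrete, here is a minimal Python sketch of how we would derive the three metrics from one set of counter values. It only encodes my guesses above and is not a confirmed description of how System Manager computes its numbers. The NamespaceSample class and derive_metrics function are hypothetical names of my own, and the sketch assumes the ops and data counters have already been read as per-second rates.

```python
import math
from dataclasses import dataclass


@dataclass
class NamespaceSample:
    """Hypothetical container for one reading of the namespace counters.

    Assumption: ops and data values are per-second rates; avg_latency is in
    microseconds, as stated in the counter description.
    """
    read_ops: float
    write_ops: float
    other_ops: float
    total_ops: float
    read_data: float    # bytes per second (assumed)
    write_data: float   # bytes per second (assumed)
    avg_latency: float  # microseconds


def derive_metrics(s: NamespaceSample) -> dict:
    """Apply the assumed mapping: IOPS, throughput, and latency."""
    iops = s.read_ops + s.write_ops + s.other_ops
    return {
        "iops": iops,
        # Sanity check for the "(or total_ops)" part of the assumption.
        "iops_matches_total_ops": math.isclose(iops, s.total_ops),
        "throughput_bytes_per_s": s.read_data + s.write_data,
        "latency_us": s.avg_latency,
    }


if __name__ == "__main__":
    # Made-up values purely for illustration, not real measurements.
    sample = NamespaceSample(
        read_ops=100, write_ops=50, other_ops=5, total_ops=155,
        read_data=409600, write_data=204800, avg_latency=250,
    )
    print(derive_metrics(sample))
```

If iops_matches_total_ops comes back False on real data, that would suggest total_ops includes operations beyond read, write, and other, which is part of what I'm hoping someone can confirm.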

Any guidance would be much appreciated!
