Software Development Kit (SDK) and API Discussions
We are using the API from Nagios to query disk health, and one of the states we query is "is-media-scrubbing". We recently added an AFF8080, which has 12 SSDs, to our cluster. Two of the 12 SSDs always show is-media-scrubbing=true. I am unable to find this value via the CLI. What does this true value mean?
This is the sprintf output for the disk queried from "storage-disk-get-iter".
<storage-disk-info>
  <disk-inventory-info>
    <bytes-per-sector>520</bytes-per-sector>
    <capacity-sectors>30005842608</capacity-sectors>
    <checksum-compatibility>block</checksum-compatibility>
    <disk-class>solid-state</disk-class>
    <disk-cluster-name>1.10.5</disk-cluster-name>
    <disk-type>SSD</disk-type>
    <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
    <firmware-revision>NA01</firmware-revision>
    <grown-defect-list-count>0</grown-defect-list-count>
    <health-monitor-time-interval>576094</health-monitor-time-interval>
    <hw-minimum-os>9.0.0</hw-minimum-os>
    <import-in-progress>false</import-in-progress>
    <is-dynamically-qualified>false</is-dynamically-qualified>
    <is-multidisk-carrier>false</is-multidisk-carrier>
    <is-shared>true</is-shared>
    <model>X670_S163315TATE</model>
    <reservation-key>0x0</reservation-key>
    <reservation-type>none</reservation-type>
    <right-size-sectors>30005330432</right-size-sectors>
    <serial-number>S331NX0J605006</serial-number>
    <shelf>10</shelf>
    <shelf-bay>5</shelf-bay>
    <shelf-uid>6022046091083057744</shelf-uid>
    <stack-id>1</stack-id>
    <storage-ssd-info>
      <percent-rated-life-used>0</percent-rated-life-used>
      <percent-spares-consumed>12</percent-spares-consumed>
    </storage-ssd-info>
    <vendor>NETAPP</vendor>
  </disk-inventory-info>
  <disk-metrocluster-info>
    <is-local-attach>true</is-local-attach>
  </disk-metrocluster-info>
  <disk-name>1.10.5</disk-name>
  <disk-ownership-info>
    <data1-home>netapp-home06</data1-home>
    <data1-home-id>537108051</data1-home-id>
    <data1-owner>netapp-home06</data1-owner>
    <data1-owner-id>537108051</data1-owner-id>
    <data2-home>netapp-home05</data2-home>
    <data2-home-id>537108074</data2-home-id>
    <data2-owner>netapp-home05</data2-owner>
    <data2-owner-id>537108074</data2-owner-id>
    <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
    <home-node-id>537108051</home-node-id>
    <home-node-name>netapp-home06</home-node-name>
    <is-failed>false</is-failed>
    <owner-node-id>537108051</owner-node-id>
    <owner-node-name>netapp-home06</owner-node-name>
    <pool>0</pool>
    <reserved-by-node-id>0</reserved-by-node-id>
    <root-home>netapp-home06</root-home>
    <root-home-id>537108051</root-home-id>
    <root-owner>netapp-home06</root-owner>
    <root-owner-id>537108051</root-owner-id>
  </disk-ownership-info>
  <disk-paths>
    <disk-path-info>
      <array-name>N/A</array-name>
      <disk-name>netapp-home06:0a.10.5</disk-name>
      <disk-port>B</disk-port>
      <disk-port-name>SA:B</disk-port-name>
      <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
      <initiator-io-kbps>142</initiator-io-kbps>
      <initiator-iops>7</initiator-iops>
      <initiator-lun-in-use-count>0</initiator-lun-in-use-count>
      <initiator-port>0a</initiator-port>
      <initiator-port-speed>6</initiator-port-speed>
      <initiator-side-switch-port>N/A</initiator-side-switch-port>
      <lun-io-kbps>7</lun-io-kbps>
      <lun-iops>0</lun-iops>
      <lun-number>0</lun-number>
      <lun-path-use-state>INU</lun-path-use-state>
      <node>netapp-home06</node>
      <path-io-kbps>0</path-io-kbps>
      <path-iops>0</path-iops>
      <path-link-errors>0</path-link-errors>
      <path-lun-in-use-count>0</path-lun-in-use-count>
      <path-quality>0</path-quality>
      <preferred-target-port>false</preferred-target-port>
      <target-io-kbps>0</target-io-kbps>
      <target-iops>0</target-iops>
      <target-lun-in-use-count>0</target-lun-in-use-count>
      <target-port-access-state>AO</target-port-access-state>
      <target-side-switch-port>N/A</target-side-switch-port>
      <target-wwnn>5002538a4763d270</target-wwnn>
      <target-wwpn>5002538a4763d272</target-wwpn>
      <tpgn>20</tpgn>
    </disk-path-info>
    <disk-path-info>
      <array-name>N/A</array-name>
      <disk-name>netapp-home06:4d.10.5</disk-name>
      <disk-port>A</disk-port>
      <disk-port-name>SA:A</disk-port-name>
      <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
      <initiator-io-kbps>181</initiator-io-kbps>
      <initiator-iops>9</initiator-iops>
      <initiator-lun-in-use-count>0</initiator-lun-in-use-count>
      <initiator-port>4d</initiator-port>
      <initiator-port-speed>12</initiator-port-speed>
      <initiator-side-switch-port>N/A</initiator-side-switch-port>
      <lun-io-kbps>0</lun-io-kbps>
      <lun-iops>0</lun-iops>
      <lun-number>0</lun-number>
      <lun-path-use-state>RDY</lun-path-use-state>
      <node>netapp-home06</node>
      <path-io-kbps>0</path-io-kbps>
      <path-iops>0</path-iops>
      <path-link-errors>0</path-link-errors>
      <path-lun-in-use-count>0</path-lun-in-use-count>
      <path-quality>0</path-quality>
      <preferred-target-port>false</preferred-target-port>
      <target-io-kbps>0</target-io-kbps>
      <target-iops>0</target-iops>
      <target-lun-in-use-count>0</target-lun-in-use-count>
      <target-port-access-state>AO</target-port-access-state>
      <target-side-switch-port>N/A</target-side-switch-port>
      <target-wwnn>5002538a4763d270</target-wwnn>
      <target-wwpn>5002538a4763d271</target-wwpn>
      <tpgn>88</tpgn>
    </disk-path-info>
    <disk-path-info>
      <array-name>N/A</array-name>
      <disk-name>netapp-home05:0a.10.5</disk-name>
      <disk-port>A</disk-port>
      <disk-port-name>SA:A</disk-port-name>
      <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
      <initiator-io-kbps>173</initiator-io-kbps>
      <initiator-iops>7</initiator-iops>
      <initiator-lun-in-use-count>0</initiator-lun-in-use-count>
      <initiator-port>0a</initiator-port>
      <initiator-port-speed>6</initiator-port-speed>
      <initiator-side-switch-port>N/A</initiator-side-switch-port>
      <lun-io-kbps>8</lun-io-kbps>
      <lun-iops>0</lun-iops>
      <lun-number>0</lun-number>
      <lun-path-use-state>INU</lun-path-use-state>
      <node>netapp-home05</node>
      <path-io-kbps>0</path-io-kbps>
      <path-iops>0</path-iops>
      <path-link-errors>0</path-link-errors>
      <path-lun-in-use-count>0</path-lun-in-use-count>
      <path-quality>0</path-quality>
      <preferred-target-port>false</preferred-target-port>
      <target-io-kbps>0</target-io-kbps>
      <target-iops>0</target-iops>
      <target-lun-in-use-count>0</target-lun-in-use-count>
      <target-port-access-state>AO</target-port-access-state>
      <target-side-switch-port>N/A</target-side-switch-port>
      <target-wwnn>5002538a4763d270</target-wwnn>
      <target-wwpn>5002538a4763d271</target-wwpn>
      <tpgn>88</tpgn>
    </disk-path-info>
    <disk-path-info>
      <array-name>N/A</array-name>
      <disk-name>netapp-home05:4d.10.5</disk-name>
      <disk-port>B</disk-port>
      <disk-port-name>SA:B</disk-port-name>
      <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
      <initiator-io-kbps>218</initiator-io-kbps>
      <initiator-iops>10</initiator-iops>
      <initiator-lun-in-use-count>0</initiator-lun-in-use-count>
      <initiator-port>4d</initiator-port>
      <initiator-port-speed>12</initiator-port-speed>
      <initiator-side-switch-port>N/A</initiator-side-switch-port>
      <lun-io-kbps>0</lun-io-kbps>
      <lun-iops>0</lun-iops>
      <lun-number>0</lun-number>
      <lun-path-use-state>RDY</lun-path-use-state>
      <node>netapp-home05</node>
      <path-io-kbps>0</path-io-kbps>
      <path-iops>0</path-iops>
      <path-link-errors>0</path-link-errors>
      <path-lun-in-use-count>0</path-lun-in-use-count>
      <path-quality>0</path-quality>
      <preferred-target-port>false</preferred-target-port>
      <target-io-kbps>0</target-io-kbps>
      <target-iops>0</target-iops>
      <target-lun-in-use-count>0</target-lun-in-use-count>
      <target-port-access-state>AO</target-port-access-state>
      <target-side-switch-port>N/A</target-side-switch-port>
      <target-wwnn>5002538a4763d270</target-wwnn>
      <target-wwpn>5002538a4763d272</target-wwpn>
      <tpgn>20</tpgn>
    </disk-path-info>
  </disk-paths>
  <disk-raid-info>
    <active-node-name>netapp-home06</active-node-name>
    <container-type>shared</container-type>
    <disk-aggregate-info>
      <checksum-type>none</checksum-type>
      <copy-percent-complete>0</copy-percent-complete>
      <is-media-scrubbing>true</is-media-scrubbing>
      <is-offline>false</is-offline>
      <is-prefailed>false</is-prefailed>
      <is-reconstructing>false</is-reconstructing>
      <is-replacing>false</is-replacing>
      <is-zeroed>true</is-zeroed>
      <is-zeroing>false</is-zeroing>
      <reconstruct-percent-complete>0</reconstruct-percent-complete>
    </disk-aggregate-info>
    <disk-shared-info>
      <aggregate-list>
        <shared-aggregate-info>
          <aggregate-name>netapp_home05_SSD_1</aggregate-name>
        </shared-aggregate-info>
        <shared-aggregate-info>
          <aggregate-name>netapp_home06_SSD_1</aggregate-name>
        </shared-aggregate-info>
      </aggregate-list>
      <checksum-type>none</checksum-type>
      <copy-percent-complete>0</copy-percent-complete>
      <is-media-scrubbing>true</is-media-scrubbing>
      <is-offline>false</is-offline>
      <is-prefailed>false</is-prefailed>
      <is-reconstructing>false</is-reconstructing>
      <is-replacing>false</is-replacing>
      <is-sparecore>false</is-sparecore>
      <is-zeroed>true</is-zeroed>
      <is-zeroing>false</is-zeroing>
      <partitioning-type>root_data1_data2</partitioning-type>
      <reconstruct-percent-complete>0</reconstruct-percent-complete>
    </disk-shared-info>
    <disk-spare-info>
      <is-sparecore>false</is-sparecore>
    </disk-spare-info>
    <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
    <effective-disk-type>SSD</effective-disk-type>
    <physical-blocks>3750730326</physical-blocks>
    <position>shared</position>
    <spare-pool>Pool0</spare-pool>
    <used-blocks>3750666304</used-blocks>
  </disk-raid-info>
  <disk-stats-info>
    <average-latency>0</average-latency>
    <bytes-per-sector>520</bytes-per-sector>
    <disk-io-kbps>15</disk-io-kbps>
    <disk-iops>0</disk-iops>
    <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
    <path-error-count>0</path-error-count>
    <power-on-time-interval>633600</power-on-time-interval>
    <sectors-read>70094630</sectors-read>
    <sectors-written>1593746</sectors-written>
  </disk-stats-info>
  <disk-uid>5002538A:4763D270:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
</storage-disk-info>
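As a side note, the flag of interest can appear under more than one sub-element of <disk-raid-info> (disk-aggregate-info, disk-shared-info, disk-spare-info). A minimal sketch using only Python's standard library that collects every <is-media-scrubbing> value from a record like the one above (the SAMPLE below is a trimmed-down, hypothetical version of that output):

```python
import xml.etree.ElementTree as ET

# Trimmed-down storage-disk-info record, modeled on the
# storage-disk-get-iter output above.
SAMPLE = """
<storage-disk-info>
  <disk-name>1.10.5</disk-name>
  <disk-raid-info>
    <container-type>shared</container-type>
    <disk-aggregate-info>
      <is-media-scrubbing>true</is-media-scrubbing>
    </disk-aggregate-info>
    <disk-shared-info>
      <is-media-scrubbing>true</is-media-scrubbing>
    </disk-shared-info>
  </disk-raid-info>
</storage-disk-info>
"""

def media_scrub_flags(disk_xml: str) -> dict:
    """Return every <is-media-scrubbing> value found under <disk-raid-info>,
    keyed by the sub-element it was found in."""
    root = ET.fromstring(disk_xml)
    raid_info = root.find("disk-raid-info")
    flags = {}
    if raid_info is not None:
        for child in raid_info:
            value = child.findtext("is-media-scrubbing")
            if value is not None:
                flags[child.tag] = (value == "true")
    return flags

print(media_scrub_flags(SAMPLE))
# -> {'disk-aggregate-info': True, 'disk-shared-info': True}
```

In a real check you would feed this the XML returned by the SDK rather than a pasted sample.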
Is the RAID group doing a media scrub? Use the command "storage aggregate show-scrub-status" from the CLI (or PowerShell cmdlet "Get-NcAggrScrub").
Andrew
I do not believe a scrub is running. The API has reported is-media-scrubbing=true ever since we installed the AFF8080; it never seems to go back to false. Our FAS8040 never showed true. The AFF8080 was added to our existing FAS8040 cluster.
netapp-home::> storage aggregate show-scrub-status
                                                             Complete
Aggregate RAID Groups               Suspended Percentage Last Scrub Time
--------- ------------------------- ---------- ---------- --------------------

Info: Failed to check the scrubbing status on aggr0_netapp_home_01_0 "Operation not supported on that object type". Reason: .
aggr0_netapp_home_01_0               -          -          -
          /aggr0_netapp_home_01_0/plex0/rg0

Info: Failed to check the scrubbing status on aggr0_netapp_home_02_0 "Operation not supported on that object type". Reason: .
aggr0_netapp_home_02_0               -          -          -
          /aggr0_netapp_home_02_0/plex0/rg0

Info: Failed to check the scrubbing status on aggr0_netapp_home_03_0 "Operation not supported on that object type". Reason: .
aggr0_netapp_home_03_0               -          -          -
          /aggr0_netapp_home_03_0/plex0/rg0

Info: Failed to check the scrubbing status on aggr0_netapp_home_04_0 "Operation not supported on that object type". Reason: .
aggr0_netapp_home_04_0               -          -          -
          /aggr0_netapp_home_04_0/plex0/rg0

aggr0_netapp_home_05_0               false      -          10/22/2017 02:39:18
          /aggr0_netapp_home_05_0/plex0/rg0
aggr0_netapp_home_06_0               false      -          10/22/2017 02:36:46
          /aggr0_netapp_home_06_0/plex0/rg0

Info: Failed to check the scrubbing status on aggr_netapp_home01_array_sas_4t_1 "Operation not supported on that object type". Reason: .
aggr_netapp_home01_array_sas_4t_1    -          -          -
          /aggr_netapp_home01_array_sas_4t_1/plex0/rg0
aggr_netapp_home01_array_sas_4t_1    -          -          -
          /aggr_netapp_home01_array_sas_4t_1/plex0/rg1
aggr_netapp_home01_array_sas_4t_1    -          -          -
          /aggr_netapp_home01_array_sas_4t_1/plex0/rg2

Info: Failed to check the scrubbing status on aggr_netapp_home02_array_sas_4t_1 "Operation not supported on that object type". Reason: .
aggr_netapp_home02_array_sas_4t_1    -          -          -
          /aggr_netapp_home02_array_sas_4t_1/plex0/rg0
aggr_netapp_home02_array_sas_4t_1    -          -          -
          /aggr_netapp_home02_array_sas_4t_1/plex0/rg1
aggr_netapp_home02_array_sas_4t_1    -          -          -
          /aggr_netapp_home02_array_sas_4t_1/plex0/rg2

Info: Failed to check the scrubbing status on aggr_netapp_home03_array_sas_4t_1 "Operation not supported on that object type". Reason: .
aggr_netapp_home03_array_sas_4t_1    -          -          -
          /aggr_netapp_home03_array_sas_4t_1/plex0/rg0
aggr_netapp_home03_array_sas_4t_1    -          -          -
          /aggr_netapp_home03_array_sas_4t_1/plex0/rg1
aggr_netapp_home03_array_sas_4t_1    -          -          -
          /aggr_netapp_home03_array_sas_4t_1/plex0/rg2

Info: Failed to check the scrubbing status on aggr_netapp_home04_array_sas_4t_1 "Operation not supported on that object type". Reason: .
aggr_netapp_home04_array_sas_4t_1    -          -          -
          /aggr_netapp_home04_array_sas_4t_1/plex0/rg0
aggr_netapp_home04_array_sas_4t_1    -          -          -
          /aggr_netapp_home04_array_sas_4t_1/plex0/rg1
aggr_netapp_home04_array_sas_4t_1    -          -          -
          /aggr_netapp_home04_array_sas_4t_1/plex0/rg2

netapp_home05_SSD_1                  false      -          10/22/2017 04:00:05
          /netapp_home05_SSD_1/plex0/rg0
netapp_home06_SSD_1                  false      -          10/22/2017 03:54:21
          /netapp_home06_SSD_1/
20 entries were displayed.
We are observing the same problem on two systems after an upgrade from 8.3.2P? to 9.1P11.
In our case, after digging a little deeper into the problem, it looks like all spare disks on the cluster have the <is-media-scrubbing> property set, like this:
<disk-raid-info>
<active-node-name>stgb-pidpa-n01</active-node-name>
<container-type>spare</container-type>
<disk-shared-info>
<is-sparecore>false</is-sparecore>
</disk-shared-info>
<disk-spare-info>
<is-media-scrubbing>true</is-media-scrubbing>
<is-offline>false</is-offline>
<is-sparecore>false</is-sparecore>
<is-zeroed>true</is-zeroed>
<is-zeroing>false</is-zeroing>
</disk-spare-info>
<disk-uid>5002538A:07219DE0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
<effective-disk-type>SSD</effective-disk-type>
<physical-blocks>97677846</physical-blocks>
<position>present</position>
<spare-pool>Pool0</spare-pool>
<used-blocks>97613824</used-blocks>
</disk-raid-info>
and some data disks are reporting media scrubbing like this:
<disk-raid-info>
<active-node-name>stgb-pidpa-n01</active-node-name>
<container-type>shared</container-type>
<disk-aggregate-info>
<checksum-type>none</checksum-type>
<copy-percent-complete>0</copy-percent-complete>
<is-media-scrubbing>true</is-media-scrubbing>
<is-offline>false</is-offline>
<is-prefailed>false</is-prefailed>
<is-reconstructing>false</is-reconstructing>
<is-replacing>false</is-replacing>
<is-zeroed>true</is-zeroed>
<is-zeroing>false</is-zeroing>
<reconstruct-percent-complete>0</reconstruct-percent-complete>
</disk-aggregate-info>
<disk-shared-info>
<aggregate-list>
<shared-aggregate-info>
<aggregate-name>n01_sas_450g_01_hybrid</aggregate-name>
</shared-aggregate-info>
<shared-aggregate-info>
<aggregate-name>n02_sas_450g_01_hybrid</aggregate-name>
</shared-aggregate-info>
</aggregate-list>
<checksum-type>none</checksum-type>
<copy-percent-complete>0</copy-percent-complete>
<is-media-scrubbing>true</is-media-scrubbing>
<is-offline>false</is-offline>
<is-prefailed>false</is-prefailed>
<is-reconstructing>false</is-reconstructing>
<is-replacing>false</is-replacing>
<is-sparecore>false</is-sparecore>
<is-zeroed>true</is-zeroed>
<is-zeroing>false</is-zeroing>
<partitioning-type>storage_pool</partitioning-type>
<reconstruct-percent-complete>0</reconstruct-percent-complete>
<storage-pool>sp1</storage-pool>
</disk-shared-info>
<disk-spare-info>
<is-sparecore>false</is-sparecore>
</disk-spare-info>
<disk-uid>5002538A:07219DD0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000</disk-uid>
<effective-disk-type>SSD</effective-disk-type>
<physical-blocks>97677846</physical-blocks>
<position>shared</position>
<spare-pool>3f195998-6159-11e7-83a4-00a098647eeb</spare-pool>
<used-blocks>97613824</used-blocks>
</disk-raid-info>
Now, all those disks, in our situation, are ADP-partitioned disks that have at least one partition that is not in use and is in fact used as a spare, e.g.:
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 3a.40.23 3a 40 23 SA:A 0 SAS 10000 857000/1755136000 858483/1758174768
spare 3d.43.23 3d 43 23 SA:B 0 SAS 10000 857000/1755136000 858483/1758174768
spare 3d.44.9 3d 44 9 SA:B 0 SAS 10000 857000/1755136000 858483/1758174768
spare 3d.44.23 3d 44 23 SA:B 0 SAS 10000 857000/1755136000 858483/1758174768
spare 0a.03.7 0a 3 7 SA:A 0 SAS 15000 418000/856064000 420156/860480768
spare 1b.04.17 1b 4 17 SA:B 0 SAS 15000 418000/856064000 420584/861357448
spare 1b.04.23 1b 4 23 SA:B 0 SAS 15000 418000/856064000 420584/861357448
spare 1b.05.23 1b 5 23 SA:B 0 SAS 15000 418000/856064000 420584/861357448
spare 1b.06.23 1b 6 23 SA:B 0 SAS 15000 418000/856064000 420584/861357448
spare 0d.30.0P1 0d 30 0 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 0d.30.2P1 0d 30 2 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 0d.30.4P1 0d 30 4 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 0d.30.6P1 0d 30 6 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 0d.30.8P1 0d 30 8 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 0d.30.10P1 0d 30 10 SA:B 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.1P1 1c 30 1 SA:A 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.3P1 1c 30 3 SA:A 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.5P1 1c 30 5 SA:A 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.7P1 1c 30 7 SA:A 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.9P1 1c 30 9 SA:A 0 SSD N/A 95312/195200512 95320/195216896
spare 1c.30.11 1c 30 11 SA:A 0 SSD N/A 381304/780910592 381554/781422768
All of those disks are "scrubbing", even the disks in shelf 30 whose partition 2 is actively in use.
We have a case open with support (2007380388) to look a little deeper into this. I will report back as soon as something comes out of this.
Best regards!
Forgot to add: if you are using a Nagios plugin such as check_netapp_ontap.pl, it will look specifically for this info:
if (defined($nahDisk->child_get("disk-raid-info")->child_get("disk-aggregate-info"))) {
    $hshDiskInfo{$strDiskName}{'scrubbing'} =
        $nahDisk->child_get("disk-raid-info")
                ->child_get("disk-aggregate-info")
                ->child_get_string("is-media-scrubbing");
}
Because it looks specifically under disk-aggregate-info, those disks will pop up as scrubbing; for spare disks, on the other hand, <is-media-scrubbing> sits under <disk-spare-info>, and this Nagios plugin doesn't check there.
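If you are patching a check yourself, one option is to look in all three possible locations of the flag instead of only <disk-aggregate-info>. A sketch in Python rather than the plugin's Perl (element names come from the XML earlier in this thread; the helper name is my own):

```python
import xml.etree.ElementTree as ET

# Sub-elements of <disk-raid-info> that may carry <is-media-scrubbing>,
# depending on the disk's container-type (aggregate, shared, spare).
SCRUB_SECTIONS = ("disk-aggregate-info", "disk-shared-info", "disk-spare-info")

def is_media_scrubbing(disk_info: ET.Element) -> bool:
    """True if any <disk-raid-info> sub-element reports is-media-scrubbing=true."""
    raid_info = disk_info.find("disk-raid-info")
    if raid_info is None:
        return False
    return any(
        raid_info.findtext(f"{section}/is-media-scrubbing") == "true"
        for section in SCRUB_SECTIONS
    )

# A spare disk carries the flag under <disk-spare-info>, which an
# aggregate-only check misses entirely:
spare = ET.fromstring(
    "<storage-disk-info><disk-raid-info><container-type>spare</container-type>"
    "<disk-spare-info><is-media-scrubbing>true</is-media-scrubbing>"
    "</disk-spare-info></disk-raid-info></storage-disk-info>"
)
print(is_media_scrubbing(spare))  # True
```

That said, given the discussion below about continuous media scrubbing effectively always running, it may be more sensible for monitoring to ignore this flag altogether than to alert on it.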
I'd be interested to know where you got with this. We have a similar issue with our monitoring.
Hi,
Very interesting discussion. It's not my area of expertise, but I read something relevant and thought I would share it.
Could it be that the API's <is-media-scrubbing> refers to 'media scrubbing', the continuous background process? If so, it will always come back 'true', won't it?
Media scrubbing is a "continuous background process". Because of it, you might observe disk LEDs blinking on an apparently idle storage system, and you might also observe some CPU activity even when no user workload is present.
A RAID-level scrub operation, by contrast, is scheduled at the RAID level; this type of scrub finds and corrects parity and checksum errors as well as media errors.
There is an old NetApp KB article ("What does the raid.media_scrub.enable option do") marked for 7-Mode:
https://kb.netapp.com/app/answers/answer_view/a_id/1004900
However, it mentions the registry options, which are still inherited in cDOT/ONTAP. An example is below.
It says: "This option enables continuous media scrubbing on the NetApp filer."
On my FAS8200/FAS8020/AFF300 systems running ONTAP 9.1, both are on:
1) raid.media_scrub.enable on — continuous media scrubbing of the disks in an aggregate
2) raid.scrub.enable on — this option specifies the weekly scheduled scrub at the RAID level
These registry options are only visible from diag mode:
::> set diag
::> options raid*
I don't have a FAS8040, but could it be that on those systems it was turned off at the filer level?
Thanks!