Finally resolved after several hours with support.
The first support engineer didn’t seem very knowledgeable and we fumbled about quite a bit trying various things.
Eventually we managed to switch the problem from node7 over to node8. Haha/sigh....
He brought in Jim W, a senior engineer who was able to sort it out.
Jim’s analysis is that ONTAP is functioning correctly, but UM has a quirk: it wants to see a whole spare disk assigned to each node, not a shared (partitioned) one, which is the configuration we originally had before the disk failure.
Jim’s summary:
As discussed, in this HA pair (nodes CLUS01-N7 / CLUS01-N8) there was (1) whole spare owned by node -08 and (1) partitioned spare (with the container owned by node -07) distributed across the HA pair. This was causing OnCommand UM to report a SPARES LOW event.
Because the nodes have a mix of whole-disk RAID groups and partitioned RAID groups, ONTAP requires (2) whole spares across the HA pair: (1) that is maintained as a whole disk and (1) that is partitioned, to be available as required in the event of a whole-disk or container-disk failure.
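(For reference, the spare layout across the HA pair can be checked from the cluster shell with something like the following; the exact output columns vary by ONTAP release.)
cluster_CLI::> storage aggregate show-spare-disks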
The OnCommand UM Spares Low message that was seen on the system was cleared, as described, by unpartitioning disk 0c.11.17 (owned by node -07) with the steps shown below.
From the cluster shell:
cluster_CLI::> storage disk option modify -node CLUS01-N7 -autoassign off
cluster_CLI::> storage disk option modify -node CLUS01-N8 -autoassign off
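(To confirm the change took effect on both nodes, the per-node disk options can be displayed with something like:)
cluster_CLI::> storage disk option show -node CLUS01-N7
cluster_CLI::> storage disk option show -node CLUS01-N8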
Then, from the node shells of the two nodes:
CLUS01-N8> priv set diag
CLUS01-N8*> disk assign 0c.11.17P1 -s unowned -f
CLUS01-N7> priv set diag
CLUS01-N7*> disk assign all
CLUS01-N7*> disk unpartition 0c.11.17
which returned disk 0c.11.17 to being a whole spare owned by node -07.
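(At this point the nodeshell spare listing, e.g. aggr status -s, should show 0c.11.17 as a whole spare again. Assuming the autoassign change above was only meant to be temporary for the manual assignment, it can presumably be turned back on from the cluster shell:)
cluster_CLI::> storage disk option modify -node CLUS01-N7 -autoassign on
cluster_CLI::> storage disk option modify -node CLUS01-N8 -autoassign on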
Note that in the event of a container disk failure, the whole disk on either node can be auto-partitioned by ONTAP to be used as required.
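(Whether a given disk is currently whole or partitioned can be seen from its container type in the cluster shell, for example:)
cluster_CLI::> storage disk show -fields owner,container-type
A whole spare reports a container type of spare, while a partitioned disk reports shared.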
Additionally, the auto-partitioning of the replaced disk 0c.11.17 was a normal operation of the ONTAP RAID_LM (RAID Layout Manager) subsystem and did not need to be altered; we changed it as we did only to alleviate the OCUM reporting.
Should the issue persist with OCUM, I would recommend opening a new case specifically for the version of OCUM that you are using and letting that team investigate it accordingly.
fyi - we are currently using UM v.9.7P1