ONTAP Discussions

One (supposed-to-be) spare disk is in a weird state

MarkBartelt

We have an old NetApp which is no longer covered by
a support contract, so I'm hopeful that one of you
might be able to answer my question.  If this would
be more appropriate for one of the other discussion
areas, just let me know and I'll re-post it.

 

Anyway ...  Our nagios server recently began sending
us alerts about our NetApp; specifically ...

 

        SNMP WARNING - Disk_Spare_Count *1*

 

So I did a "sysconfig -r" on the NetApp.  It reported
two RAID groups (each with thirteen disks, eleven of
which were labeled "data", one as "parity", and one
as "dparity").  No surprise there.

 

Then there was one disk in the "Spare disks" category,
with "spare" in the first column.

 

And the remaining disk was in a "Maintenance disks"
category, and the first column says "testing".

 

Testing what, exactly?  And how long does whatever
it's doing typically take?  (It's been that way for
several days now.)

 

In short, what do we need to do to get the number of
spare disks back to what it was?

 

We do have an extra disk sitting in a cabinet, which
was purchased so that we'd have one on site in case
one of the NetApp disks died.

 

Should we replace the disk reported as in a "testing"
state with that one?  I.e. does "testing" really mean
"failed"?

 

And if we do replace it with our spare from the cabinet,
what needs to be done so that the NetApp will add it to
its "Spare disks" category?

 

Thanks in advance!

 


Damien_Queen

1) ONTAP systems have a built-in feature called Maintenance Center. By default, if you have more than one spare drive and one of your HDDs fails, Maintenance Center does not simply mark the drive as failed; it first tries to test and recover that drive.

 

2) How long testing and recovery take depends on the type and capacity of your drive. I assume that testing should take no longer than data reconstruction (not 100% sure on that). Here are example data reconstruction times by drive type and capacity.

 

3) You can control the warning about the minimum number of spare drives with the raid.min_spare_count option:

options raid.min_spare_count

 

4) To add a new spare drive, you need a new or used hard drive that meets these requirements:

  • It is an HDD of the same type, with the same or greater capacity;
  • It contains WAFL labels of the same ONTAP mode (for instance, drives that have been in Cluster-Mode systems cannot be used in a 7-Mode system);
  • The version of its WAFL labels is no higher than what the current system runs;
  • The FW on the drive is, ideally, no higher than what the current system supports.

 

5) After you have installed the new disk drive in your disk shelf:

  • Change the ownership of the drive with: disk assign -s id-of-your-system disk.name.here
  • Update the FW to the latest version supported for your system, if needed
  • Make sure it is zeroed: disk zero spares
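
If you want to double-check the spare count yourself rather than rely on the monitoring alert, here is a rough Python sketch that tallies disks by the state shown in the first column of saved "sysconfig -r" output and reports how far the spare count falls below a chosen minimum. Everything specific in it is an assumption for illustration: the sample text is a made-up, heavily abbreviated excerpt (real output has many more columns and its layout can differ between releases), the disk names are hypothetical, and the threshold of 2 is only a guess at what the alert is checking for.

```python
from collections import Counter

# Hypothetical, abbreviated excerpt. Real "sysconfig -r" output has many
# more columns; only the first-column state word matters to this sketch.
SAMPLE = """\
Spare disks
spare       0a.27
Maintenance disks
testing     0a.28
"""

# First-column states mentioned in this thread. The match is
# case-sensitive, so headings such as "Spare disks" are not counted.
KNOWN_STATES = {"data", "parity", "dparity", "spare", "testing"}


def count_states(text):
    """Count lines whose first whitespace-separated field is a known state."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in KNOWN_STATES:
            counts[fields[0]] += 1
    return counts


def spare_shortfall(counts, min_spares):
    """Return how many spares are missing relative to min_spares (0 if none)."""
    return max(0, min_spares - counts["spare"])


if __name__ == "__main__":
    counts = count_states(SAMPLE)
    print(dict(counts))
    # min_spares=2 is an assumed threshold, not a value read from the system.
    print("spares short by:", spare_shortfall(counts, 2))
```

A disk listed as "testing" is deliberately not counted as a spare here, which matches what the alert in the original post reports: one spare, with the other disk still held in maintenance.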

 
