
SG5660 3 failed drives

iamsam

Hello

 

My grid setup consists of 2 sites, and the ILM rule is "Make 2 Copies 2 Sites", which saves 1 copy at each site.

One of my SG5660 nodes has 3 failed drives. AFAIK, this is the maximum number of failed drives a node can survive.

Replacing the failed drives is not possible because the entitlement has expired. Renewing the entitlement will take some time.

 

If 1 more disk fails on that node, what will happen? Will requests for objects on this node fail? What can I do to mitigate the risk on this node?

1 ACCEPTED SOLUTION

Ontapforrum

Thanks for sharing the screenshot.

 

I am reading and learning about the 'Preservation Capacity' concept from the KB below. It includes a definition of preservation capacity, which tells me that your situation is not too bad.

 

Basically, you still have a 'Preservation Capacity' of 3 drives available, intact. All of the available free capacity within the disk pool has been used so far to reconstruct the data from the failed drives, which is why free capacity now reports zero. If another drive fails, the pool will use the preservation capacity for the rebuild.


Recovery Guru reporting "Pool Preservation Capacity in Use" after drive replacement
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/E-Series_SANtricity_Software_Suite/Recovery_Guru_reporting_%22Pool_Preservation...

 

I don't know the physical drive count of the disk pool in your SG, but here is a good KB for reference.
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Infrastructure_Management/E-Series_SANtricity_Management_Software/How_many_drives_should_be_assi...

 


3 REPLIES 3

Ontapforrum

I am not an E-Series/StorageGRID guy, but from what I have read, it looks like E-Series/StorageGRID uses DDP (the default and recommended setting). DDP provides more efficient recovery from drive failures. The limit of 3 failed drives that you mentioned applies to RAID-TEC (and 2 to RAID-6), which I doubt applies to your case.

 

Operation when one or more drives fail:
A major benefit of DDP technology is that, rather than using dedicated stranded hot spares, the pool itself
contains integrated preservation capacity to provide rebuild locations for potential drive failures. This
feature simplifies management, because you no longer have to plan or manage individual hot spares. It
also greatly improves the time of rebuilds and enhances the performance of the volumes themselves
during a rebuild.

 

A large pool can continue to maintain multiple sequential failures without data loss until there is no additional preservation capacity to continue the rebuilds.

 

I don't know how big your pool is, but ideally it will reconstruct the failed drives' data onto the remaining drives in the pool. The only concern is that the overall free capacity will drop. However, as long as the pool has enough free capacity, it will rebalance/redistribute the data across the remaining drives and continue to serve data. I am assuming that once the failed drives are replaced, the pool can be expanded to increase the free capacity. Anyway, I hope this eases some of your worries, but you need to keep an eye on the free capacity of the pool.
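
To make the capacity bookkeeping a bit more concrete, here is a rough Python sketch. It is only a toy model under my own assumptions, not how SANtricity actually schedules rebuilds: the function name, the worst-case assumption that every failed drive is completely full, and the 4 TB drive size in the example are all hypothetical. It simply counts how many sequential drive failures could be rebuilt before the pool runs out of spare space (preservation capacity plus free capacity).

```python
def failures_absorbed(preservation_drives: int, free_tb: float, drive_tb: float) -> int:
    """Toy estimate of how many sequential drive failures a pool can rebuild.

    Assumptions (hypothetical, for illustration only):
      - each failed drive is completely full, so a rebuild needs drive_tb of space
      - rebuild space comes from preservation capacity plus unallocated free capacity
      - each rebuild finishes before the next drive fails
    """
    if drive_tb <= 0:
        raise ValueError("drive_tb must be positive")
    if preservation_drives < 0 or free_tb < 0:
        raise ValueError("capacities must not be negative")

    # Total spare space available for rebuilds, in TB.
    spare_tb = preservation_drives * drive_tb + free_tb

    # Each failure consumes one drive's worth of spare space.
    return int(spare_tb // drive_tb)


if __name__ == "__main__":
    # Hypothetical 4 TB drives; 3 drives of preservation capacity, no free capacity.
    print(failures_absorbed(preservation_drives=3, free_tb=0.0, drive_tb=4.0))
```

The point is only that headroom shrinks by roughly one drive's worth of space per failure, so the numbers to watch are the pool's preservation capacity and free capacity.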

 

Useful thread:
https://community.netapp.com/t5/EF-E-Series-SANtricity-and-Related-Plug-ins/Need-to-remove-failed-disk-from-pool/m-p/436667#:~:text=When%20a%20disk%20....

 

Dynamic Disk Pools:
https://www.netapp.com/pdf.html?item=/media/12421-tr4652.pdf

 


It would be helpful if you could share details such as the DDP pool size and free capacity so that someone from the StorageGRID side can help you further (screenshots would be helpful).

iamsam

Thanks for the provided information.
Free Capacity shows 0. Preserved Capacity = 3 Drives.

