DDP pool two disks fail

stevenzhang · 2020-01-17

Hello experts

When two disks fail at the same time, do the volumes stay RW (read and write), or do they change to read-only?

Ontapforrum · 2020-01-18

Hi,

I haven't worked with DDP before, but this TR explains the two-disk failure scenario:

Multiple Drive Failures (section 2.3, page 8):
https://www.netapp.com/us/media/tr-4652.pdf

To minimize data availability risk, if multiple drives fail within a pool, any D-stripes that are missing two D-pieces are given priority for reconstruction. This approach is called critical reconstruction. After critically affected D-stripes are reconstructed, the rest of the necessary data is then reconstructed.
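To make the ordering concrete, here is a toy Python sketch of critical reconstruction as described above. It is not NetApp's implementation: the `DStripe` type, the `reconstruction_order` function and the drive layout are hypothetical. It just counts how many D-pieces each D-stripe lost and queues the stripes missing two pieces ahead of those missing one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DStripe:
    """Hypothetical toy model of a D-stripe: an ID plus the drives holding its D-pieces."""
    stripe_id: int
    drives: frozenset[int]


def reconstruction_order(stripes, failed_drives):
    """Return IDs of stripes needing rebuild, critical ones (two pieces lost) first."""
    failed = set(failed_drives)
    work = []
    for stripe in stripes:
        missing = len(stripe.drives & failed)
        if missing:  # stripes that lost nothing need no reconstruction
            work.append((missing, stripe.stripe_id))
    # Most pieces missing first; ties broken by ascending stripe ID.
    work.sort(key=lambda item: (-item[0], item[1]))
    return [stripe_id for _, stripe_id in work]


# Toy pool with 3 drives per stripe for brevity (real D-stripes span more drives).
stripes = [
    DStripe(0, frozenset({1, 2, 3})),
    DStripe(1, frozenset({4, 5, 6})),
    DStripe(2, frozenset({2, 5, 7})),
    DStripe(3, frozenset({2, 8, 9})),
]
print(reconstruction_order(stripes, {2, 5}))  # [2, 0, 1, 3]
```

Stripe 2 is the only one holding pieces on both failed drives, so it is rebuilt first; the merely degraded stripes follow.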

For very large pools with two simultaneous disk failures, only a relatively small number of D-stripes are likely to encounter the critical situation in which two D-pieces must be reconstructed. As discussed previously, these critical D-pieces are identified and reconstructed initially at the highest priority. This approach returns the pool to a degraded state very quickly so that further drive failures can be tolerated.

As an example, assume that a pool of 192 drives has been created and has two drive failures. In this case, it is likely that the critical D-pieces would be reconstructed in less than one minute and, after that minute, an additional drive failure could be tolerated.
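A back-of-the-envelope model of why so few D-stripes become critical. The assumptions are mine, not the TR's: each D-stripe is taken to have 10 D-pieces (8 data plus 2 parity) on 10 distinct drives chosen uniformly at random, which idealizes the real layout. The `stripe_hit_fractions` function is hypothetical. Under that model, the share of stripes that touch both failed drives is 10*9 / (N*(N-1)) for an N-drive pool.

```python
from math import comb


def stripe_hit_fractions(pool_drives, stripe_width=10, failed=2):
    """Return (fraction of stripes touching >= 1 failed drive,
               fraction of stripes touching every failed drive).

    Assumes each stripe's pieces sit on `stripe_width` distinct drives chosen
    uniformly at random from the pool (an idealized model, not DDP's real layout).
    """
    total = comb(pool_drives, stripe_width)
    # Stripes that avoid every failed drive.
    untouched = comb(pool_drives - failed, stripe_width)
    # Stripes that include every failed drive ("critical" when failed == 2).
    all_failed = comb(pool_drives - failed, stripe_width - failed)
    return 1 - untouched / total, all_failed / total


affected, critical = stripe_hit_fractions(192)
print(f"affected: {affected:.2%}, critical: {critical:.3%}")
# affected: 10.17%, critical: 0.245%
```

So in the 192-drive example roughly one D-stripe in ten is degraded, but only about a quarter of a percent is critical, which fits the TR's point that the critical pieces can be rebuilt first and fast. The "less than one minute" figure is the TR's own and is not derived here.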

A major benefit of DDP technology is that, rather than using dedicated stranded hot spares, the pool itself contains integrated preservation capacity to provide rebuild locations for potential drive failures. This feature simplifies management, because you no longer have to plan or manage individual hot spares. It also greatly improves the time of rebuilds and enhances the performance of the volumes themselves during a rebuild.
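A rough sketch of picking a rebuild target from preservation capacity spread across the pool instead of from a dedicated hot spare. The selection policy (surviving drive, not already holding a piece of the same D-stripe, most free capacity wins) and the `pick_rebuild_drive` helper are hypothetical illustrations, not the documented DDP algorithm.

```python
def pick_rebuild_drive(stripe_drives, failed_drives, free_preservation):
    """Pick a surviving drive to receive one reconstructed D-piece.

    Hypothetical policy: the drive must not have failed, must not already hold
    a piece of this stripe (so one later failure cannot take out two pieces),
    and must have free preservation capacity. Among those, pick the drive with
    the most free capacity, lowest drive ID on ties. Mutates free_preservation.
    """
    candidates = [
        drive
        for drive, free in free_preservation.items()
        if free > 0 and drive not in failed_drives and drive not in stripe_drives
    ]
    if not candidates:
        raise RuntimeError("no preservation capacity left for this D-stripe")
    best = max(candidates, key=lambda d: (free_preservation[d], -d))
    free_preservation[best] -= 1  # one slot of preservation capacity consumed
    return best


free = {0: 2, 1: 3, 2: 3, 3: 1, 4: 0}  # hypothetical free D-piece slots per drive
print(pick_rebuild_drive({0, 3, 4}, {3}, free))  # 1
print(free)  # {0: 2, 1: 2, 2: 3, 3: 1, 4: 0}
```

Because every drive contributes a little spare space, rebuild writes land on many drives in parallel, which is one plausible reason rebuilds are faster than writing everything to a single hot spare.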

As I understand from the document: reconstruction is very fast in DDP, and there is no concept of hot spares; you basically just expand the pool. If drives fail (two, in your case), the pool automatically writes the reconstructed data to its preservation capacity on the remaining drives. As with any other degraded state, reads of data that lived on the failed drives may see some I/O delay, because that data has to be reconstructed before it is served to the front-end clients, but writes should go through. Again, it all depends on your DDP architecture and its current state.

Regarding the degraded state (from RAID-DP and RAID-4 experience, but in general this applies to all RAID types): a degraded state does not make a volume read-only. Volumes continue to be read-write as before. The only change is that data requested from the failed disks is reconstructed and then served, so reads can see some I/O delay while writes should go through as usual.
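A toy illustration of why degraded reads are slower while writes still work, using single XOR parity (RAID-4 style). RAID-DP and DDP keep a second parity so they tolerate two failures; that part is not modeled here. The `SingleParityStripe` class is hypothetical. Reading a block from the failed disk means fetching every surviving block plus parity instead of one block, and a write just keeps parity consistent with a read-modify-write.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


class SingleParityStripe:
    """Hypothetical toy RAID-4-style stripe: data blocks plus one XOR parity block."""

    def __init__(self, data_blocks):
        self.blocks = list(data_blocks)
        self.parity = xor_blocks(self.blocks)
        self.failed = None      # index of the failed data disk, if any
        self.blocks_read = 0    # blocks fetched by read() (write() calls read())

    def fail(self, index):
        if self.failed is not None:
            raise RuntimeError("single parity cannot survive a second failure")
        self.failed = index
        self.blocks[index] = None  # the data on this disk is gone

    def read(self, index):
        if index != self.failed:
            self.blocks_read += 1
            return self.blocks[index]
        # Degraded read: fetch every surviving data block plus parity and XOR them.
        survivors = [b for i, b in enumerate(self.blocks) if i != index]
        self.blocks_read += len(survivors) + 1
        return xor_blocks(survivors + [self.parity])

    def write(self, index, data):
        # Read-modify-write keeps parity consistent: P' = P ^ old ^ new.
        old = self.read(index)
        self.parity = xor_blocks([self.parity, old, data])
        if index != self.failed:
            self.blocks[index] = data
        # Data for the failed disk is now held only implicitly in the parity.


stripe = SingleParityStripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
stripe.fail(2)
print(stripe.read(2))     # b'CCCC', rebuilt from 3 data blocks + parity
stripe.write(2, b"cccc")  # writes still go through while degraded
print(stripe.read(2))     # b'cccc'
```

The volume never becomes read-only in this model; the cost of the failure shows up only as extra block reads for data that lived on the failed disk.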

As I said, I haven't come across DDP yet, but the document contains useful info.

Thanks!