I have a pretty basic question that I cannot find an answer for in the knowledge base. Hopefully someone here can help!
First some background: we have a four node AFF8080 cluster. Unfortunately, the pro services company that set us up only configured one spare disk per node. The oldest of the systems has been running for four years and we just had our first failed disk early this morning (which is impressive, but that's another story). When the disk was in pre-fail state and copying to the spare, we got the low spares warning from Unified Manager and I got a call.
I am now waiting for the replacement disk to arrive. In the meantime I looked around in the knowledge base to be sure I would know what to expect on the off-chance that a second disk failed before I could replace the first one. (While it seems unlikely, I have seen cases where multiple disks suddenly fail on a system.)
I found the KB article shown below for 7-mode systems, but absolutely nothing for cluster mode. I spoke to a NetApp tech who was running the failed disk case but they just pointed me to other 7-mode articles, or a couple of cluster mode articles that didn't fit my scenario. Specifically: when you have 0 spare disks, what happens when a disk fails IN CLUSTER MODE? Can anyone shed light on the answer to this question and preferably point me to a KB article? Thanks so much!
Thank you both for very helpful replies! I didn't even think to look in the System Manager guide.
From what I've read, it appears that two spares per node would be a better configuration, but if we were to lose two disks at the same time we wouldn't experience an outage, at least not for 24 hours. Having said that, it appears that there would likely be a performance hit until one or both failed disks were replaced. Of course the disks would be replaced well before 24 hours had passed so I wouldn't expect that to be an issue. Thank you both again for the links and tips!
@SpindleNinja I'm interested in your comment on the C190 as we were thinking about buying one for a remote office. Is that a specific recommendation for that model? Do you have the option of adding spare disks if you want them?
One spare should be fine. If needed, you can always reassign a spare from the partner node as a temporary measure. You should have RAID-DP or RAID-TEC, which give you double or triple parity protection, for that reason alone.
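To make the parity point concrete, here is a rough sketch in Python. It only models how many simultaneous disk failures a single RAID group can absorb before data is lost, assuming no spare is available to rebuild onto. The function name and the state labels are made up for illustration and are not ONTAP output.

```python
# Illustrative sketch only. The parity counts per RAID type are standard;
# the function name and the returned labels are hypothetical, not ONTAP terms.
PARITY_DISKS = {"raid4": 1, "raid_dp": 2, "raid_tec": 3}


def raid_group_state(raid_type: str, failed_disks: int) -> str:
    """Classify one RAID group by how many of its disks have failed.

    Assumes the failures are all in the same RAID group and that no
    spare has been rebuilt onto yet.
    """
    if failed_disks < 0:
        raise ValueError("failed_disks must be non-negative")
    tolerated = PARITY_DISKS[raid_type]
    if failed_disks == 0:
        return "normal"
    if failed_disks <= tolerated:
        # Data is still served, reconstructed from parity.
        return "degraded"
    return "data loss"


if __name__ == "__main__":
    for raid_type in PARITY_DISKS:
        for failed in range(4):
            print(raid_type, failed, raid_group_state(raid_type, failed))
```

So with RAID-DP and zero spares, a second failure in the same RAID group still leaves the data readable, just without any remaining parity protection until a disk is replaced.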
@TMADOCTHOMAS there are a lot of catches with the C190 (fewer cores, can't add a shelf, etc.), but it does fit certain use cases. One big advantage is lower cost, and a spare is a drive you're paying for that you're not using for data storage.
Most folks I'm working with are looking at the 8-12 drive option when it comes to the C190. With 8 drives, that leaves 7 for data, assuming a shared spare. But like you said in your first post: "first failed disk early this morning, which is impressive". The SSDs NetApp uses have a really low failure rate. And you have RAID-DP on top of it, so you could lose two drives and still be OK.
Thanks @SpindleNinja. If we deployed a C190 we would likely do 12 drives. For years we ran 2220s in 7-mode with 12 drives: a RAID4 aggregate on node 2, and all data on node 1 in a single RAID-DP aggregate, with one spare on each side. That gave us 4.1TB usable. We'd likely do something similar with the C190 if spares are an option. I'm aware of the other limits but wasn't aware you might not be able to use spares at all.
@paul_stejskal thanks for your comment as well! Yeah it's amazing how notable it was that one of our flash drives failed.