You can't replace a failed disk in a RAID group with a smaller disk. It appears you have expanded the degraded RAID group with a smaller-capacity drive, which is not what you intended. You should probably open a support case at this point to review your options.
Because you added this disk to the aggregate, you cannot remove it now (you can replace it with a bigger disk once you have restored RAID functionality).
If you still have an empty slot, now is the time to add a correctly sized disk and let the system do the rebuild.
If you don't have a spare disk/slot in the first place, I would suggest changing your aggregate's RAID level to RAID4. That frees up one of the parity disks, which can then be used as a data disk and allow you to restore the aggregate. This suggestion will leave you in a very degraded situation, so it's at your own risk of course….
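For reference, a minimal sketch of that RAID level change on 7-Mode, assuming the aggregate name "bigaggr" used later in this thread. Note that this permanently drops double-parity protection until you convert back with raidtype raid_dp:

```
aggr options bigaggr raidtype raid4
```

The freed dparity disk then shows up as a spare and can be used for reconstruction of the failed data disk.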
Thank you for your detailed answers and solutions!
May I please ask some additional questions (confirmations) about answers 2 and 3?
ANSWER 2: Does it mean I cannot remove it now, but if I have an empty slot and add a new 1TB hard drive to that slot, I will have a chance to replace it with the 1TB disk? As a result, will the 500GB disk be removable from this aggregate? If yes, could you give me a hint on how to do that?
ANSWER 3: Actually I still have an empty slot (0c.00.10), so I am planning to add a 1TB hard drive to it. If I follow the procedure below, will the rebuild start automatically?
1) Insert the new 1TB hard drive into the empty slot (0c.00.10).
2) The hard drive will then be assigned as a spare disk. I run "disk zero spares" to zero it.
3) After zeroing finishes, I run "aggr add bigaggr -d 0c.00.10" to add the spare disk to this aggr.
Then the rebuild will start. Is my thinking and procedure correct?
Sorry for asking so many beginner questions. If you can give me a reply, it will be a really big help.
Thank you for giving me such a helpful and detailed explanation! The procedure you gave me is really good learning for me. Thank you very much!
Following your advice, I'm going to add a new 1TB disk and make it a spare (I still have an empty slot in bay 0c.00.10, so I will use this slot). Then, as you mentioned, I will wait until the rebuild is completed.
At this point I have only two questions:
1) After the rebuild is completed, will the following line in aggr status disappear? And can I see the remaining rebuild progress while it is rebuilding?
data FAILED N/A 847555/1735794176
2) I know the 500GB disk in this aggregate is not a suitable size, but is it still functioning for this aggregate? You mentioned I should run the disk replace command to move the 500GB disk to a new 1TB disk, but I won't have an empty slot anymore after I add a new 1TB disk to bay 0c.00.10. So can I just keep the 500GB disk in this aggregate?
Anyway, at this point I don't want to take any risks, so I will focus on keeping the aggregate healthy.
You are really giving me big help! Thank you very much!
Yes, as you said, the "empty slot" I mentioned actually has a failed hard drive inserted. I think the "data FAILED" entry in aggregate "bigaggr" is it.
I knew it was broken only because its LED turned amber, but I can't see any information about it in the "sysconfig -r" output. That's why I called it an empty slot. As you pointed out, strictly speaking I should have said it is a failed disk.
Below is the current output of "sysconfig -r".
RAID Disk Device    HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------    ------------- ---- ---- ----- ---- ----------------- -----------------
dparity   0c.00.11  0c  0     11  SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
parity    0c.00.9   0c  0     9   SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
data      0c.00.6   0c  0     6   SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
data      0c.00.7   0c  0     7   SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
data      FAILED        N/A                            847555/1735794176
data      0c.00.5   0c  0     5   SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
data      0c.00.0   0c  0     0   SA:A  0   SATA  7200 847555/1735794176 847884/1736466816
data      0c.00.3   0c  0     3   SA:A  0   SATA  7200 423111/866531584  423946/868242816
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device    HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------    ------------- ---- ---- ----- ---- ----------------- -----------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0c.00.8   0c  0     8   SA:A  0   SATA  7200 423111/866531584  423946/868242816
As you advised, my next step is to replace the failed hard drive (the "empty slot" 0c.00.10) with a new 1TB hard drive, and let the new 1TB drive become a spare disk.
After that, I will watch for the rebuild to start automatically, and confirm that the "data FAILED" entry disappears from the aggregate (bigaggr). I think keeping the aggregate healthy is my most important mission now...
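A minimal sketch of the commands for this plan on 7-Mode, assuming disk ownership is assigned automatically (otherwise the new disk would first need "disk assign"):

```
disk zero spares          # zero the new spare so reconstruction can use it right away
sysconfig -r              # during rebuild, the replacement disk row shows reconstruction with a percentage
aggr status -r bigaggr    # per-aggregate view of the same RAID/reconstruction status
```

Once reconstruction completes, the "data FAILED" row should be gone from both views.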
I'm really a beginner with NetApp, so if my plan has any problems and you can point them out to me, it will really be a big help!!
Thank you for your kind explanation! Very helpful!
You have one spare disk, and a larger spare can also stand in for smaller disks. So what I said before still holds:
physically replace the 500GB spare with a 1TB spare
run "disk replace" to swap the 500GB disk in bigaggr with the 1TB spare; this will leave you with a 500GB spare again
physically replace that 500GB disk with one more 1TB disk
This will leave you with one 1TB spare suitable for both aggr0 and bigaggr.
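As a rough sketch of the "disk replace" step, assuming 7-Mode and the disk names from the sysconfig -r output above: 0c.00.3 is the 500GB data disk in bigaggr, and 0c.00.8 is assumed here to be the name the new 1TB spare gets if it goes into the old spare's bay (the actual name depends on the bay used):

```
disk replace start 0c.00.3 0c.00.8   # copy 0c.00.3's contents onto the 1TB spare, then swap it out
disk replace status                  # monitor the copy progress
```

The copy runs in the background, so the aggregate stays online; the old 500GB disk becomes a spare when the copy finishes.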
If you have the possibility to do so, I'd consider moving the contents of aggr0 to bigaggr, replacing the 500GB disks with 1TB disks and adding them to bigaggr. This would improve space utilization and resiliency, and may give a slight performance boost.