Running "disk maint" on a broken disk?

I just added a used disk to an existing filer and zeroed it, which went fine. I then added it to an existing aggregate and immediately got a string of:

Tue Jul 20 21:56:08 PDT [netapp0a: raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #911993, during write operation.

So I immediately ran "disk fail 0c.00.7".  But now "disk maint" won't work.  Is there any way to run diagnostics on a failed disk?

> disk maint start -t ndst -d 0c.00.7
disk maint: Disk 0c.00.7 is not available
I have seen http://communities.netapp.com/message/15992#15992, which contains interesting hints but not a solution.

Re: Running "disk maint" on a broken disk?

We don't have a second filer.  We could physically visit the location (it is remote), and remove the drive.  But then how would we get the NetApp to "forget" it has ever seen that serial number?

Re: Running "disk maint" on a broken disk?

I found the answer:

> priv set advanced

> disk unfail xxxxxx

This is a used disk that, when added to an aggregate, immediately gave:

Tue Jul 20 21:53:25 PDT [netapp0a: raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #23911193, during write operation.
Tue Jul 20 21:53:25 PDT [netapp0a: raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #23911194, during write operation.
Tue Jul 20 21:53:25 PDT [netapp0a: raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #23911191, during write operation.
Tue Jul 20 21:53:25 PDT [netapp0a: raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #23911192, during write operation.
Tue Jul 20 21:53:25 PDT [netapp0a: raid.rg.readerr.repair.cksum.error:error]: Checksum error on Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X286_S15K5146A15 NA01] S/N [3LN0GZPC00009736X1W9], block #23911193 after parity recalc

But it later passed all "disk maint" tests.  Go figure.
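
For anyone triaging a similar burst of messages, here is a rough Python sketch that pulls the disk, serial number and block number out of log lines like the ones quoted in this thread, so you can see at a glance which blocks were hit and how often. The regular expression is only an assumption based on the few lines pasted here, not a documented log format, and the function names (parse_line, summarize) are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the messages quoted in this thread:
#   <timestamp> [<host>: <event>:<severity>]: ... on Disk <path> ... S/N [<serial>], block #<n> ...
LOG_RE = re.compile(
    r"\[(?P<host>[^:\]]+):\s*(?P<event>raid\.[\w.]+):(?P<severity>\w+)\]:"
    r".*?on Disk (?P<disk>\S+)"
    r".*?S/N \[(?P<serial>[^\]]+)\]"
    r".*?block #(?P<block>\d+)"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["block"] = int(record["block"])
    return record


def summarize(lines):
    """Group parsed records per disk: distinct blocks and a count per event name."""
    summary = {}
    for record in filter(None, map(parse_line, lines)):
        entry = summary.setdefault(
            record["disk"],
            {"serial": record["serial"], "blocks": set(), "events": Counter()},
        )
        entry["blocks"].add(record["block"])
        entry["events"][record["event"]] += 1
    return summary
```

Feeding it the five lines above would show whether the errors are confined to a handful of neighbouring blocks or spread across the disk, which may help decide whether the disk is worth re-testing.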