VMware Solutions Discussions
Hi,
We get the following messages almost every day. The messages below are from November, and we get them every month.
Some of these disks have been manually replaced; none of them have been failed automatically by the system. In June we had a few of these messages, and those disks were auto-failed.
Is it normal to see so many messages for the same disks? It makes me worried about losing the whole aggregate.
Should we pre-fail them and call NetApp for replacements?
This is across two DS4243 shelves; 8% of the disks have been replaced and 16% of the disks show these errors.
The system is a FAS3210 running Data ONTAP 8.1.2P4.
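For reference, this is a rough sketch of what pre-failing a disk looks like in 7-Mode as we understand it (the disk name below is just an example taken from the logs, and you want a matching spare available first):

na1-a> aggr status -s       # check that a suitable spare is available
na1-a> disk fail 0a.01.17   # pre-fail: copy the disk's contents to a spare, then fail it
(disk fail -i would fail the disk immediately, skipping the copy-to-spare step)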
root@nic02:/var/log/remote/2013/11# bzgrep -i disk */na1-*
01/na1-a.log.bz2:Nov 1 04:55:27 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #16542209 #015
01/na1-a.log.bz2:Nov 1 04:55:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #16542209 #015
02/na1-a.log.bz2:Nov 2 11:57:21 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #43051324 #015
03/na1-b.log.bz2:Nov 3 07:00:01 na1-b [na1-b: raid.scrub.suspended:notice]: Disk scrub suspended. #015
05/na1-a.log.bz2:Nov 5 13:58:34 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946835 during stripe write #015
05/na1-a.log.bz2:Nov 5 13:58:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946836 during stripe write #015
05/na1-a.log.bz2:Nov 5 13:58:35 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946835 #015
05/na1-a.log.bz2:Nov 5 13:58:35 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946836 #015
07/na1-a.log.bz2:Nov 7 07:36:52 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #30608986 during stripe write #015
07/na1-a.log.bz2:Nov 7 07:36:52 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #30608986 #015
08/na1-a.log.bz2:Nov 8 16:59:48 na1-a [na1-a: raid.rg.diskcopy.start:notice]: /aggr0/plex0/rg0: starting disk copy from 0a.01.17 to 0b.02.5 #015
08/na1-a.log.bz2:Nov 8 17:00:00 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks. #015
08/na1-a.log.bz2:Nov 8 18:29:30 na1-a [na1-a: raid.rg.diskcopy.done:notice]: /aggr0/plex0/rg0: disk copy from 0a.01.17 to 0b.02.5 completed in 1:29:42.12 #015
08/na1-a.log.bz2:Nov 8 18:29:30 na1-a [na1-a: raid.config.filesystem.disk.admin.failed.after.copy:info]: File system Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N] is being failed after it was successfully copied to a replacement. #015
08/na1-a.log.bz2:Nov 8 18:29:30 na1-a [na1-a: callhome.fdsk.admin:info]: Call home for FILESYSTEM DISK ADMIN FAILED #015
08/na1-a.log.bz2:Nov 8 18:29:30 na1-a [na1-a: disk.failmsg:error]: Disk 0a.01.17 (J1VMA06N): by operator. #015
08/na1-a.log.bz2:Nov 8 18:29:30 na1-a [na1-a: raid.disk.unload.done:info]: Unload of Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N] has completed successfully #015
08/na1-a.log.bz2:Nov 8 19:03:22 na1-a [na1-a: diskown.changingOwner:info]: changing ownership for disk 0a.01.17 (S/N CWVJMPNN) from unowned (ID 4294967295) to na1-a (ID 1574564151) #015
08/na1-a.log.bz2:Nov 8 19:04:50 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA01] S/N [CWVJMPNN] offline. #015
08/na1-a.log.bz2:Nov 8 19:04:50 na1-a [na1-a: bdfu.selected:info]: Disk 0a.01.17 [NETAPP X411_HVIPC420A15 NA01] S/N [CWVJMPNN] selected for background disk firmware update. #015
08/na1-a.log.bz2:Nov 8 19:04:51 na1-a [na1-a: dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X411_HVIPC420A15.NA02.LOD on 1 disk(s) of plex [Pool0]... #015
08/na1-a.log.bz2:Nov 8 19:05:07 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks. #015
08/na1-a.log.bz2:Nov 8 19:05:07 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [CWVJMPNN]. #015
08/na1-b.log.bz2:Nov 8 18:29:43 na1-b [na1-b: diskown.errorReadingOwnership:warning]: error 3 (disk failed) while reading ownership on disk 0a.01.17 (S/N J1VMA06N) #015
08/na1-b.log.bz2:Nov 8 19:00:00 na1-b [na1-b: callhome.dsk.redun.fault:error]: Call home for DISK REDUNDANCY FAILED #015
08/na1-b.log.bz2:Nov 8 19:00:26 na1-b [na1-b: raid.disk.missing:info]: Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMA06N] is missing from the system #015
09/na1-a.log.bz2:Nov 9 09:23:13 na1-a [na1-a: raid.rg.diskcopy.start:notice]: /aggr0/plex0/rg0: starting disk copy from 0a.01.13 to 0a.01.17 #015
09/na1-a.log.bz2:Nov 9 09:24:00 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks. #015
09/na1-a.log.bz2:Nov 9 09:39:22 na1-a [na1-a: raid.rg.diskcopy.read.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #17732096 during disk copy #015
09/na1-a.log.bz2:Nov 9 11:00:56 na1-a [na1-a: raid.rg.diskcopy.done:notice]: /aggr0/plex0/rg0: disk copy from 0a.01.13 to 0a.01.17 completed in 1:37:43.35 #015
09/na1-a.log.bz2:Nov 9 11:00:56 na1-a [na1-a: raid.config.filesystem.disk.admin.failed.after.copy:info]: File system Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3S8N] is being failed after it was successfully copied to a replacement. #015
09/na1-a.log.bz2:Nov 9 11:00:56 na1-a [na1-a: callhome.fdsk.admin:info]: Call home for FILESYSTEM DISK ADMIN FAILED #015
09/na1-a.log.bz2:Nov 9 11:00:56 na1-a [na1-a: disk.failmsg:error]: Disk 0a.01.13 (J1VM3S8N): by operator. #015
09/na1-a.log.bz2:Nov 9 11:00:56 na1-a [na1-a: raid.disk.unload.done:info]: Unload of Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3S8N] has completed successfully #015
09/na1-a.log.bz2:Nov 9 11:56:48 na1-a [na1-a: diskown.changingOwner:info]: changing ownership for disk 0a.01.13 (S/N CWVK8TMN) from unowned (ID 4294967295) to na1-a (ID 1574564151) #015
09/na1-a.log.bz2:Nov 9 11:58:20 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA01] S/N [CWVK8TMN] offline. #015
09/na1-a.log.bz2:Nov 9 11:58:20 na1-a [na1-a: bdfu.selected:info]: Disk 0a.01.13 [NETAPP X411_HVIPC420A15 NA01] S/N [CWVK8TMN] selected for background disk firmware update. #015
09/na1-a.log.bz2:Nov 9 11:58:20 na1-a [na1-a: dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X411_HVIPC420A15.NA02.LOD on 1 disk(s) of plex [Pool0]... #015
09/na1-a.log.bz2:Nov 9 11:58:36 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP X411_HVIPC420A15 NA01] S/N [CWVK8TMN]. #015
10/na1-a.log.bz2:Nov 10 07:00:03 na1-a [na1-a: raid.scrub.suspended:notice]: Disk scrub suspended. #015
11/na1-a.log.bz2:Nov 11 01:06:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611 #015
11/na1-a.log.bz2:Nov 11 01:06:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611 #015
12/na1-a.log.bz2:Nov 12 13:04:14 na1-a [na1-a: callhome.invoke.all:info]: User triggered complete call home for USER_TRIGGERED (COMPLETE:2004668135 (two disks have already been changed)) #015
14/na1-a.log.bz2:Nov 14 04:30:11 na1-a [na1-a: raid.rg.scrub.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231, during scrub. #015
14/na1-a.log.bz2:Nov 14 04:30:42 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231 #015
15/na1-a.log.bz2:Nov 15 04:52:58 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.01.16 Shelf 1 Bay 16 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3MKN], block #3779786 #015
15/na1-a.log.bz2:Nov 15 04:52:58 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.01.16 Shelf 1 Bay 16 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VM3MKN], block #3779786 #015
15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904 #015
15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904 #015
16/na1-a.log.bz2:Nov 16 03:00:04 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 during stripe write #015
16/na1-a.log.bz2:Nov 16 03:00:06 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 #015
21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745 #015
21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745 #015
21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 during stripe write #015
21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 #015
21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520 #015
21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520 #015
24/na1-a.log:Nov 24 07:00:02 na1-a [na1-a: raid.scrub.suspended:notice]: Disk scrub suspended. #015
24/na1-b.log:Nov 24 07:00:01 na1-b [na1-b: raid.scrub.suspended:notice]: Disk scrub suspended. #015
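To get a quick per-disk tally instead of reading the raw lines, something like this over the same archive layout should do it (a rough sketch, assuming the directory structure shown above and that bzgrep passes the few uncompressed .log files through):

root@nic02:/var/log/remote/2013/11# bzgrep -h "raid.rg.readerr.repair.data" */na1-a.log* | grep -o "S/N \[[A-Z0-9]*\]" | sort | uniq -c | sort -rn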
That is not normal, and I would be concerned as well. Have any of the drives you've replaced started showing errors again? Is it possible something's happened to the system that might have degraded the drives? The only time I've seen something similar is when our datacenter cooling went out, and we didn't have any temperature sensors. We got it fixed, but 2-3 months later we were losing a couple drives a week.
I would work with NetApp to see if they will replace the erroring drives.
Bill
The system has been in operation for 18+ months, and until this summer no drives had been replaced. The DC where the system is located has been stable and hasn't had an outage or temperature variation since we started using their services, so I'm at a loss to explain why the drives are suddenly starting to fail. We do have another controller (same FAS3210) with the same kind of shelves (DS4243) and the same type of disks, and it hasn't shown any problems. It's only the controller that runs our VMware farm that has these errors. The VMware controller probably has more load on it than the CIFS controller, so maybe this is related to the load?
According to NetApp they won't replace the drives unless they have actually failed. We had one drive fail this summer, and due to timeouts it took almost 24 hours for the rebuild to finish, which caused our customers some pain; that's why we would now like to pre-fail the drives. However, since yesterday NetApp won't let us do that anymore and hasn't come up with a plausible explanation for why this happens.
-J
Have any of the drives that have been replaced started showing errors? If so, then I would be more likely to suspect something environmental. Higher load is possible, but I wouldn't really expect it. I'm assuming the drive firmware is up to date and matches the other controller? Maybe it's worth getting the serials of the failing drives and seeing if you can research whether there was a bad production run or something?
NetApp's response seems typical of first or second level. You might work with your SE to escalate the issue from the inside. At one point we had a NetApp person dedicated to fielding customer concerns just like this one - can't remember what his title was, but you probably have one as well. Point out that you've got an abnormal failure rate that is hurting your customers, and therefore you. Might not hurt to hint that there are other NAS vendors out there....
Bill
None of the replaced drives have started showing errors yet.
I have a hard time imagining it could be something environmental, since it's in a monitored DC which hasn't had any problems yet.
All the drives were bought at the same time, so they could well be from the same batch. I'll see if I can find the serials of those that failed and check whether they are all close together.
And I'll give NetApp another call and hear what they have to say.
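If it helps with the batch theory, the serials and shelf/bay positions should be visible straight from the controller, without digging through the logs (a sketch of the usual 7-Mode commands):

na1-a> sysconfig -r         # RAID layout per aggregate, including serial numbers
na1-a> storage show disk    # per-disk shelf/bay, serial, vendor/model and firmware revision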
The frequency of the errors also seems to be going up. This is from September until today for one of the bad disks that hasn't failed yet.
09/08/na1-a.log.bz2:Sep 8 02:03:39 na1-a [na1-a: raid.rg.scrub.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #12296245 during scrub #015
09/08/na1-a.log.bz2:Sep 8 02:04:09 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #12296245 #015
09/25/na1-a.log.bz2:Sep 25 05:54:22 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26139469 #015
09/25/na1-a.log.bz2:Sep 25 05:54:22 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26139469 #015
09/30/na1-a.log.bz2:Sep 30 12:46:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42008145 #015
09/30/na1-a.log.bz2:Sep 30 12:47:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42008146 #015
10/07/na1-a.log.bz2:Oct 7 16:51:41 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #19161536 #015
10/07/na1-a.log.bz2:Oct 7 16:51:41 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #19161536 #015
11/02/na1-a.log.bz2:Nov 2 11:57:21 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #43051324 #015
11/11/na1-a.log.bz2:Nov 11 01:06:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611 #015
11/11/na1-a.log.bz2:Nov 11 01:06:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611 #015
11/14/na1-a.log.bz2:Nov 14 04:30:11 na1-a [na1-a: raid.rg.scrub.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231, during scrub. #015
11/14/na1-a.log.bz2:Nov 14 04:30:42 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231 #015
11/15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904 #015
11/15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904 #015
11/16/na1-a.log.bz2:Nov 16 03:00:04 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 during stripe write #015
11/16/na1-a.log.bz2:Nov 16 03:00:06 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 #015
11/21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745 #015
11/21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745 #015
11/21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 during stripe write #015
11/21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 #015
11/21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520 #015
11/21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520 #015
11/26/na1-a.log.bz2:Nov 26 09:04:57 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42176960 during stripe write #015
11/26/na1-a.log.bz2:Nov 26 09:04:57 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42176960 #015
11/27/na1-a.log.bz2:Nov 27 01:36:52 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26939584 #015
11/27/na1-a.log.bz2:Nov 27 01:36:52 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26939584 #015
11/29/na1-a.log.bz2:Nov 29 09:16:15 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #52566345 #015
11/29/na1-a.log.bz2:Nov 29 09:16:15 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #52566345 #015
11/29/na1-a.log.bz2:Nov 29 12:03:44 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #7056768 #015
11/29/na1-a.log.bz2:Nov 29 12:03:44 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #7056768 #015
11/30/na1-a.log:Nov 30 08:17:27 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41577728 during stripe write #015
11/30/na1-a.log:Nov 30 08:17:28 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41577728 #015
12/01/na1-a.log:Dec 1 00:10:22 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #49728270 during stripe write #015
12/01/na1-a.log:Dec 1 00:10:22 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #49728270 #015
12/01/na1-a.log:Dec 1 02:03:18 na1-a [na1-a: raid.rg.scrub.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #14779074 during scrub #015
12/01/na1-a.log:Dec 1 02:03:48 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #14779074 #015
12/02/na1-a.log:Dec 2 01:38:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #80977792 #015
12/02/na1-a.log:Dec 2 01:38:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #80977792 #015
-J
Yeah, something is fishy with the drives, I bet. Good luck...
Bill
Still getting these errors. NetApp won't replace the drives pre-emptively, and about two weeks later the disk finally dies and then AutoSupport triggers a case.
This week's "not an error message" messages:
---< cut >---
06/na1-a.log.bz2:Jan 6 05:27:48 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237916 #015
06/na1-a.log.bz2:Jan 6 05:27:48 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237916 #015
06/na1-a.log.bz2:Jan 6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237935 #015
06/na1-a.log.bz2:Jan 6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237936 #015
06/na1-a.log.bz2:Jan 6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237937 #015
06/na1-a.log.bz2:Jan 6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237935 #015
06/na1-a.log.bz2:Jan 6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237936 #015
06/na1-a.log.bz2:Jan 6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237937 #015
06/na1-a.log.bz2:Jan 6 05:31:38 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN] offline. #015
06/na1-a.log.bz2:Jan 6 05:31:56 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMWJUN]. #015
06/na1-a.log.bz2:Jan 6 05:31:57 na1-a [na1-a: raid.rg.recons.start:debug]: /aggr0/plex0/rg1: starting reconstruction, using disk 0b.02.13 #015
10/na1-a.log:Jan 10 02:00:49 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk /aggr0/plex0/rg1/0a.02.4 Shelf 2 Bay 4 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMBZ9N] offline. #015
10/na1-a.log:Jan 10 02:01:05 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk /aggr0/plex0/rg1/0a.02.4 Shelf 2 Bay 4 [NETAPP X411_HVIPC420A15 NA02] S/N [J1VMBZ9N]. #015
10/na1-a.log:Jan 10 02:01:05 na1-a [na1-a: raid.rg.recons.start:debug]: /aggr0/plex0/rg1: starting reconstruction, using disk 0a.02.4 #015
---< cut >---
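For anyone following along, the obvious things to check after one of these episodes are roughly (a sketch; exact output varies with the ONTAP version):

na1-a> aggr status -r        # RAID state and any reconstruction in progress
na1-a> aggr status -s        # remaining spares (the logs above keep warning about running low)
na1-a> aggr scrub status -v  # whether scrubs are completing or being suspended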
Have you tried escalating through your sales rep or local support team?