VMware Solutions Discussions

Read error on Disk and lots of them - Is it normal?

JANAKE_RONNBLOM

Hi,

We get the following messages almost every day. The messages below are from November, and we get them every month.

Some of these disks have been manually replaced; none of them have been failed automatically by the system. In June we had a few of these messages, and then those disks were auto-failed.

Is it normal to get so many messages on the same disks? It makes me worried about losing the whole aggregate.

Should we pre-fail them and call NetApp for replacements?

This is on two DS4243 shelves; 8% of the disks have been replaced and 16% of the disks show these errors.

The system is a FAS3210 running Data ONTAP 8.1.2P4.

root@nic02:/var/log/remote/2013/11# bzgrep -i disk */na1-*

01/na1-a.log.bz2:Nov  1 04:55:27 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #16542209  #015

01/na1-a.log.bz2:Nov  1 04:55:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #16542209  #015

02/na1-a.log.bz2:Nov  2 11:57:21 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #43051324  #015

03/na1-b.log.bz2:Nov  3 07:00:01 na1-b [na1-b: raid.scrub.suspended:notice]: Disk scrub suspended.  #015

05/na1-a.log.bz2:Nov  5 13:58:34 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946835 during stripe write  #015

05/na1-a.log.bz2:Nov  5 13:58:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946836 during stripe write  #015

05/na1-a.log.bz2:Nov  5 13:58:35 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946835  #015

05/na1-a.log.bz2:Nov  5 13:58:35 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N], block #25946836  #015

07/na1-a.log.bz2:Nov  7 07:36:52 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #30608986 during stripe write  #015

07/na1-a.log.bz2:Nov  7 07:36:52 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #30608986  #015

08/na1-a.log.bz2:Nov  8 16:59:48 na1-a [na1-a: raid.rg.diskcopy.start:notice]: /aggr0/plex0/rg0: starting disk copy from 0a.01.17 to 0b.02.5  #015

08/na1-a.log.bz2:Nov  8 17:00:00 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks.   #015

08/na1-a.log.bz2:Nov  8 18:29:30 na1-a [na1-a: raid.rg.diskcopy.done:notice]: /aggr0/plex0/rg0: disk copy from 0a.01.17 to 0b.02.5 completed in 1:29:42.12  #015

08/na1-a.log.bz2:Nov  8 18:29:30 na1-a [na1-a: raid.config.filesystem.disk.admin.failed.after.copy:info]: File system Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N] is being failed after it was successfully copied to a replacement.  #015

08/na1-a.log.bz2:Nov  8 18:29:30 na1-a [na1-a: callhome.fdsk.admin:info]: Call home for FILESYSTEM DISK ADMIN FAILED  #015

08/na1-a.log.bz2:Nov  8 18:29:30 na1-a [na1-a: disk.failmsg:error]: Disk 0a.01.17 (J1VMA06N): by operator.  #015

08/na1-a.log.bz2:Nov  8 18:29:30 na1-a [na1-a: raid.disk.unload.done:info]: Unload of Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N] has completed successfully  #015

08/na1-a.log.bz2:Nov  8 19:03:22 na1-a [na1-a: diskown.changingOwner:info]: changing ownership for disk 0a.01.17 (S/N CWVJMPNN) from unowned (ID 4294967295) to na1-a (ID 1574564151)  #015

08/na1-a.log.bz2:Nov  8 19:04:50 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA01] S/N [CWVJMPNN] offline.  #015

08/na1-a.log.bz2:Nov  8 19:04:50 na1-a [na1-a: bdfu.selected:info]: Disk 0a.01.17 [NETAPP   X411_HVIPC420A15 NA01] S/N [CWVJMPNN] selected for background disk firmware update.  #015

08/na1-a.log.bz2:Nov  8 19:04:51 na1-a [na1-a: dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X411_HVIPC420A15.NA02.LOD on 1 disk(s) of plex [Pool0]...  #015

08/na1-a.log.bz2:Nov  8 19:05:07 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks.   #015

08/na1-a.log.bz2:Nov  8 19:05:07 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [CWVJMPNN].  #015

08/na1-b.log.bz2:Nov  8 18:29:43 na1-b [na1-b: diskown.errorReadingOwnership:warning]: error 3 (disk failed) while reading ownership on disk 0a.01.17 (S/N J1VMA06N)  #015

08/na1-b.log.bz2:Nov  8 19:00:00 na1-b [na1-b: callhome.dsk.redun.fault:error]: Call home for DISK REDUNDANCY FAILED  #015

08/na1-b.log.bz2:Nov  8 19:00:26 na1-b [na1-b: raid.disk.missing:info]: Disk 0a.01.17 Shelf 1 Bay 17 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMA06N] is missing from the system  #015

09/na1-a.log.bz2:Nov  9 09:23:13 na1-a [na1-a: raid.rg.diskcopy.start:notice]: /aggr0/plex0/rg0: starting disk copy from 0a.01.13 to 0a.01.17  #015

09/na1-a.log.bz2:Nov  9 09:24:00 na1-a [na1-a: monitor.globalStatus.nonCritical:warning]: There are not enough spare disks.   #015

09/na1-a.log.bz2:Nov  9 09:39:22 na1-a [na1-a: raid.rg.diskcopy.read.err:debug]: Read error on Disk /aggr0/plex0/rg0/0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3S8N], block #17732096 during disk copy  #015

09/na1-a.log.bz2:Nov  9 11:00:56 na1-a [na1-a: raid.rg.diskcopy.done:notice]: /aggr0/plex0/rg0: disk copy from 0a.01.13 to 0a.01.17 completed in 1:37:43.35  #015

09/na1-a.log.bz2:Nov  9 11:00:56 na1-a [na1-a: raid.config.filesystem.disk.admin.failed.after.copy:info]: File system Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3S8N] is being failed after it was successfully copied to a replacement.  #015

09/na1-a.log.bz2:Nov  9 11:00:56 na1-a [na1-a: callhome.fdsk.admin:info]: Call home for FILESYSTEM DISK ADMIN FAILED  #015

09/na1-a.log.bz2:Nov  9 11:00:56 na1-a [na1-a: disk.failmsg:error]: Disk 0a.01.13 (J1VM3S8N): by operator.  #015

09/na1-a.log.bz2:Nov  9 11:00:56 na1-a [na1-a: raid.disk.unload.done:info]: Unload of Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3S8N] has completed successfully  #015

09/na1-a.log.bz2:Nov  9 11:56:48 na1-a [na1-a: diskown.changingOwner:info]: changing ownership for disk 0a.01.13 (S/N CWVK8TMN) from unowned (ID 4294967295) to na1-a (ID 1574564151)  #015

09/na1-a.log.bz2:Nov  9 11:58:20 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA01] S/N [CWVK8TMN] offline.  #015

09/na1-a.log.bz2:Nov  9 11:58:20 na1-a [na1-a: bdfu.selected:info]: Disk 0a.01.13 [NETAPP   X411_HVIPC420A15 NA01] S/N [CWVK8TMN] selected for background disk firmware update.  #015

09/na1-a.log.bz2:Nov  9 11:58:20 na1-a [na1-a: dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X411_HVIPC420A15.NA02.LOD on 1 disk(s) of plex [Pool0]...  #015

09/na1-a.log.bz2:Nov  9 11:58:36 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk 0a.01.13 Shelf 1 Bay 13 [NETAPP   X411_HVIPC420A15 NA01] S/N [CWVK8TMN].  #015

10/na1-a.log.bz2:Nov 10 07:00:03 na1-a [na1-a: raid.scrub.suspended:notice]: Disk scrub suspended.  #015

11/na1-a.log.bz2:Nov 11 01:06:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611  #015

11/na1-a.log.bz2:Nov 11 01:06:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611  #015

12/na1-a.log.bz2:Nov 12 13:04:14 na1-a [na1-a: callhome.invoke.all:info]: User triggered complete call home for USER_TRIGGERED (COMPLETE:2004668135 (two disks have already been changed))  #015

14/na1-a.log.bz2:Nov 14 04:30:11 na1-a [na1-a: raid.rg.scrub.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231, during scrub.  #015

14/na1-a.log.bz2:Nov 14 04:30:42 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231  #015

15/na1-a.log.bz2:Nov 15 04:52:58 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.01.16 Shelf 1 Bay 16 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3MKN], block #3779786  #015

15/na1-a.log.bz2:Nov 15 04:52:58 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.01.16 Shelf 1 Bay 16 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VM3MKN], block #3779786  #015

15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904  #015

15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904  #015

16/na1-a.log.bz2:Nov 16 03:00:04 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 during stripe write  #015

16/na1-a.log.bz2:Nov 16 03:00:06 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270  #015

21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745  #015

21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745  #015

21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 during stripe write  #015

21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257  #015

21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520  #015

21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520  #015

24/na1-a.log:Nov 24 07:00:02 na1-a [na1-a: raid.scrub.suspended:notice]: Disk scrub suspended.  #015

24/na1-b.log:Nov 24 07:00:01 na1-b [na1-b: raid.scrub.suspended:notice]: Disk scrub suspended.  #015
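
For a quick tally of how many media error lines each drive has logged, a rough one-liner like this works (same directory layout as the bzgrep at the top; adjust the pattern and glob to taste, this is just a sketch):

root@nic02:/var/log/remote/2013/11# bzgrep -i media */na1-* | grep -o 'S/N \[[A-Z0-9]*\]' | sort | uniq -c | sort -rn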

7 REPLIES

billshaffer

That is not normal, and I would be concerned as well.  Have any of the drives you've replaced started showing errors again?  Is it possible something's happened to the system that might have degraded the drives?  The only time I've seen something similar is when our datacenter cooling went out, and we didn't have any temperature sensors.  We got it fixed, but 2-3 months later we were losing a couple drives a week.

I would work with NetApp to see if they will replace the erroring drives.

Bill

JANAKE_RONNBLOM

The system has been in operation for 18+ months, and until this summer no drives had been replaced. The DC where the system is located has been stable and hasn't had an outage or temperature variation since we started using their services, so I'm at a loss to explain why the drives are suddenly starting to fail. We do have another controller (the same FAS3210) with the same kind of DS4243 shelf and the same type of disks, and it hasn't shown any problems. It's only the controller that runs our VMware farm that has these errors. The VMware controller probably has more load on it than the CIFS controller, so maybe this is related to the load?

According to NetApp, they won't replace the drives unless they have actually failed. We had one drive fail this summer, and the rebuild took almost 24 hours to finish due to timeouts, which caused our customers some pain; that's why we now prefer to pre-fail the drives (roughly as sketched below). However, since yesterday NetApp won't let us do that anymore, and they haven't come up with a plausible explanation for why this happens.
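
For reference, the pre-fail is nothing fancy; from memory it is just the normal 7-Mode disk fail, which copies the data to a spare before failing the drive (double-check the syntax on your release; the disk name is only an example taken from the logs above):

na1-a> aggr status -s          (check that a matching spare is available first)
na1-a> disk fail 0a.01.17      (copies the data to a spare, then fails the disk)
na1-a> aggr status -r          (watch the copy and the raid group state)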

-J

billshaffer

Have any of the drives that have been replaced started showing errors?  If so, then I would be more likely to suspect something environmental.  Higher load is possible, but I wouldn't really expect it.  I'm assuming the drive firmware is up to date and matches the other controller?  Maybe it's worth getting the serials of the failing drives and seeing whether you can find evidence of a bad production run or something?

NetApp's response seems typical from first or second level.  You might try working with your SE to try to escalate the issue from the inside.  At one point we had a NetApp person dedicated to fielding customer concerns just like this one - can't remember what his title was, but you probably have one as well.  Point out that you've got an abnormal failure rate that is hurting your customers, and therefore you.  Might not hurt to hint that there are other NAS vendors out there....

Bill

JANAKE_RONNBLOM

None of the replaced drives have started showing errors yet.

I have a hard time imagining it could be something environmental, since it's in a monitored DC which hasn't had any problems yet.

All the drives were bought at the same time, so they could all be from the same batch. I'll see if I can find the serials of those that failed and check whether they are all close (rough approach below).
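
Something like this should pull the serials of the drives that have been failed so far, for comparison against a possible bad batch (a sketch, run from the year's log directory; if I remember right, "storage show disk -a" on the filer lists serials and firmware for the disks currently installed):

root@nic02:/var/log/remote/2013# bzgrep -i disk.failmsg */*/na1-* | grep -o '([A-Z0-9]*)' | tr -d '()' | sort -u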

And I'll give NetApp another call and hear what they have to say.

The frequency of the errors also seems to be going up. This is from September until today for one of the bad disks, which hasn't failed yet.

09/08/na1-a.log.bz2:Sep  8 02:03:39 na1-a [na1-a: raid.rg.scrub.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #12296245 during scrub  #015

09/08/na1-a.log.bz2:Sep  8 02:04:09 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #12296245  #015

09/25/na1-a.log.bz2:Sep 25 05:54:22 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26139469  #015

09/25/na1-a.log.bz2:Sep 25 05:54:22 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26139469  #015

09/30/na1-a.log.bz2:Sep 30 12:46:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42008145  #015

09/30/na1-a.log.bz2:Sep 30 12:47:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42008146  #015

10/07/na1-a.log.bz2:Oct  7 16:51:41 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #19161536  #015

10/07/na1-a.log.bz2:Oct  7 16:51:41 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #19161536  #015

11/02/na1-a.log.bz2:Nov  2 11:57:21 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #43051324  #015

11/11/na1-a.log.bz2:Nov 11 01:06:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611  #015

11/11/na1-a.log.bz2:Nov 11 01:06:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41127611  #015

11/14/na1-a.log.bz2:Nov 14 04:30:11 na1-a [na1-a: raid.rg.scrub.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231, during scrub.  #015

11/14/na1-a.log.bz2:Nov 14 04:30:42 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41557231  #015

11/15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904  #015

11/15/na1-a.log.bz2:Nov 15 22:03:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41579904  #015

11/16/na1-a.log.bz2:Nov 16 03:00:04 na1-a [na1-a: raid.tetris.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270 during stripe write  #015

11/16/na1-a.log.bz2:Nov 16 03:00:06 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #34632270  #015

11/21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745  #015

11/21/na1-a.log.bz2:Nov 21 02:00:59 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #45155745  #015

11/21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257 during stripe write  #015

11/21/na1-a.log.bz2:Nov 21 02:30:34 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #53136257  #015

11/21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520  #015

11/21/na1-a.log.bz2:Nov 21 17:22:36 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #32003520  #015

11/26/na1-a.log.bz2:Nov 26 09:04:57 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42176960 during stripe write  #015

11/26/na1-a.log.bz2:Nov 26 09:04:57 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #42176960  #015

11/27/na1-a.log.bz2:Nov 27 01:36:52 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26939584  #015

11/27/na1-a.log.bz2:Nov 27 01:36:52 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #26939584  #015

11/29/na1-a.log.bz2:Nov 29 09:16:15 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #52566345  #015

11/29/na1-a.log.bz2:Nov 29 09:16:15 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #52566345  #015

11/29/na1-a.log.bz2:Nov 29 12:03:44 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #7056768  #015

11/29/na1-a.log.bz2:Nov 29 12:03:44 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #7056768  #015

11/30/na1-a.log:Nov 30 08:17:27 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41577728 during stripe write  #015

11/30/na1-a.log:Nov 30 08:17:28 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #41577728  #015

12/01/na1-a.log:Dec  1 00:10:22 na1-a [na1-a: raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #49728270 during stripe write  #015

12/01/na1-a.log:Dec  1 00:10:22 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #49728270  #015

12/01/na1-a.log:Dec  1 02:03:18 na1-a [na1-a: raid.rg.scrub.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #14779074 during scrub  #015

12/01/na1-a.log:Dec  1 02:03:48 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #14779074  #015

12/02/na1-a.log:Dec  2 01:38:23 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #80977792  #015

12/02/na1-a.log:Dec  2 01:38:24 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0a.02.10 Shelf 2 Bay 10 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMVDBN], block #80977792  #015
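
A rough per-month count for this drive shows the trend (serial hard-coded from the logs above; this relies on bzgrep prefixing each match with the MM/DD/ log path, as in the output above, so adjust if your layout differs):

root@nic02:/var/log/remote/2013# bzgrep J1VMVDBN */*/na1-a.log* | grep media | cut -d/ -f1 | sort -n | uniq -c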

-J

billshaffer

Yeah, something is fishy with the drives, I bet.  Good luck...

Bill

JANAKE_RONNBLOM

Still getting these errors. NetApp won't replace the drives pre-emptively; about two weeks later the disk finally dies, and then AutoSupport triggers a case.

This week's "not an error message" message:

---< cut >---

06/na1-a.log.bz2:Jan  6 05:27:48 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237916  #015

06/na1-a.log.bz2:Jan  6 05:27:48 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237916  #015

06/na1-a.log.bz2:Jan  6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237935  #015

06/na1-a.log.bz2:Jan  6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237936  #015

06/na1-a.log.bz2:Jan  6 05:31:26 na1-a [na1-a: raid.read.media.err:debug]: Read error on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237937  #015

06/na1-a.log.bz2:Jan  6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237935  #015

06/na1-a.log.bz2:Jan  6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237936  #015

06/na1-a.log.bz2:Jan  6 05:31:27 na1-a [na1-a: raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN], block #81237937  #015

06/na1-a.log.bz2:Jan  6 05:31:38 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN] offline.  #015

06/na1-a.log.bz2:Jan  6 05:31:56 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk /aggr0/plex0/rg1/0b.02.13 Shelf 2 Bay 13 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMWJUN].  #015

06/na1-a.log.bz2:Jan  6 05:31:57 na1-a [na1-a: raid.rg.recons.start:debug]: /aggr0/plex0/rg1: starting reconstruction, using disk 0b.02.13  #015

10/na1-a.log:Jan 10 02:00:49 na1-a [na1-a: raid.disk.offline:notice]: Marking Disk /aggr0/plex0/rg1/0a.02.4 Shelf 2 Bay 4 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMBZ9N] offline.  #015

10/na1-a.log:Jan 10 02:01:05 na1-a [na1-a: raid.disk.online:notice]: Onlining Disk /aggr0/plex0/rg1/0a.02.4 Shelf 2 Bay 4 [NETAPP   X411_HVIPC420A15 NA02] S/N [J1VMBZ9N].  #015

10/na1-a.log:Jan 10 02:01:05 na1-a [na1-a: raid.rg.recons.start:debug]: /aggr0/plex0/rg1: starting reconstruction, using disk 0a.02.4  #015

---< cut >---

billshaffer

Have you tried escalating through your sales rep or local support team?
