Scrub found media errors

alessice · ‎2019-07-18

Hi,

I'm running a FAS2554 (20x4TB + 4xSSD) with cDOT 8.3.2P12. For about 2 years it was a SnapMirror destination. Now, after moving to a new datacenter, the SnapMirror relationship has been broken and the volume converted to RW to serve data to clients via NFS.

After this conversion the nightly scrub found some errors, but only media errors; the RAID, parity and checksum summaries report 0 errors:

raid.rg.scrub.summary.media: Scrub found 9 media errors in /sata_data_1/plex0/rg0, 0 in current scrub.

Two disks have also reported some bad sectors:

disk.ioRecoveredError.reassign: Recovered error on disk 0b.00.5: op 0x88:00000001c92032c0:000000f0 sector 7669298019 SCSI:recovered error - Disk automatically reassigned data (1 17 6 5) Disk 0b.00.5 Shelf 0 Bay 5 [NETAPP X477_HMKPX04TA07 NA00] S/N [XXXXXX]
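For anyone reading the event above, here is a minimal sketch of how the `op` field can be decoded. It assumes the field is laid out as opcode:starting LBA:block count in hex; that reading is inferred from how the numbers line up in this message, not from documented ONTAP behavior. The `decode_op` helper and its output keys are hypothetical names used only for illustration.

```python
import re

# Assumed layout of the "op" field: opcode:starting LBA:block count, all hex.
OP_FIELD = re.compile(r"op 0x([0-9a-f]+):([0-9a-f]+):([0-9a-f]+) sector (\d+)")

# Standard SCSI opcodes for the two values that appear in this thread.
SCSI_OPCODES = {0x28: "READ(10)", 0x88: "READ(16)"}


def decode_op(line):
    """Return the decoded op field of a recovered-error line, or None."""
    m = OP_FIELD.search(line)
    if m is None:
        return None
    opcode, lba, count = (int(g, 16) for g in m.group(1, 2, 3))
    sector = int(m.group(4))
    return {
        "command": SCSI_OPCODES.get(opcode, hex(opcode)),
        "first_lba": lba,
        "last_lba": lba + count - 1,
        "sector": sector,
        # True when the reported sector lies inside the requested range.
        "sector_in_range": lba <= sector < lba + count,
    }


line = ("Recovered error on disk 0b.00.5: op 0x88:00000001c92032c0:000000f0 "
        "sector 7669298019 SCSI:recovered error")
print(decode_op(line))
```

Under that assumption, the reported sector falls inside the range covered by the read, which is consistent with a single read request hitting one bad sector that the disk then reassigned.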

and ONTAP has power-cycled one of them:

scsi.cmd.aborted: Disk device 0b.00.5: Command aborted: cdb 0x28:032ff980:0008 (1848).

sas.adapter.debug: adapterName="0a", debug_string="Starting powercycle on device 0b.00.5"

sas.adapter.debug: adapterName="0a", debug_string="WRONG destination on OPEN (0x17) -- delaying: dev 0a.00.5, cdb 0x88:00000001d0d22d80:00000008 (0/1297618), NDU 0x0"

sas.adapter.debug: adapterName="0a", debug_string="Device 0b.00.5 invalidate debounce - 40"

sas.adapter.debug: adapterName="0a", debug_string="Powercycle on device 0b.00.5 complete: status 0"

sas.adapter.debug: adapterName="0a", debug_string="Device 0a.00.5 came back."
sas.adapter.debug: adapterName="0a", debug_string="Device 0b.00.5 came back."

No failure was reported; the event log shows only notice- and warning-level entries.

Do you think this is ordinary activity? Could it be related to the system having been a SnapMirror destination, for example if no scrub is done while a volume is of type DP?

Thanks

andris · ‎2019-07-19

Being a SnapMirror destination doesn't have any bearing on the disk maintenance/recovery activity, IMO.

The disks are just getting older... the activity you described is considered "normal" care and feeding by ONTAP of the storage subsystem. If the errors start becoming more frequent or you start seeing HDD read/write latencies above the norm, then that's worthy of a closer look and a support case.
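One rough way to watch whether the errors are becoming more frequent is to tally recovered-error events per disk from a saved copy of the event log. This is only a sketch: it assumes the log has been exported as plain text with one event per line in the same shape as the messages quoted above, and `count_recovered_errors` is a hypothetical helper, not an ONTAP tool.

```python
import re
from collections import Counter

# Matches the disk name in lines such as
# "disk.ioRecoveredError.reassign: Recovered error on disk 0b.00.5: ..."
RECOVERED = re.compile(
    r"disk\.ioRecoveredError\.\w+: Recovered error on disk (\S+?):"
)


def count_recovered_errors(lines):
    """Count recovered-error events per disk in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = RECOVERED.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Two lines taken from this thread; only the first is a recovered error.
sample = [
    "disk.ioRecoveredError.reassign: Recovered error on disk 0b.00.5: "
    "op 0x88:00000001c92032c0:000000f0 sector 7669298019 "
    "SCSI:recovered error - Disk automatically reassigned data (1 17 6 5)",
    "scsi.cmd.aborted: Disk device 0b.00.5: Command aborted: "
    "cdb 0x28:032ff980:0008 (1848).",
]

for disk, n in count_recovered_errors(sample).most_common():
    print(disk, n)
```

Running the same tally on logs from successive weeks and comparing the per-disk counts gives a simple trend; a disk whose count keeps climbing is the one to raise with support.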

BTW, ONTAP 8.3.x has passed End-of-Version Support. You will need to upgrade to 9.1, 9.3 or 9.5 soon to stay in a supported configuration.

alessice · ‎2019-08-17

I received NetApp Support Bulletin 1091466 saying that new firmware is available for my disks (New HDD firmware for X477_HMKPX04TA07 to prevent system outage), so I opened a case with Support about these errors. They told me that 3 disks on my FAS need to be replaced, and the parts have already shipped.

I'm not happy with the idea of having to replace 3 disks in the same RAID.