FAS and V-Series Storage Systems Discussions

Multiple "Block recommended for reassignment on Disk"

tomasz_golebiewski

Hello,

 

Since support is closed and I cannot even open a new low-priority case with NetApp, I would like to ask: is it OK to have multiple errors like this one?

Sun Feb 16 19:22:15 CET [filer: raid.rg.scrub.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggr1/plex0/rg0/0d.01.2 Shelf 1 Bay 2 [NETAPP   X410_HVIPC288A15 NA02] S/N [XXXXXXXX], block #2280307 during scrub

I believe ONTAP should predict the failure and mark the drive offline, but instead it keeps trying to save the drive every week during scrub.

 

drive: X410_HVIPC288A15, fw: NA02

OS: ONTAP 7.3.6

inside shelf DS4243: IOM3, fw: 0172

 

When I send a manual AutoSupport, the section "Disk defect list for disks that have reported errors" says "Disk 0d.01.2 grown defect list has 396 entries", so we are aware of the data consistency concern.

A single scrub reports anywhere from 4 to 39 media errors on this one drive.
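
For anyone who wants to tally these messages per disk from a saved copy of the messages log, here is a minimal Python sketch. The regular expression is modeled only on the single log line quoted above, so treat it as an assumption: other ONTAP releases may word or lay out the message differently. The names `REASSIGN_RE` and `count_reassign_recommendations` are made up for this illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this post (an assumption:
# other releases may format the message differently).
REASSIGN_RE = re.compile(
    r"raid\.rg\.scrub\.media\.recommend\.reassign\.err:\w+\]: "
    r"Block recommended for reassignment on Disk (?P<path>\S+) "
    r"Shelf (?P<shelf>\d+) Bay (?P<bay>\d+) .*?"
    r"block #(?P<block>\d+) during scrub"
)


def count_reassign_recommendations(lines):
    """Return a Counter mapping disk path -> number of matching messages."""
    counts = Counter()
    for line in lines:
        match = REASSIGN_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


sample_lines = [
    "Sun Feb 16 19:22:15 CET [filer: raid.rg.scrub.media.recommend."
    "reassign.err:info]: Block recommended for reassignment on Disk "
    "/aggr1/plex0/rg0/0d.01.2 Shelf 1 Bay 2 [NETAPP   X410_HVIPC288A15 "
    "NA02] S/N [XXXXXXXX], block #2280307 during scrub",
    "some unrelated log line",
]

for disk, count in count_reassign_recommendations(sample_lines).items():
    print(disk, count)
```

Feeding it the lines of one weekly scrub at a time would give the per-scrub count for each drive.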

6 REPLIES

scottgelb

You can manually fail the drive or use disk replace to swap it out. Also, from priv set advanced, check disk shm_stats for the storage health monitor counters. Or call into support for the recommendation (probably a disk replace and an RMA of the drive).

Yes, we know about manually setting the drive as failed.

However, this should be done by ONTAP anyway.

Are these values correct? They are pretty... unbelievable.

Sometimes we have to manually fail disks, but I agree ONTAP should catch these when errors can't be corrected. It may be a threshold that is not caught in this ONTAP release but is fixed in a newer one. It is definitely worth calling in a case to see whether a higher release would fail this drive, or why it isn't being failed.


There is nothing new related to media error detection in ONTAP 7.3.7.

https://library.netapp.com/ecm/ecm_download_file/ECMP1134331

But there is a little information about failure thresholds:

https://library.netapp.com/ecmdocs/ECMM1278110/html/mgmtsag/4raid22.htm

however, it just says:

More than twenty-five media errors (that are not related to disk scrub activity) occurring on a disk within a ten-minute period

and THESE ARE media errors during scrub activity.
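
To make the point concrete, here is a rough Python sketch of the rule as that one sentence words it. It is only an illustration of the quoted sentence, not ONTAP's actual implementation. The function name, the `(timestamp, during_scrub)` event shape, and the choice to treat the ten-minute window as inclusive are all assumptions.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
THRESHOLD = 25  # "more than twenty-five": the 26th error in a window trips it


def exceeds_media_error_threshold(events):
    """events: iterable of (timestamp, during_scrub), sorted by timestamp.

    Returns True if more than THRESHOLD non-scrub media errors fall
    within any single WINDOW. Scrub-time errors are skipped entirely,
    as the quoted sentence excludes them.
    """
    window = deque()
    for ts, during_scrub in events:
        if during_scrub:
            continue  # not counted under the quoted rule
        window.append(ts)
        # Drop errors that are more than ten minutes older than this one.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) > THRESHOLD:
            return True
    return False


start = datetime(2014, 2, 16, 19, 22, 15)

# 39 media errors in under a minute, all reported during scrub.
scrub_errors = [(start + timedelta(seconds=i), True) for i in range(39)]
print(exceeds_media_error_threshold(scrub_errors))

# 26 media errors in under a minute, none related to scrub.
other_errors = [(start + timedelta(seconds=i), False) for i in range(26)]
print(exceeds_media_error_threshold(other_errors))
```

Under this reading, even the worst scrub we have seen (39 errors) never counts toward the threshold, which would explain why the drive is not failed automatically.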

Agreed... based on the counters, it is a candidate for failure. It is definitely worth having support look into it; this could be an existing BURT fixed in a newer release of ONTAP or drive firmware.

The drive has failed.
