
Reporting "fault on disk storage shelf" every hour

VKALVEMULA

Hi all,

I am seeing this error every hour in the messages log:

[NAS02:monitor.shelf.fault:CRITICAL]: Fault reported on disk storage shelf attached to channel 3a. Please check fans, power supplies, disks, and temperature sensors.
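To confirm how regularly the fault recurs from a saved copy of the messages log, a small parser can group the monitor.shelf.fault lines by filer and channel and print the gap between occurrences. This is a minimal sketch, not a NetApp tool. It assumes each line carries a syslog-style prefix such as "Sun Jan  5 02:00:00 CST" before the bracketed tag, that the year is supplied by the caller because the prefix omits it, and that all lines share one time zone. The function name shelf_fault_intervals is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape (modeled on the messages in this thread):
# "Sun Jan  5 02:00:00 CST [host:monitor.shelf.fault:CRITICAL]: Fault reported ... channel 6a. ..."
LINE_RE = re.compile(
    r"^\w{3}\s+(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"\[(?P<host>[^:\]]+):(?P<event>[\w.]+):(?P<sev>\w+)\]:\s*(?P<msg>.*)$"
)
CHANNEL_RE = re.compile(r"attached to channel (?P<chan>\w+)")


def shelf_fault_intervals(lines, year):
    """Return {(host, channel): [minutes between consecutive shelf faults]}.

    The time zone token is ignored and year rollover is not handled.
    """
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["event"] != "monitor.shelf.fault":
            continue  # not a shelf-fault message
        c = CHANNEL_RE.search(m["msg"])
        if not c:
            continue
        ts = datetime.strptime(
            f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S"
        )
        seen[(m["host"], c["chan"])].append(ts)

    intervals = {}
    for key, stamps in seen.items():
        stamps.sort()
        intervals[key] = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
    return intervals
```

A list of gaps that are all close to 60 points to a periodic check re-raising the same fault rather than separate new events.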

We replaced the cables and the shelf I/O modules, but the error still persists.

Has anyone had the same issue? If so, how was it resolved?

We are running version 8.1 RC3.

Thanks in advance.


scottgelb

Definitely open a case on this. In my experience, though, it often (not always) requires a power cycle of the shelf.

Get a recommendation from support, but this is what we typically do when we see this:

First, check the "environment shelf" output for errors.

Next, reseat or replace the cables (which you already did) and reseat or replace the I/O modules (which you also did).

If none of those fix it and ONTAP shows no errors, taking downtime to power cycle the shelf is almost always the fix that sticks. We just had one at a customer and worked through every workaround with support before having to power cycle. It doesn't occur often and wasn't urgent, so we waited for their next downtime window and performed the quick maintenance to resolve the issue. It was the same symptom: the error showed up every hour.

VKALVEMULA

I powered down the filers and disk shelves and powered them back up, but I still see the same alert again and again.

scottgelb

Definitely open a case; possibly some other type of failure, or a part that needs replacing. Are there any errors in the "environ shelf" output? Is the system MPHA (multipath HA)?

VKALVEMULA

I see the following errors:

On shelf 0:
  SAS connector attached element list: 1, 2, 3, 4; with error: 2
    SAS cable information by element:
      [1] Vendor: Molex Inc.
          Type: QSFP copper 2m  ID: 00  Swaps: 0
      [2] Vendor: <N/A>
          Type: <N/A> <N/A>  <N/A>  ID: <N/A>  Swaps: 0
      [3] Vendor: Molex Inc.
          Type: QSFP copper 2m  ID: 00  Swaps: 0
      [4] Vendor: Molex Inc.
          Type: QSFP copper 0.5m  ID: 01  Swaps: 0

On shelf 1:
  SAS connector attached element list: 1, 2, 3, 4; with error: 1
    SAS cable information by element:
      [1] Vendor: <N/A>
          Type: <N/A> <N/A>  <N/A>  ID: <N/A>  Swaps: 0
      [2] Vendor:
          Type: <N/A> optical 0m  ID: 00  Swaps: 0
      [3] Vendor: Molex Inc.
          Type: QSFP copper 0.5m  ID: 00  Swaps: 0
      [4] Vendor: Molex Inc.
          Type: QSFP copper 5m  ID: 01  Swaps: 1
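When several shelves are involved, output like the above can be summarized per shelf: which elements are listed "with error", which report no cable vendor, and which have a nonzero swap count. This is a rough sketch that works on the excerpt as pasted in this post, including the "on shelf N" headings the poster added; the raw ONTAP layout may differ, so treat the regular expressions as assumptions. The function name summarize_sas_elements is hypothetical.

```python
import re

# Patterns assume the excerpt format pasted in this thread, not a documented ONTAP layout.
SHELF_RE = re.compile(r"^\s*on shelf (\d+)", re.IGNORECASE)
ERROR_RE = re.compile(r"with error:\s*([\d,\s]+)$")
ELEMENT_RE = re.compile(r"^\s*\[(\d+)\]\s+Vendor:\s*(.*)$")
SWAPS_RE = re.compile(r"Swaps:\s*(\d+)")


def summarize_sas_elements(text):
    """Return {shelf: {"error": [...], "no_vendor": [...], "swapped": [...]}}."""
    shelves = {}
    current = None   # summary dict of the shelf being read
    element = None   # element number of the most recent "[n] Vendor:" line
    for line in text.splitlines():
        if m := SHELF_RE.match(line):
            current = shelves.setdefault(
                int(m[1]), {"error": [], "no_vendor": [], "swapped": []}
            )
            element = None
            continue
        if current is None:
            continue  # ignore anything before the first shelf heading
        if m := ERROR_RE.search(line.strip()):
            # Elements that ONTAP lists after "with error:"
            current["error"] = [int(x) for x in m[1].split(",") if x.strip()]
        elif m := ELEMENT_RE.match(line):
            element = int(m[1])
            if m[2].strip() in ("", "<N/A>"):
                current["no_vendor"].append(element)
        elif element is not None and (m := SWAPS_RE.search(line)):
            if int(m[1]) > 0:
                current["swapped"].append(element)
    return shelves
```

In the pasted output, the element flagged with an error on each shelf is also one that reports no cable vendor.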

scottgelb

Did support get back to you on this? Did you replace the SAS HBA too?

VKALVEMULA

Support replaced the SAS cables.

I also asked them to run a health check on the SAS card.

scottgelb

The SAS card is the only thing left I can think of.

VKALVEMULA (accepted solution)

We replaced the SAS cables with new ones sent by NetApp.

That resolved the issue.

SRITENNETI

Hi,

We received this error:

Sun Jan  5 02:00:00 CST [USTO-PFSX-X01:monitor.shelf.fault:CRITICAL]: Fault reported on disk storage shelf attached to channel 6a. Please check fans, power supplies, disks, and temperature sensors.

The non-disruptive steps that can be taken are as follows:

    Replace the module and cable (already on site), confirm path redundancy, and monitor for the fault message to recur.

    If the fault message recurs, proceed with the NDR shelf power cycle:

    Filer USTO-PFSX-X01 > storage power_cycle shelf start -c 6a -s 3
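The "confirm path redundancy" step can be scripted as a guard before the power cycle. This is a minimal sketch under stated assumptions: it assumes the disk path listing comes from 7-Mode `storage show disk -p`, laid out as a header row, a dashed rule, and one row per disk with PRIMARY, PORT, optional SECONDARY and PORT, SHELF, and BAY columns. Verify that layout on your own system. The function names are hypothetical; the only command it produces is the one quoted above, with the channel and shelf filled in, and it does not execute anything.

```python
def single_path_disks(output):
    """Return primary path names of disks that show no secondary path.

    Assumed row layouts (hypothetical, check against real output):
      PRIMARY PORT SECONDARY PORT SHELF BAY   -> multipathed
      PRIMARY PORT SHELF BAY                  -> single path
    """
    single = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if not fields or fields[0] == "PRIMARY" or set(fields[0]) == {"-"}:
            continue
        if len(fields) == 4:
            single.append(fields[0])
        elif len(fields) != 6:
            raise ValueError(f"unexpected row: {line!r}")
    return single


def power_cycle_command(channel, shelf, output):
    """Format the NDR power-cycle command only if every disk is multipathed."""
    single = single_path_disks(output)
    if single:
        raise RuntimeError(f"disks without a second path, fix before power cycling: {single}")
    return f"storage power_cycle shelf start -c {channel} -s {shelf}"
```

Refusing to build the command when any disk is single-pathed mirrors the order of the steps above: redundancy first, power cycle second.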

Just to be perfectly upfront, these non-disruptive actions may not resolve the fault message.

Even so, this plan is worth executing at this point, before considering a shelf replacement.
