ONTAP Hardware

Fault reported on disk shelf -- temp sensor?

JMCDANIEL89

Hello,

I've seen a few different posts regarding ID changes and whatnot, but couldn't find anything on temperature sensors. For the past month (that I'm aware of) I've been getting constant errors throughout the day. These are FAS3140s; the versions and error messages are listed below. The only error that shows up is the temperature one seen at the bottom, where the ambient temperature is unavailable. Support recommended power cycling the shelf itself, and I was curious whether anyone has had the same issue or knows an alternative way to fix it.

When doing a power cycle on the shelf, is there any impact on volumes that use disks on that shelf? Since it's not affecting production at the moment and is more of an annoyance, an outage won't be approved just to power cycle the shelf and fix a sensor. The temperature readings are roughly the same as shown below.

DS14-Mk4-FC

Shelf 4: ESH4  Firmware rev. ESH A: 14  ESH B: 14

OnTap ver - 7.3.2

Channel: 1a

Shelf: 4

SES device path: local access: 1a.64

Module type: ESH4; monitoring is active

Shelf status: unrecoverable condition

Fault reported on disk storage shelf attached to channel 1a. Please check fans, power supplies, disks, and temperature sensors.

Fault previously reported on disk storage shelf attached to channel 1a has been corrected.

  [1] Unavailable (ambient)

  [2] 31 C (87 F)  Normal temperature range

  [3] 31 C (87 F)  Normal temperature range

1 ACCEPTED SOLUTION

billshaffer

Power cycling the shelf will bring down its disks, so unless each of your RAID groups has no more than two drives on this shelf (assuming RAID-DP), you'll have to schedule cluster downtime.  Even if you ARE set up to survive a shelf failure, I'd still schedule downtime - you don't really want to introduce double-drive failures if you can avoid it...

Bill
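
The "no more than two drives per RAID group on this shelf" condition can be checked mechanically. Below is a minimal sketch, assuming you have already collected which shelf each member disk of each RAID group lives on. The `RAID_GROUPS` layout, the group names, and the `groups_at_risk` helper are all hypothetical and made up for illustration; nothing here queries a real filer.

```python
from collections import Counter

# Hypothetical layout: RAID group name -> shelf number of each member disk.
# These values are invented for illustration; they are not from the thread.
RAID_GROUPS = {
    "aggr0/rg0": [1, 1, 2, 2, 3, 3, 4, 4],
    "aggr1/rg0": [3, 3, 4, 4, 4, 5, 5, 5],
}

# Disk failures each RAID type can absorb per RAID group.
TOLERATED_FAILURES = {"raid_dp": 2, "raid4": 1}


def groups_at_risk(raid_groups, shelf, raid_type="raid_dp"):
    """Return {group: disks_lost} for groups that lose more disks than
    the RAID type tolerates when the given shelf goes offline."""
    tolerance = TOLERATED_FAILURES[raid_type]
    at_risk = {}
    for name, shelves in raid_groups.items():
        lost = Counter(shelves)[shelf]  # member disks on that shelf
        if lost > tolerance:
            at_risk[name] = lost
    return at_risk


if __name__ == "__main__":
    # In this made-up layout, aggr1/rg0 has three disks on shelf 4.
    print(groups_at_risk(RAID_GROUPS, shelf=4))
```

An empty result only means the groups would stay online in a degraded state with no remaining parity protection, which is why scheduling downtime is still the safer choice.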

ismopuuronen

Hi,

Temperature sensor [1], in your case, is located in the shelf itself; [2] and [3] are in the ESH modules.

I would ignore those messages, because [2] and [3] are working properly.

If the air conditioning in the server room went off and the temperature rose, I believe other sensors (the chassis, other shelves, or these [2] and [3]) would notice that as well.
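
The reasoning above (one sensor missing, the others still reporting sane values) can be sketched as a small check over the sensor lines from the shelf status output. This is only an illustration: the regular expression is fitted to the three lines quoted in the original post, and the 40 C warning threshold is an arbitrary assumption, not a NetApp limit.

```python
import re

# Pattern fitted to lines like "[2] 31 C (87 F)  Normal temperature range"
# and "[1] Unavailable (ambient)"; other output formats are not handled.
SENSOR_LINE = re.compile(
    r"^\s*\[(?P<id>\d+)\]\s+"
    r"(?:(?P<celsius>-?\d+)\s*C\b.*|Unavailable\b.*)$"
)

SAMPLE = """\
  [1] Unavailable (ambient)
  [2] 31 C (87 F)  Normal temperature range
  [3] 31 C (87 F)  Normal temperature range
"""


def parse_sensors(text):
    """Map sensor id -> temperature in Celsius, or None if unavailable."""
    readings = {}
    for line in text.splitlines():
        match = SENSOR_LINE.match(line)
        if match:
            celsius = match.group("celsius")
            readings[int(match.group("id"))] = (
                int(celsius) if celsius is not None else None
            )
    return readings


def summarize(readings, warn_celsius=40):
    """Return (unavailable_ids, hot_ids); warn_celsius is an assumed value."""
    missing = sorted(k for k, v in readings.items() if v is None)
    hot = sorted(
        k for k, v in readings.items() if v is not None and v >= warn_celsius
    )
    return missing, hot


if __name__ == "__main__":
    print(summarize(parse_sensors(SAMPLE)))
```

With the sample lines, sensor [1] is reported as unavailable and no sensor counts as hot, which matches the situation described in the thread.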

Power cycling the shelf would need a maintenance break (best practice: halt the filer first, and then the shelves).

I wouldn't take a break just for power cycling; I would try to get a replacement shelf from NetApp and replace it during the maintenance break.

Otherwise you may only end up noting that the power cycle didn't help, then having to find out when you can do this again and replace the shelf at that point.

It's annoying to get those errors in the log, but it doesn't look serious because you still have plenty of sensors to check the temperature.

"When doing a power cycle on the shelf is there any impact on volumes that utilize disks on the shelf? Since it's not an issue to production at the moment and more of an annoyance an outage won't be approved to power cycle for just fixing a sensor -- which the temperature is roughly the same as below."

Removing a shelf while the filer is running is not supported by NetApp, and in this case I think power cycling it amounts to the same thing. The filer can panic, etc.

Br.

Ismo.
