Legacy Product Discussions
Hi everyone,
I work for France Telecom and have been administering about 6 NetApp filers for the last 8 months. Recently /etc/messages started showing some pesky alarms of this type:
[windata8602:monitor.shelf.fault:CRITICAL]: Fault reported on disk storage shelf attached to channel 0b. Please check fans, power, and temperature.
When I grep for "status" in the output of the "environment status shelf" command, I get this for some shelves:
Shelf status: information condition
Shelf status: critical condition
However, the output of "environment status chassis all" seems OK:
Temperature ok
PSU 1 ok
PSU 2 ok
Voltage ok
SYS FAN ok
NVRAM6-temperature-3 ok
NVRAM6-battery-3 ok.
Am I fretting for nothing?
Thanks a lot
Dan
Hi Dan,
You can check the global status via FilerView. If it still shows Critical, check this KB article:
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb23174
Thanks
Daniel
The command "environment status chassis" only looks at the storage appliance itself (its onboard power supplies, fans, etc.).
It looks like you have 2 shelves with some type of issue. These errors often cause a support case to be generated, but the standard response to those cases is to email the contact and ask them to inspect the shelf or module and confirm that the condition is real. If the customer does not see the email or does not respond, the case closes automatically. A power problem on the customer side can cause these conditions, so it's not always appropriate to ship a part until power fluctuations or maintenance have been ruled out.
You need to go through every line of the "environment status shelf" output for the shelves on loop 0b to see where the issue really is. That output is roughly 30 lines long for each shelf. Several classes of errors can produce that status, and it is important to address them if you see them:
1.) You may not have redundant power to the shelf (and not having both PSUs running can have implications for how the disks respond in high-load situations).
2.) You may not be cooling effectively.
3.) Firmware won't download to a shelf with an error condition.
Here is an example, on which I have highlighted the fields you should watch for:
Channel: 1d
Shelf: 3
SES device path: local access: 1d.49
Module type: ESH4; monitoring is active
Shelf status: normal condition
SES Configuration, via loop id 49 in shelf 3:
logical identifier=0x50050cc0021082c9
vendor identification=XYRATEX
product identification=DS14-Mk4-FC
product revision level=1313
Vendor-specific information:
Product Serial Number: OPS6773421082C9
Optional Settings: 0x00
Status reads attempted: 222163; failed: 0
Control writes attempted: 7415; failed: 0
Shelf bays with disk devices installed:
13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
with error: none
Power Supply installed element list: 1, 2; with error: none
Power Supply information by element:
[1] Serial number: PMC643620239587
Type: <N/A>
Firmware version: <N/A>
[2] Serial number: PMC643620239618
Type: <N/A>
Firmware version: <N/A>
Power control element status: ok
Cooling Unit installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
Shelf temperatures by element:
[1] 26 C (78 F) (ambient) Normal temperature range
[2] 31 C (87 F) Normal temperature range
[3] 34 C (93 F) Normal temperature range
Temperature thresholds by element:
[1] High critical: 50 C (122 F); high warning 40 C (104 F)
Low critical: 0C (32 F); low warning 10 C (50 F)
[2] High critical: 63 C (145 F); high warning 53 C (127 F)
Low critical: 0C (32 F); low warning 10 C (50 F)
[3] High critical: 63 C (145 F); high warning 53 C (127 F)
Low critical: 0C (32 F); low warning 10 C (50 F)
ES Electronics installed element list: 1, 2; with error: none
ES Electronics reporting element: 1
ES Electronics information by element:
[1] Serial number: IMS69813312CAFF
CPLD version: <N/A>
[2] Serial number: IMS698133130DC8
CPLD version: <N/A>
Embedded Switching Hub installed element list: 1, 2; with error: none
Shelf mapping (shelf-assigned addresses) for channel 1d:
Shelf 1: 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Shelf 2: 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Shelf 3: 61 60 59 58 57 56 55 54 53 52 51 50 49 48
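Reading roughly 30 lines per shelf by eye gets tedious, so here is a minimal sketch of how the check could be scripted. It assumes the output of "environment status shelf" has been saved as text in the layout shown above. The function name find_shelf_problems and the sample values (shelf number 2, the error on power supply 2) are made up for illustration and are not taken from a real filer.

```python
import re

def find_shelf_problems(text):
    """Return (channel, shelf, line) for every line that looks abnormal.

    Flags: a 'Shelf status' other than 'normal condition', a
    'with error:' list other than 'none', and a non-zero 'failed:' counter.
    """
    problems = []
    channel = shelf = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Channel:\s*(\S+)", line)
        if m:
            channel, shelf = m.group(1), None
            continue
        m = re.match(r"Shelf:\s*(\d+)", line)
        if m:
            shelf = int(m.group(1))
            continue
        m = re.match(r"Shelf status:\s*(.+)", line)
        if m:
            if m.group(1) != "normal condition":
                problems.append((channel, shelf, line))
            continue
        m = re.search(r"with error:\s*(\S.*)$", line)
        if m and m.group(1) != "none":
            problems.append((channel, shelf, line))
            continue
        m = re.search(r"failed:\s*(\d+)", line)
        if m and int(m.group(1)) != 0:
            problems.append((channel, shelf, line))
    return problems

# Hypothetical excerpt in the same layout as the example above.
sample = """\
Channel: 0b
Shelf: 2
Shelf status: critical condition
Status reads attempted: 3574416; failed: 0
Power Supply installed element list: 1, 2; with error: 2
Cooling Element installed element list: 1, 2; with error: none
"""

for channel, shelf, line in find_shelf_problems(sample):
    print(f"channel {channel} shelf {shelf}: {line}")
```

Keeping the channel and shelf number with each hit matters, because a plain grep for "status" drops the lines that say which shelf a "critical condition" belongs to.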
I agree with the three conditions Mathew stated:
1.) You may not have redundant power to the shelf (and not having both PSUs running can have implications for how the disks respond in high-load situations).
2.) You may not be cooling effectively.
3.) Firmware won't download to a shelf with an error condition.
Because of the above three reasons, you get error messages regarding the shelf.
Thanks
Are your filers running ONTAP 7.3.2?
Regards
adai
I had the exact same issue. It was easily resolved by upgrading ONTAP to the latest 7.3.2. Apparently this is a known bug.
Please refer to Bugs Online for the fixed version of Data ONTAP and the workaround:
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=383376
Regards
adai
Hi guys, and thanks a lot for your quick answers. My apologies for responding so late, but I've been on vacation...
To respond to some of you:
One filer that reports the aforementioned errors has this version of ONTAP:
NetApp Release 7.0.5: Wed Aug 9 00:27:38 PDT 2006
The other:
NetApp Release 7.0.5: Wed Aug 9 00:27:38 PDT 2006
Despite being THAT old, they work flawlessly, but if it's a known bug that generates these messages, then I suppose it won't be a problem to upgrade to the latest stable version. Concerning what Mathew said, I don't see any particular errors when sifting through the output (except for what I've already noted):
environment status | grep -i "status" | grep -v normal
PSU Status ok
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Shelf status: critical condition
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574360; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Shelf status: critical condition
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
Status reads attempted: 3574361; failed: 0
environment status | egrep -i "failed|err"
Status reads attempted: 3574416; failed: 0
Control writes attempted: 41; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 45; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 40; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 84; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 21; failed: 0
Power Supply installed element list: 1, 2; with error: 2
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 16; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 25; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 20; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 17; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574415; failed: 0
Control writes attempted: 13; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 12; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 12; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 24; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 67; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 95; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 207; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 18; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 17; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 23; failed: 0
Power Supply installed element list: 1, 2; with error: 1
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 53; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 183; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 443; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 141; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 746; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Temperature Sensor installed element list: 1, 2, 3; with error: none
ES Electronics installed element list: 1, 2; with error: none
Embedded Switching Hub installed element list: 1, 2; with error: none
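A filter on "failed|err" matches every "with error:" line, including the healthy "with error: none" ones, so the interesting entries are easy to miss. In the output above, two of the Power Supply lines do report an element in error (element 2 in one block, element 1 in another). The following is a minimal sketch of a stricter filter over that already-filtered text. It assumes each shelf's block begins with a "Status reads attempted" line; the function name blocks_with_errors is hypothetical, and the block number is only a position in the output, not a shelf ID.

```python
import re

def blocks_with_errors(filtered_text):
    """Return (block_number, line) for 'with error:' lines not equal to 'none'.

    A new block is assumed to start at each 'Status reads attempted' line.
    """
    results = []
    block = 0
    for raw in filtered_text.splitlines():
        line = raw.strip()
        if line.startswith("Status reads attempted"):
            block += 1
            continue
        m = re.search(r"with error:\s*(\S.*)$", line)
        if m and m.group(1) != "none":
            results.append((block, line))
    return results

# Two blocks shortened from the output in this thread.
excerpt = """\
Status reads attempted: 3574416; failed: 0
Control writes attempted: 84; failed: 0
Power Supply installed element list: 1, 2; with error: none
Cooling Element installed element list: 1, 2; with error: none
Status reads attempted: 3574416; failed: 0
Control writes attempted: 21; failed: 0
Power Supply installed element list: 1, 2; with error: 2
Cooling Element installed element list: 1, 2; with error: none
"""

for block, line in blocks_with_errors(excerpt):
    print(f"block {block}: {line}")
```

To tie a block back to a real shelf, the unfiltered output is still needed, since the grep removes the "Channel:" and "Shelf:" lines.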