Operations manager sees enclosure failures that do not exist

Hi all!

Bit of a weird problem with two FAS 3140 clusters: both have exemplary clean syslogs with no warnings whatsoever. Yet about once a week Operations Manager raises an "Enclosures Failed" alert with really scary content:

Enclosures Failed

Event: Enclosures Failed
Source: xxx/
Severity:
About: Status of the enclosures on an appliance
Condition: The following components failed

Enclosure Serial No : OPS6773421DFB62
Enclosure Shelf Address : 1c.17
Enclosure Logical ID : 5:005:0cc002:1dfb62
List of Failed Power Supply Modules: ", "
List of Failed Fan Modules: "1, 2"
List of Failed Electronics Modules: "1, 2"
List of Failed Temperature Sensor Modules (Over Temp): "3"
List of Temperature Sensor Modules Warning (Over Temp): "1, 2, 3"

Enclosure Serial No : OPS6773421DED14
Enclosure Shelf Address : 1a.32
Enclosure Logical ID : 5:005:0cc002:1ded14
List of Failed Power Supply Modules: ", "
List of Failed Fan Modules: "1, 2"
List of Failed Electronics Modules: "1, 2"
List of Failed Temperature Sensor Modules (Over Temp): "3"
List of Temperature Sensor Modules Warning (Over Temp): "1, 2, 3"

Enclosure Serial No : OPS6773421DF9C3
Enclosure Shelf Address : 2c.17
Enclosure Logical ID : 5:005:0cc002:1df9c3
List of Failed Power Supply Modules: ", "
List of Failed Fan Modules: "1, 2"
List of Failed Electronics Modules: "1, 2"
List of Failed Temperature Sensor Modules (Over Temp): "3"
List of Temperature Sensor Modules Warning (Over Temp): "1, 2, 3"

Enclosure Serial No : OPS8248721DE414
Enclosure Shelf Address : 2d.14
Enclosure Logical ID : 5:005:0cc002:1de414
List of Failed Power Supply Modules: ", "
List of Failed Fan Modules: "1, 2"
List of Failed Electronics Modules: "1, 2"
List of Failed Temperature Sensor Modules (Over Temp): "3"
List of Temperature Sensor Modules Warning (Over Temp): "1, 2, 3"

Enclosure Serial No : OPS8248721DE443
Enclosure Shelf Address : 1b.31
Enclosure Logical ID : 5:005:0cc002:1de443
List of Failed Power Supply Modules: ", "
List of Failed Fan Modules: "1, 2"
List of Failed Electronics Modules: "1, 2"
List of Failed Temperature Sensor Modules (Over Temp): "3"
List of Temperature Sensor Modules Warning (Over Temp): "1, 2, 3"

Enclosure Serial No : OPS8248721DE543
Enclosure Shelf Address : 1b.15
Enclosure Logical ID : 5:005:0cc002:1de543
List of Failed Powe

The following enclosures have become inactive
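The condition text follows a regular pattern, so for anyone wanting to cross-check the reported shelves against the filer's own logs, here is a minimal sketch that splits it into one record per enclosure. It assumes the alert text has been pasted into a Python string. The names `parse_condition`, `ENCLOSURE_RE` and `MODULE_LIST_RE` are made up for illustration and are not part of Operations Manager or any NetApp tool.

```python
import re

# Field labels copied from the alert text; everything else is hypothetical.
ENCLOSURE_RE = re.compile(
    r"Enclosure Serial No : (?P<serial>\S+)\s+"
    r"Enclosure Shelf Address : (?P<shelf>\S+)\s+"
    r"Enclosure Logical ID : (?P<logical_id>\S+)"
)
# Matches entries such as: List of Failed Fan Modules: "1, 2"
MODULE_LIST_RE = re.compile(r'List of ([^:"]+?)\s*: "([^"]*)"')


def parse_condition(text):
    """Split an 'Enclosures Failed' condition string into one dict per enclosure."""
    matches = list(ENCLOSURE_RE.finditer(text))
    records = []
    for i, match in enumerate(matches):
        # The module lists for this enclosure run up to the next enclosure header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end]
        record = match.groupdict()
        for label, values in MODULE_LIST_RE.findall(body):
            # An empty list is reported as ", " in the alert, so drop blank items.
            record[label] = [v.strip() for v in values.split(",") if v.strip()]
        records.append(record)
    return records
```

Each record carries the serial number, shelf address and logical ID plus one list per module type, which makes it easy to see that every enclosure reports the identical set of "failed" modules. A truncated trailing entry is simply skipped.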


The really nifty thing is that it happens about once a week on both clusters with no AutoSupport or syslog messages whatsoever. We are also running performance tests on these clusters, and they show no interruption or pause of any kind. Everything is fine.

Why would ops manager report this problem?

Ops Manager 3.7.1 (with the ONTAP 7.3.2 plugin). ONTAP 7.3.2 on both clusters, a mixed SATA and FC shelf environment, and dfm host diag shows all the bells and whistles you would expect.

Anyone else seen something like this?

Re: Operations manager sees enclosure failures that do not exist

This is a known issue with 7.3.2. I'll post a reference shortly when I am fully back online.

Kind regards

Rich

Apologies for the short reply, fat thumbs...small Blackberry

Re: Operations manager sees enclosure failures that do not exist

This is an issue with ONTAP 7.3.2 and will happen with any monitoring or SNMP alerting tool, not just Ops-Mgr.

Take a look at the bugs online @ the link below.

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=383376

It is fixed in 7.3.2P2.

Regards

adai

Re: Operations manager sees enclosure failures that do not exist

We have a few systems running 7.3.3, and even those generate enclosure failure alerts for enclosures that are not actually faulty. Any advice would be much appreciated.

Re: Operations manager sees enclosure failures that do not exist

Hi, I'd like to report the same issue here.  <bump>