Alarm LED on for no visible reason

aborzenkov · 2011-06-03

FAS3140 FMC. After testing the power feeds to the cabinet (switching off half of the PSUs, including one of the Brocade switches), I noticed the alarm LED lit on one node. I don't know whether the other node had it as well; it was a bit too far away to check. FilerView reported the status as normal, nothing to worry about, and /etc/messages confirmed this with the message "status returned to normal". All disks were properly MP-HA. The cluster was enabled. There were no environmental failures on any shelf or head. I am not sure what else to check.

How can I find out why the alarm LED is lit? More importantly, how should I explain it to the customer?

The only thing missing right now is that the setup is not yet complete: I still need to mirror the root aggregate. But if that were the reason, the alarm LED should have been lit before as well, since the aggregate was not mirrored then either.

shaunjurr · 2011-06-09

Hi,

There are a few software bugs in 8.0.1Px... that can cause this. IIRC, they are fixed in 8.0.1P4 or P5. This might be the cause of your LED error.

I actually have a 3070 cluster running P3 that also has its amber status LED on for no apparent reason. I guess I'll have to get around to trying the upgrade too.

Hope this helps.

aborzenkov · 2011-06-10

This is 7.3.5.1.

ChrisHolloway · 2011-06-10

I had a similar problem with an inexplicable fault light on a new FAS3210. This bug:

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=472202

and the steps in there helped.

PARISALLSTON · 2011-06-22

Hi,

You can try logging in to the RLM console and typing "events all". This shows all system hardware events captured and logged by the RLM, and it may shed some light on the problem.

Another thing would be to read the syslog messages in the /etc directory. Mind you, these only go back about six weeks before being overwritten.

I had a customer with a 3210 whose amber light came on, and they couldn't find the cause. It turned out the filer had had an episode a couple of weeks earlier in which it lost the ability to check the NVMEM battery state. It had since recovered, but it hadn't cleared (and couldn't clear) the error log properly. In the end we had to perform a failover and remove/re-seat the affected controller, giving it a non-disruptive (ND) power cycle of sorts.

Hope this helps.