I've hit a snag trying to test UPS & NetApp integration and I could use some advice...
Our client has a filer connected to two APC SMART-UPS 1500 RM devices which, when fitted with network cards, can be monitored by Data ONTAP using SMTP gets (?) from the UPS. I've checked the DOT version and it supports said UPS model. Now, when a critical-low is signaled by the UPS, the controller will send out an EMS and initiate a clean shutdown. APC's website also claims that an AutoSupport message will be sent. At least that's what it says here and here.
I've followed the instructions and set up monitoring on those two IP addresses with the ups add and ups enable commands:
Fri Oct 7 11:18:23 CEST [ups.added:notice]: UPS at 10.205.0.XX with community public has been added.
Fri Oct 7 11:18:23 CEST [ups.enabled:notice]: UPS at 10.205.0.XX has been enabled.
Fri Oct 7 11:21:53 CEST [ups.added:notice]: UPS at 10.205.0.YY with community public has been added.
Fri Oct 7 11:21:53 CEST [ups.enabled:notice]: UPS at 10.205.0.YY has been enabled.
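For anyone who wants to double-check notices like the ones above in a saved copy of the messages log, here is a minimal Python sketch. It only assumes the line format shown in this post; the function names are made up for illustration, and it says nothing about whether the UPS is actually reachable, only whether both notices were logged for each address.

```python
import re

# Matches the EMS notices quoted above, e.g.
# "[ups.added:notice]: UPS at 10.205.0.XX with community public has been added."
EVENT_RE = re.compile(r"\[ups\.(added|enabled):notice\]: UPS at (\S+)")


def ups_states(log_text):
    """Map each UPS address to the set of events ('added', 'enabled') logged for it."""
    states = {}
    for event, address in EVENT_RE.findall(log_text):
        states.setdefault(address, set()).add(event)
    return states


def not_fully_configured(log_text):
    """Return addresses that are missing either the 'added' or the 'enabled' notice."""
    return sorted(
        address
        for address, events in ups_states(log_text).items()
        if events != {"added", "enabled"}
    )


# The four notices from this post (addresses masked as in the original).
sample = """\
Fri Oct 7 11:18:23 CEST [ups.added:notice]: UPS at 10.205.0.XX with community public has been added.
Fri Oct 7 11:18:23 CEST [ups.enabled:notice]: UPS at 10.205.0.XX has been enabled.
Fri Oct 7 11:21:53 CEST [ups.added:notice]: UPS at 10.205.0.YY with community public has been added.
Fri Oct 7 11:21:53 CEST [ups.enabled:notice]: UPS at 10.205.0.YY has been enabled.
"""

if __name__ == "__main__":
    print(ups_states(sample))
    print(not_fully_configured(sample))
```

An empty list from not_fully_configured means every UPS that appears in the log was both added and enabled.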
The ups status command reports that everything is OK, so far so good. Next, I wanted to test the SMTP mechanism to see whether it really works, so I defined the NetApp's IP address as an SMTP receiver through the APC GUI for the same community and sent out a test SMTP. Nothing. I've tried the same on the 2nd UPS with a 2nd IP address on the filer, still nothing. I've checked the system log, no mention of anything.
Shouldn't the controller log the test SMTP or something else? How do I know if monitoring really works, other than waiting for a power failure and a critical-low on the UPS to kick in?
Does anyone have any experience with this? NetApp reps? 😃
P.S. Oh, and one other thing...
What happens once the graceful shutdown is completed? Will the system boot automatically when the power is restored or does it have to be done... manually?
The filer is used purely for file sharing, there aren't any servers dependent on it, and when the power goes out, local users can't do much anyway. And since it crashed several times this year due to UPS battery drain (extended power outages) and had some really strange HW issues that can only be explained by this, why wait for it to crash again and not shut it down properly? I mean, they already have this option, so it seems perfectly normal that they take advantage of it...
So, what initiates the boot sequence if the NetApp is sitting at the CFE prompt?
NetApp is not an SMTP server. What I found is that when the power goes down, that normally also takes out the mail/SMTP server and other networking components. Check the path of your data flow for AutoSupport emails: SMTP-->Router-->WWW etc.
So you need to specify a mailhost that will be accessible and can do name resolution even when the power goes off.
If it's sitting at the CFE prompt after a power outage, check battery levels or configuration.
To boot the system from this prompt, use either 'bye' to reboot the system or 'boot_ontap' to do a normal bootup.
Mail/SMTP are in a different location, well provided with power, and the networking components are also connected to a local UPS device - so as long as the filer is UP, so are they. I think that part is OK.
When shutdown (halt) is initiated, the filer disconnects CIFS/NFS sessions, flushes NVMEM, makes itself consistent, etc., ending up at the LOADER prompt (not CFE, sorry). At least it has so far... Once PSU battery power drains completely, it will be completely turned off, and of course once power is restored, autoboot will take care of the rest. My question is: what happens if power comes back on while the filer is sitting at the LOADER prompt and there's still some juice left in the UPS? It's very unlikely, since the default value of the criticaltime option is 60 seconds, but it's not entirely impossible. Right?
That's exactly what I'd like to know. I suppose that's why the criticaltime option is set to 60 seconds (just enough to do a proper shutdown before the power cut-off), but it still might happen.
Moreover, now that I think about it, I believe my client currently has the two PSU units each connected to a different UPS. As far as I can tell, these are single-phase devices, each connected to a different phase. So if one phase fails, that will eventually bring the corresponding UPS into a critical-low state... And that will trigger a shutdown instruction for the filer, although the second UPS might be doing fine on its own, correct?
Hmm. I completely missed this...
In reality, if all 3 phases fail, then UPS monitoring is a good thing. But if a single phase fails, or a UPS battery fails, or power is restored during the shutdown sequence, it will leave the filer hanging in midair and it'll have to be power-cycled.
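To make the single-phase concern concrete, here is a small Python sketch of the two possible policies. It is only a model of the behaviour discussed in this thread (one critical-low UPS is enough to trigger a halt), not Data ONTAP's actual logic; the UpsState names and both functions are hypothetical.

```python
from enum import Enum


class UpsState(Enum):
    ONLINE = "online"              # running on mains
    ON_BATTERY = "on_battery"      # mains lost, battery still fine
    CRITICAL_LOW = "critical_low"  # battery nearly drained


def shutdown_if_any_critical(states):
    """Behaviour assumed in this thread: a single critical-low UPS triggers a halt."""
    return any(state is UpsState.CRITICAL_LOW for state in states)


def shutdown_if_all_critical(states):
    """Hypothetical alternative: halt only when no monitored UPS can carry the load."""
    return bool(states) and all(state is UpsState.CRITICAL_LOW for state in states)


# One phase fails: the first UPS drains while the second is still on mains.
single_phase_failure = [UpsState.CRITICAL_LOW, UpsState.ONLINE]
# All phases fail and both UPSes drain.
total_failure = [UpsState.CRITICAL_LOW, UpsState.CRITICAL_LOW]

if __name__ == "__main__":
    for name, states in [("single phase", single_phase_failure),
                         ("total failure", total_failure)]:
        print(name,
              "| any-critical halts:", shutdown_if_any_critical(states),
              "| all-critical halts:", shutdown_if_all_critical(states))
```

With the "any" policy the single-phase case halts a filer whose second PSU is still fed by a healthy UPS, which is exactly the scenario described above.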