ONTAP Hardware

Powering down a failed controller

strider78
4,605 Views

Hello all!

We have a V6220 single-chassis, dual-controller filer (the controllers are named Filer-A and Filer-B). Some time ago Filer-A had a hardware failure (possibly the voltage regulators, but I'm not 100% sure), was taken over by Filer-B, and was rebooted by the watchdog, but it didn't come up. It is now in a zombie state: effectively dead but still powered on. The whole system is running on the surviving Filer-B.

Is it safe to issue the "system power off" command from the Filer-A Service Processor CLI? Sorry for such a stupid question, but I'm not an expert in storage and we have a very mission-critical database on this system.

1 ACCEPTED SOLUTION

aborzenkov
4,598 Views
What is the reason to power it off? It is not required for replacement. I am not sure what "safe" means in this case. It should not affect another node in the same chassis if that was the question.

4 REPLIES

strider78
4,591 Views

The reason is simple: it's a last resort. I hope that a power cycle may heal it. Yes, I wanted to make sure that powering it off won't harm the working filer. Thank you very much for the answer.

cedric_renauld
4,567 Views

Hello,

I think your system is in "Waiting for giveback" or maybe at the LOADER prompt.

Can you first type

system console

and check the state of your controller?

And on your surviving controller, what is the result of

cf status

Thanks

strider78
4,493 Views

Filer-A was completely dead: "system console" showed nothing, and "cf status" on the surviving Filer-B showed only that it had taken over its partner. The interconnect link was down too.

Then I issued "system power off" on the dead filer's SP and powered it back on after 5 minutes.

Bingo!

All hardware (voltage) issues were auto-deasserted, ONTAP booted normally, and the controller is now ready for a giveback.
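
For anyone in the same situation, the sequence above can be sketched roughly as follows. This is a recap under assumptions, not a verified procedure: the text in parentheses is annotation rather than CLI input, "system power on" is assumed to be how the controller was powered back on, and "cf giveback" is the usual 7-Mode follow-up once the partner is waiting for giveback, which had not yet been run at this point.

```
On the failed controller's SP (Filer-A):
  system console       (showed nothing)
  system power off
  (wait about 5 minutes)
  system power on      (assumed)

On the surviving controller (Filer-B):
  cf status            (reported that it had taken over its partner)
  cf giveback          (assumed next step, once Filer-A is waiting for giveback)
```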

Now I'm sure that my system was hit by a 500-day uptime bug. I upgraded the SP firmware to the latest compatible version.

Thanks, all, for the help.
