ONTAP Hardware
The alert "PSU1 FRU is not present in the chassis" is shown on node A. I tried unplugging and reseating the PSU, but the alert is still there. I have also already rebooted the SP. May I know what the next step should be? Many thanks.
The logs are below:
-----------------------------------------------------------
WXXXXXXXXXX::> system health alert show
Node: WXXXXXXXXX-B
Resource: PSU1 FRU#042012002634
Severity: Critical
Indication Time: Mon Sep 06 07:02:21 2021
Suppress: false
Acknowledge: false
Probable Cause: PSU1 FRU is not present in the chassis 042012002634.
The nodes in this chassis are WXXXXXXXXX-A.
Possible Effect: The nodes in the chassis might not function
effectively or redundancy might be lost.
Corrective Actions: 1. Plug in PSU1 FRU correctly into the slot.
2. Refer to the Hardware specification guide for more information on the position of the field-replaceable unit (FRU) and ways to check or replace it.
3. Contact support personnel if the alert persists.
-------------------------------------------------------------------------------------------------------
event show:
monitor.fan.failed: Multiple fans has failed.
monitor.fru.info.unreadable: The inventory information of FRU PSU2 is not readable.
monitor.fru.info.unreadable: The inventory information of FRU PSU1 is not readable.
callhome.c.fan.fru.fault: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
monitor.temp.unreadable: The controller temperature (Midplane 1 Temp) is not readable.
monitor.temp.unreadable: The controller temperature (Midplane 2 Temp) is not readable.
monitor.temp.unreadable: The controller temperature (Midplane 3 Temp) is not readable.
monitor.temp.unreadable: The controller temperature (Midplane 4 Temp) is not readable.
monitor.fan.failed: Multiple fans has failed.
monitor.fan.failed: Multiple fans has failed.
monitor.fan.failed: Multiple fans has failed.
monitor.fan.failed: Multiple fans has failed.
Try the suggested solution; if it doesn't work, call Support.
Hi, thanks for your reply. But I checked system environment sensors show, and on node A the PSU1 FRU state is fault.
-----------------------------------------------
WXXXXXXXX::> system environment sensors show
Node Sensor         State        Value/Units
---- -------------- ------------ -----------
WXXXXXXXXX-A
     PSU2 FRU       fault        MULTIFAULT
     PSU1 FRU       fault        MULTIFAULT
     SP Status      normal       IPMI_HB_OK
     mSATA Status   normal       OK
     mSATA Pres     normal       PRESENT
     Partner Status failed
     PSU1 Present   normal       PRESENT
     PSU1 5V        init-failed  - mV - - - -
     PSU1 12V       init-failed  - mV - - - -
--------------------------------------------------------
The fault light on the NetApp is also on.
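When the full sensor listing is long, the rows that are not in the normal state can be picked out with a short script. The following is only a sketch: it assumes the output has been saved as text with one sensor per line (name, then state, then value), and the set of state words is taken only from the output pasted in this thread, so other states may exist. The function name and the sample text are hypothetical.

```python
# States seen in the output pasted in this thread (assumption: not a complete list).
KNOWN_STATES = {"normal", "fault", "failed", "init-failed"}


def abnormal_sensors(text):
    """Return (sensor, state) pairs for every row whose state is not 'normal'."""
    results = []
    for line in text.splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # The first known state word after the sensor name ends the name.
            if tok in KNOWN_STATES and i > 0:
                if tok != "normal":
                    results.append((" ".join(tokens[:i]), tok))
                break
    return results


# Sample rows copied from the output above, one sensor per line.
SAMPLE = """\
PSU2 FRU fault MULTIFAULT
SP Status normal IPMI_HB_OK
Partner Status failed
PSU1 Present normal PRESENT
PSU1 5V init-failed - mV - - - -
"""

if __name__ == "__main__":
    for sensor, state in abnormal_sensors(SAMPLE):
        print(f"{sensor}: {state}")
```

Header lines and node-name lines contain no state word, so they are skipped.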
Have a look at this kB:
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/Power_supply_cannot_detect_power_off_correctly
Have a look at this, though it doesn't say your specific Model:
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/CHASSIS_FAN_FRU_FAILED%3A_Multiple_fans_have_failed_after_upgrading_...
In parallel, log a call with NetApp Support.
Thanks again, I will take a look at those KBs.
Reboot the BMCs on both nodes in the chassis.
Then update them to BMC 11.6.
The sensor reading failures can indicate issues with IPMI communication or BMC unhealthiness.
Create a case with Tech Support, ideally.
Cool, thanks andris. But is there any service impact during the BMC upgrade?
BMC upgrades are generally non-disruptive. Rebooting the BMC first helps ensure the upgrade goes smoothly and stays non-disruptive. The command is "system service-processor reboot-sp -node nodename"
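Since the advice is to reboot the BMC on both nodes in the chassis, the command quoted above has to be run once per node. A minimal sketch that only prints the command lines to paste into the cluster shell is shown here; the node names are hypothetical placeholders, and the script does not connect to the cluster or run anything.

```python
# Command text as quoted in this thread; "{node}" stands in for "nodename".
REBOOT_TEMPLATE = "system service-processor reboot-sp -node {node}"


def reboot_commands(nodes):
    """Build one reboot command line per node name."""
    return [REBOOT_TEMPLATE.format(node=name) for name in nodes]


# Hypothetical node names: replace with the two nodes in the affected chassis.
NODES = ["cluster1-A", "cluster1-B"]

if __name__ == "__main__":
    for cmd in reboot_commands(NODES):
        print(cmd)
```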
Thanks for your reply Alex, I will try it later. I saw a KB that said upgrading the shelf firmware will help in this case. If I upgrade the shelf firmware as well, is there any service impact?
Shelf firmware update is also generally non-disruptive, but it is important to perform the BMC reboot and BMC upgrade first.
That's cool, thanks again Alex. If the alert still exists after upgrading the SP and shelf firmware, what should I do as the next step?
Open a case with the support center and start scheduling some downtime for a chassis swap, but I must stress this is very very uncommon.
The reboot and the upgrade should fix it.
I hope the upgrade fixes this problem. I will contact the customer to upgrade those parts first. BTW, thanks for your help again Alex.
Hi Alex, I updated the SP and disk shelf firmware, but system environment sensors show still reports the same status, as below:
     PSU2 FRU       fault        MULTIFAULT
     PSU1 FRU       fault        MULTIFAULT
     SP Status      normal       IPMI_HB_OK
     mSATA Status   normal       OK
     mSATA Pres     normal       PRESENT
     Partner Status failed
     PSU1 Present   normal       PRESENT
     PSU1 5V        init-failed  - mV - - - -
     PSU1 12V       init-failed  - mV - - - -
     PSU1 5V Curr
What should I do as the next step?