SolidFire and HCI

How can I fix this problem on a NetApp HCI H410C Storage Node?

Timous

What is causing the problem in this situation?

 

I performed the following steps on the Storage Node:

  1. Open the Terminal User Interface (TUI) console of the node via BMC/IPMI or iDRAC
  2. Select Maintenance Tasks
  3. Select Factory Reset Node

After the reset, I could no longer get into this GUI.

Timous_1-1641960227141.jpeg

 

So I created a bootable USB drive from solidfire-rtfi-sodium-patch5-11.5.0.63.iso.

When the installation started, I answered:

  1. ‘Yes’ to ‘Proceed’.
  2. ‘No’ to ‘Extensive Hardware Tests’.

It ran until the firmware check, then showed the following information:

  1. 'There were 1 hardware related RTFI_HARDWARE_FAILURES'
  2. Tag 'NUM_DRIVES'
  3. hardware_check failed

Timous_0-1641959563647.jpeg

 

How can I fix it?

 

Thank you!


8 REPLIES

elementx

It would have helped if you had mentioned what was wrong with the node before you decided to reimage it.

 

A quick search for `RTFI_HARDWARE_FAILURES` turns up several KB articles. One of them is:

 

https://kb.netapp.com/Advice_and_Troubleshooting/Hybrid_Cloud_Infrastructure/H_Series/Hardware_check_failed_with_CPU_Intel_Xeon_Gold_5120T_on_H610S

 

I'm not logged in so please check if this or other KB articles help you.

 

This isn't a common problem so maybe reach out to Support to eliminate guesswork.

Timous

I don't understand this article. It doesn't seem to match my situation:

https://kb.netapp.com/Advice_and_Troubleshooting/Hybrid_Cloud_Infrastructure/H_Series/Hardware_check_failed_with_CPU_Intel_Xeon_Gold_5120T_on_H610S

 

In my situation, the output looks like this:

tag         Actual  Op  Expected  Passed
NUM_DRIVES  0       =   6         0
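
For readers unfamiliar with this kind of output, here is a minimal Python sketch of how a row like the one above could be evaluated. It is not RTFI's actual code: the `HardwareCheck` class and `failed_tags` function are hypothetical names, and only the tag, the values 0 and 6, and the `=` operator are taken from the screenshot.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping from the "Op" column to a comparison function.
# Only "=" appears in the screenshot.
OPS = {"=": operator.eq}


@dataclass
class HardwareCheck:
    tag: str
    actual: int
    op: str
    expected: int

    def passed(self) -> bool:
        # A check passes when "actual <op> expected" holds.
        return OPS[self.op](self.actual, self.expected)


def failed_tags(checks):
    """Return the tags of all checks whose comparison is false."""
    return [c.tag for c in checks if not c.passed()]


checks = [HardwareCheck("NUM_DRIVES", actual=0, op="=", expected=6)]
failed = failed_tags(checks)
print(f"There were {len(failed)} hardware related failures: {failed}")
```

Read this way, the row says the installer found 0 drives where it required exactly 6, so the check did not pass.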

 

So, do you have any ideas or suggestions that I can try?

 

Thank you!

elementx (accepted solution)

I'd say contact support.

 

It's unusual to see none of the drives were detected. If one failed, that'd be possible. It doesn't look possible that 6 failed, but I don't know how to solve this. I haven't encountered that situation yet.

Timous

I see. Thank you for the reply.

NetApp_AU

I am not sure if you got an answer from Support yet, but here is what I am seeing from your post.

 

The error message says that the RTFI process sees zero drives installed while six are expected. This is odd because you are imaging an H410C compute node, which does not use drives the way the storage nodes do.

 

I looked at the filename, solidfire-rtfi-sodium-patch5-11.5.0.63.iso, and this is the Element RTFI image for storage nodes, not compute nodes. That is why the process is checking for drives.

 

For comparison, an example of a compute node RTFI filename would be solidfire-compute-sodium-patch5-11.5.0.63.iso.
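
As a rough illustration, the difference between the two filenames can be checked with a few lines of Python. This is only a sketch: the `image_type` function is a hypothetical helper, and the prefix rule is inferred from the two filenames mentioned in this thread, not from any documented naming convention.

```python
def image_type(iso_name: str) -> str:
    """Guess which node type an RTFI ISO targets from its filename prefix.

    Assumption: compute images start with "solidfire-compute-" and storage
    images start with "solidfire-rtfi-", as in the two names quoted here.
    """
    if iso_name.startswith("solidfire-compute-"):
        return "compute"
    if iso_name.startswith("solidfire-rtfi-"):
        return "storage"
    return "unknown"


print(image_type("solidfire-rtfi-sodium-patch5-11.5.0.63.iso"))
print(image_type("solidfire-compute-sodium-patch5-11.5.0.63.iso"))
```

Running a check like this before writing the ISO to USB would flag the image used in the original post as a storage image.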

 

Can you confirm whether you are trying to image an H410C compute node or an H410S storage node?

 

You can get the compute RTFI image for Element 11.5 here.

Team NetApp

elementx

That's an interesting observation.

The RTFI screenshot shows `actual=[H300S] == expect=[H300S]`, so even if the image was wrong, RTFI probably shouldn't reach that conclusion. It could be that the blades were shuffled around and now this blade's disk slots are empty.

NetApp_AU

I think the RTFI process did not throw an error about the model number because the H410C compute node has very similar hardware to an H410S.

 

The H410S storage nodes are exactly the same as the H300S, H500S and H700S storage nodes, just with a revised model number. See below.

 

H410S-11110 (H300S)
H410S-21110 (H500S)
H410S-31110 (H700S)
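
For reference, the same mapping written as a small Python lookup. The `LEGACY_MODEL` table and `legacy_name` function are hypothetical names used only for illustration; the three pairs are the ones listed above.

```python
# Revised H410S model numbers and the earlier model names they correspond to,
# as listed in this post.
LEGACY_MODEL = {
    "H410S-11110": "H300S",
    "H410S-21110": "H500S",
    "H410S-31110": "H700S",
}


def legacy_name(model: str) -> str:
    """Return the earlier model name for an H410S model number, if known."""
    return LEGACY_MODEL.get(model.strip().upper(), "unknown")


print(legacy_name("H410S-11110"))
```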

 

My theory is that we have an H410C compute node but are trying to install an H410S/H300S RTFI image on it. The process does a node model check, sees enough of the server components to match the H410S it is programmed for, and continues with the checks until it hits the drive count check, which stops it.

If my theory is correct, I agree that the RTFI process should include a compute vs storage node check specifically.
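
To make the suggestion concrete, here is a minimal Python sketch of what such a check could look like if it ran before the drive count check. It is purely hypothetical: the `preflight` function, its parameters and its messages are invented for illustration and do not describe how RTFI is implemented.

```python
def preflight(node_role, image_role, drives_found, drives_expected):
    """Return a list of error strings; an empty list means the checks passed.

    node_role and image_role are assumed to be "compute" or "storage".
    """
    errors = []
    # Hypothetical role check: stop early if the image does not match the node.
    if node_role != image_role:
        errors.append(
            f"image is for {image_role} nodes but this is a {node_role} node"
        )
        return errors
    # The drive count only matters for storage images.
    if image_role == "storage" and drives_found != drives_expected:
        errors.append(
            f"NUM_DRIVES: found {drives_found}, expected {drives_expected}"
        )
    return errors


# A compute node booted from a storage image would be rejected by the role
# check instead of failing later on the drive count.
print(preflight("compute", "storage", drives_found=0, drives_expected=6))
```

With a check ordered like this, the failure message would point at the wrong image rather than at missing drives.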

Team NetApp

elementx

I agree with that, but I also think they may have a "personality" profile stored in CMOS or elsewhere. One useful feature of not continuing unless the personality says "Storage Node" is that accidental RTFI with Compute ISO wouldn't wipe a Storage Node's OS.

 

We'll see if the OP clarifies.
