Hi Community,
I'm dealing with a SolidFire SF4805 node issue and would appreciate any advice, especially since this unit is out of warranty.
---
Environment
- Cluster: 4-node SolidFire SF4805, Element OS 11.8.0.23
- Affected Node: SF03 (Node ID: 3)
- Cluster is still operational with the remaining 3 nodes
---
Symptoms
- All 10 Samsung SSDs on SF03 show Status = **failed** simultaneously
- Active Drives on SF03 = 0
- Node Status = active (node is online but serving no storage)
- Replication Port = "-" (not participating in cluster replication)

Active Alerts:
- `hardwareConfigMismatch`: MPTSAS_BIOS_VERSION = Unknown (expected != Unknown)
- `hardwareConfigMismatch`: MPTSAS_FIRMWARE_VERSION = Unknown (expected != Unknown)
- `irqBalanceFailed`: mpt3sas0-msix0 through msix7 interrupts not found
- `networkConfig`: eth1 and eth3 down
- `notUsingLACPBondMode`: Bond10G not using LACP

Hardware Check Output (xCheck):
```
MPTSAS_BIOS_VERSION: Passed=false, actual=Unknown
MPTSAS_FIRMWARE_VERSION: Passed=false, actual=Unknown
(All other components: CPU, RAM, NIC, BIOS, iDRAC → Passed=true)
```
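For anyone comparing hardware check results across nodes, here is a minimal Python sketch that pulls out only the failed checks. It assumes the summarised `NAME: Passed=..., actual=...` line format pasted above; the raw hardware check JSON from the node will be shaped differently, and the function name `failed_checks` is my own.

```python
import re

# Matches lines in the summarised form pasted above, e.g.
# "MPTSAS_BIOS_VERSION: Passed=false, actual=Unknown".
# Assumption: this is the post's summary format, not the raw JSON.
LINE_RE = re.compile(
    r"^\s*(?P<name>[A-Z0-9_]+):\s*Passed=(?P<passed>true|false),\s*actual=(?P<actual>.+?)\s*$"
)

def failed_checks(text):
    """Return {component: actual_value} for every line with Passed=false."""
    failed = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("passed") == "false":
            failed[m.group("name")] = m.group("actual")
    return failed

sample = """\
MPTSAS_BIOS_VERSION: Passed=false, actual=Unknown
MPTSAS_FIRMWARE_VERSION: Passed=false, actual=Unknown
"""
print(failed_checks(sample))
```

Lines that do not match the pattern, such as the "All other components" summary, are simply skipped.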
---
My Analysis

All symptoms point to the SAS HBA controller (mpt3sas / LSI) being undetectable by the OS. Since all 10 drives failed at exactly the same time rather than individually, I believe the drives themselves are likely still healthy; the controller is simply not being recognized on the PCIe bus.

Firmware versions are also behind:
- iDRAC: running 2.40.40.40 (current: 2.75.75.75)
- BIOS: running 2.2.5 (current: 2.8.0)
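One way to test the "controller not visible on the PCIe bus" theory is to look for a SAS-class device in Linux sysfs. The sketch below is only that, a sketch: it assumes shell access to the node (or a rescue Linux booted on the same hardware), which may not be available on a SolidFire node, and it assumes the HBA reports PCI class 0x0107 (mass storage, Serial Attached SCSI). Some LSI controllers report a RAID class instead, so an empty result is a hint, not proof. The function name is hypothetical.

```python
from pathlib import Path

SAS_CLASS_PREFIX = "0x0107"  # PCI class 0x01 (mass storage), subclass 0x07 (SAS)
LSI_VENDOR_ID = "0x1000"     # PCI vendor ID used by LSI / Broadcom

def find_sas_controllers(sysfs_root="/sys/bus/pci/devices"):
    """Return (pci_address, vendor_id, device_id) for each SAS-class PCI device."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        try:
            pci_class = (dev / "class").read_text().strip()
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except OSError:
            continue  # skip entries without readable attributes
        if pci_class.startswith(SAS_CLASS_PREFIX):
            found.append((dev.name, vendor, device))
    return found

if __name__ == "__main__":
    controllers = find_sas_controllers()
    if not controllers:
        print("No SAS-class device visible on the PCIe bus")
    for addr, vendor, device in controllers:
        note = " (LSI/Broadcom)" if vendor == LSI_VENDOR_ID else ""
        print(f"{addr}: vendor={vendor} device={device}{note}")
```

If nothing is listed on SF03 while a healthy node shows an LSI/Broadcom entry, that would support a hardware or seating problem rather than a driver issue.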
---
Questions
1. Can anyone confirm this is a SAS controller hardware failure rather than a firmware/software issue?
2. What is the exact SAS controller model used in the SF4805? (Trying to source a replacement)
3. Has anyone successfully replaced the SAS controller on an SF-series node and recovered the drives/data?
4. If I reseat or replace the controller and the node rejoins the cluster, will Element OS automatically re-add the drives, or is there a manual process?
5. Any risk of data loss on the drives themselves if the controller is replaced?
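On question 4, if a manual step does turn out to be needed, it would presumably go through the Element API's `ListDrives` and `AddDrives` JSON-RPC methods. The sketch below only groups a `ListDrives`-style result by status and builds an `AddDrives` request body; it sends nothing. The sample data is made up, the helper names are hypothetical, and it does not settle whether drives on SF03 would return to `available` on their own after a controller fix.

```python
import json
from collections import defaultdict

def drives_by_status(list_drives_result, node_id):
    """Group driveIDs on one node by their reported status."""
    grouped = defaultdict(list)
    for drive in list_drives_result.get("drives", []):
        if drive.get("nodeID") == node_id:
            grouped[drive.get("status")].append(drive["driveID"])
    return dict(grouped)

def add_drives_request(drive_ids, request_id=1):
    """Build (but do not send) a JSON-RPC body for the AddDrives method."""
    return {
        "method": "AddDrives",
        "params": {"drives": [{"driveID": d, "type": "automatic"} for d in drive_ids]},
        "id": request_id,
    }

# Hypothetical ListDrives result, trimmed to the fields used here.
sample_result = {"drives": [
    {"driveID": 21, "nodeID": 3, "status": "failed"},
    {"driveID": 22, "nodeID": 3, "status": "available"},
    {"driveID": 5, "nodeID": 1, "status": "active"},
]}

grouped = drives_by_status(sample_result, node_id=3)
print(grouped)
print(json.dumps(add_drives_request(grouped.get("available", [])), indent=2))
```

Keeping the request as a plain dict makes it easy to review before posting it to the cluster's JSON-RPC endpoint with whatever HTTP client and credentials you normally use.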
---
What I've Tried
- Verified all alerts in Element OS UI (Reporting → Alerts)
- Confirmed Node Details: all 10 drive slots showing failed
- Reviewed hardware check JSON output
- Cross-referenced firmware versions against NetApp docs: https://docs.netapp.com/us-en/element-software/hardware/fw_storage_nodes.html#sf_nodes
---
Any guidance is greatly appreciated. Thanks in advance!
We have decommissioned all of our NetApp H610S-4 nodes since they are approaching end of support (EOL). We wanted to use them in a test/dev environment as ESXi hosts with vSAN. Sadly, after removing these and the mNodes, we discovered that we cannot use the drives in these systems. They appear to be locked. I tried several ways of unlocking the drives with nvme-cli and similar tools, but I have not found one that works. Our maintenance support has lapsed, so I cannot download the Element Software to try RTFI, and I cannot log into Ember directly since the mNode is offline. Are there any known methods or nvme-cli commands that work to unlock these drives? Any assistance will be greatly appreciated.
Hello folks, we hit the known issue with logging in to the system after a power outage: https://kb.netapp.com/on-prem/solidfire/Element_OS_Kbs/Login_errors_using_admin_credentials__Cluster_UI_and_mNode_UI The system has been out of support for a while 🙂. It is not in production but is still used in a lab environment as backend storage. I would love to see a solution. Could you let me know how to get rid of this? Many thanks. Lukas
Folks looking to repurpose their NetApp HCI rigs or perhaps use PVE with SolidFire... The first screenshot shows a SolidFire volumes-related menu. The second is Proxmox-related (like "datastores" in another product) and the third illustrates the purpose of the entire thing, which (IMO) is to avoid the mess of inconsistency. The thing is pretty fast (no Java or other bloat) and while I wouldn't call it "enterprise-ready" it should be good for sandbox and lab enviros. And if it's not, the source is at the link (see the first screenshot) so you can make it better if you want.
Hi all, I'm in a very similar situation to @itgod_cluelessdog with his question "Reuse hardware of NetApp H410C - unable to boot" https://community.netapp.com/t5/SolidFire-and-HCI/Reuse-hardware-of-NetApp-H410C-unable-to-boot/m-p/456936 I'd like to repurpose my 4 H410S nodes, they contain quality CPUs, quite a handful of RAM and 6 SSDs each. A VMware vSAN playground should be achievable. However, the nodes are equipped with BIOS NA2.1 dating back to 2017, and booting a USB Key in UEFI boot mode does not seem to be supported. I tried to find newer HCI Storage Node firmware at mysupport, but failed. HCI Compute Node firmware is available, though. Could you point me to the newest firmware for NetApp HCI H410S nodes? That would be awesome. Thanks, Raoul.