One of the network interfaces is showing up/down, need to troubleshoot
2022-02-25
07:35 AM
Hi,
One of our interfaces (a cluster LIF) inexplicably shows a status of up/down, that is, administratively up but operationally down. The port itself looks healthy and the LIF is on its home port, but nothing I've tried so far fixes it. Please advise on how to proceed.
Even setting the LIF's admin status to down and bringing it back up didn't fix the issue.
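To list every LIF stuck in this state at once, one option is to compare the admin and operational status fields. The minimal Python sketch below assumes the text was captured from `network interface show -vserver Cluster -fields status-admin,status-oper` and that it uses the plain whitespace-separated `-fields` layout (header row, dashed row, one row per LIF). The helper name `find_up_down_lifs` and the sample output are hypothetical, and real output can wrap or differ between ONTAP versions.

```python
def find_up_down_lifs(fields_output: str) -> list[tuple[str, str]]:
    """Return (vserver, lif) pairs that are administratively up but operationally down.

    Assumes ONTAP `-fields` style output: a header row naming the columns,
    a dashed separator row, then one whitespace-separated row per LIF.
    """
    lines = [ln.strip() for ln in fields_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    stuck = []
    for line in lines[1:]:
        # Skip the dashed separator row and the "N entries were displayed." trailer.
        if set(line) <= {"-", " "} or line.endswith("displayed."):
            continue
        parts = line.split()
        if len(parts) != len(header):
            continue  # wrapped or unexpected row; ignore it rather than guess
        if parts[col["status-admin"]] == "up" and parts[col["status-oper"]] == "down":
            stuck.append((parts[col["vserver"]], parts[col["lif"]]))
    return stuck


# Hypothetical captured output, shaped like the -fields layout assumed above.
sample = """\
vserver lif              status-admin status-oper
------- ---------------- ------------ -----------
Cluster stcffn-01_clus1  up           up
Cluster stcffn-01_clus2  up           up
Cluster stcffn-02_clus1  up           down
Cluster stcffn-02_clus2  up           up
4 entries were displayed.
"""

print(find_up_down_lifs(sample))  # [('Cluster', 'stcffn-02_clus1')]
```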
Could you check this KB? (It may need a reboot to fix.)
Network interfaces show up/down following a Vifmgr restart in ONTAP 9.5 and earlier
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/ONTAP_network_interfaces_show_up_down
Another KB along similar lines:
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/Logical_Interfaces_(LIF)_up_down_after_ONTAP_node_Reboot
I'm not aware of Vifmgr being restarted, but that seems like a decent solution. However, I'm hesitant to reboot one of the nodes because we're currently serving clients and I want to avoid an outage. I know we have failover, but I'm still hesitant.
NetappGuy7 has accepted the solution
What protocols are being served? It depends, but in general, takeover and giveback let an HA pair perform nondisruptive operations and avoid service interruptions.
However, I think it makes sense to raise a ticket with NetApp so that they can look at the logs and messages for the root cause and suggest the next course of action.
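As a rough outline of the takeover-and-giveback approach, the Python sketch below only assembles the ordered ONTAP commands for a person to review and run; it does not touch the cluster. The helper name `failover_restart_plan` is hypothetical, and the sequence is an assumption to confirm against the HA pair documentation and with NetApp support before using it on a production system.

```python
def failover_restart_plan(node: str) -> list[str]:
    """Return an ordered list of ONTAP commands to restart `node` via its HA partner.

    Only builds strings for review; nothing is executed against a cluster.
    """
    if not node:
        raise ValueError("node name is required")
    return [
        # Confirm that both nodes report takeover as possible before starting.
        "storage failover show",
        # The partner takes over the node's storage and LIFs; the taken-over
        # node typically restarts and then waits for giveback.
        f"storage failover takeover -ofnode {node}",
        # Once the node reports it is waiting for giveback, return its resources.
        f"storage failover giveback -ofnode {node}",
        # Check that giveback completed for every aggregate.
        "storage failover show-giveback",
    ]


for cmd in failover_restart_plan("stcffn-02"):
    print(cmd)
```

Keeping the plan as reviewable text, rather than scripting the takeover itself, leaves each step under the operator's control, which suits an environment where any interruption is considered too risky.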
Yep, I've already done so. Hoping for the best.
We're aware of nondisruptive operations, but unfortunately it's still not a risk we're willing to take: we're a government organization and we serve data to literally hundreds of thousands of clients.
Lately, a new error has appeared when running the following command:
stcffn::> node run -node stcffn-02
telnet: connect to address ***.***.***.*: Host is down
telnet: Unable to connect to remote host
This happens even though both heads are healthy and reachable. Very odd.
This is likely because a command routed to that node's nodeshell has to traverse the cluster LIF, which is down, to reach the node. If you SSH into the node management LIF of that specific node and run the command again, I suspect you won't hit that issue.
You're absolutely correct!
So how do I bring the cluster LIF back up? Is deleting it and recreating it a viable option?
Worth trying. Changing the home port to a different port and then back again might also be worth a try.
It doesn't look like it's possible to delete a cluster LIF (command failed: LIF "stcffn-02_clus1" cannot be removed because it is required to maintain quorum on node "stcffn-02".)
Nor is it possible to change its home port. I have no idea how to proceed here. It doesn't look like faulty hardware, but I'm not sure either. Really puzzling issue.
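One way to separate a LIF problem from a physical one is to check the link state of the cluster ports the LIFs sit on. The Python sketch below assumes output captured from `network port show -ipspace Cluster -fields link` in the whitespace-separated `-fields` layout; the helper names `parse_fields_table` and `find_down_ports` and the sample output are hypothetical.

```python
def parse_fields_table(text: str) -> list[dict[str, str]]:
    """Parse whitespace-separated ONTAP `-fields` output into one dict per row (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Skip the dashed separator row and the "N entries were displayed." trailer.
        if set(line) <= {"-", " "} or line.endswith("displayed."):
            continue
        parts = line.split()
        if len(parts) == len(header):
            rows.append(dict(zip(header, parts)))
    return rows


def find_down_ports(port_output: str) -> list[tuple[str, str]]:
    """Return (node, port) pairs whose physical link is reported as down."""
    return [
        (row["node"], row["port"])
        for row in parse_fields_table(port_output)
        if row.get("link") == "down"
    ]


# Hypothetical captured output for illustration only.
sample = """\
node      port link
--------- ---- ----
stcffn-01 e0a  up
stcffn-01 e0b  up
stcffn-02 e0a  down
stcffn-02 e0b  up
4 entries were displayed.
"""

print(find_down_ports(sample))  # [('stcffn-02', 'e0a')]
```

A cluster port that reports `down` points toward the cable, transceiver, or switch side rather than the LIF configuration. A port reporting `up` does not completely rule out a marginal cable.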
Any update from NetApp on this issue?
I've been in contact with someone, but I haven't been able to reach them for a while. At this point, I need to escalate the case.
Did you end up following the previous suggestion and performing the reboot?
Do you have a case number to follow on this issue? @NetappGuy7
I didn't perform a reboot... however, I was able to get a faulty cable replaced, which resolved the issue!
That's great. NetApp can review the event logs and cluster core logs to suggest a root cause; most likely the mgwd process got stuck. Anyway, please post an update once it's resolved.
