System Manager Access Stops After cluster_mgt LIF Migration

I have a temporary switched cluster set up so that an AFF8060 can be retired. The scenario is as follows:

AFF8060-N1 |__old nodes
AFF8060-N2 |


AFFA300-N3 |__new nodes
AFFA300-N4 |

 

The cluster management LIF is on e0i on the AFF8060. All data and LIFs except cluster_mgmt have been migrated. When I migrate the cluster_mgmt LIF, I can no longer get into the System Manager GUI. I took the following steps, with the following outcomes:

  1. Migrated the cluster_mgmt LIF from AFF8060-N1-e0i to AFFA300-N3-e0M
    • Migration processed without issues
    • Access to System Manager via browser does not work, but I can still SSH to the cluster_mgmt LIF IP on the new node
  2. Migrated the cluster_mgmt LIF from AFF8060-N1-e0i to AFFA300-N3-e0c
    • Migration processed without issues
    • Access to System Manager does not work, and I cannot SSH to the cluster_mgmt LIF IP on the new node
  3. Migrated the cluster_mgmt LIF back to its original home, and all functionality is back.

Solutions tried:

  • Disabled and re-enabled web services.
  • Ran Sysinternals 'psping' against the AFFA300-N3-e0M IP address. Ports 80 and 443 both show as open.
  • Checked the cluster firewall; it is open (all settings are 0.0.0.0/0).
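
For anyone wanting to repeat the port check above without psping, here is a minimal Python sketch of the same kind of TCP connect test. The function name `tcp_port_open` and the example address in the comment are illustrative assumptions, not anything from the original setup.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, like a psping TCP test
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Hypothetical usage (192.0.2.10 is a documentation placeholder, not the real LIF IP):
#   for port in (22, 80, 443):
#       print(port, tcp_port_open("192.0.2.10", port))
```

Note that a successful connect only shows that something accepts connections on that port. It does not show that the web service behind it answers requests, which would be consistent with the ports appearing open while the GUI still fails to load.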

I have a ticket open with NetApp but thought I would reach out to the community for help too. Does anyone have any suggestions of what might be wrong or a possible solution? Thank you!

Re: System Manager Access Stops After cluster_mgt LIF Migration

Hi

 

What ONTAP version are you running?

What do you get when you access the https interface, a timeout or some error code?

Any messages in the event log?

Are System Manager and SSH accessible on the individual nodes' management IPs (not the cluster IP)?
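
One way to answer the timeout-versus-error-code question is a short probe script. This is only a sketch: `probe_url` is a hypothetical helper name, and certificate verification is switched off on the assumption that the management interface presents a self-signed certificate.

```python
import socket
import ssl
import urllib.error
import urllib.request


def probe_url(url: str, timeout: float = 5.0) -> str:
    """Report whether a URL answers with an HTTP status, times out, or fails to connect."""
    # Assumption: the target may use a self-signed certificate, so verification is disabled.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx)
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"connection error: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # A timeout while reading the response is raised directly
        return "timeout"
```

Running it once against the cluster management IP and once against each node management IP would show whether the failure is a timeout, a refused connection, or an HTTP error returned by the web service.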

 

Thanks

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

Re: System Manager Access Stops After cluster_mgt LIF Migration

  1. ONTAP 9.5P4
  2. Event log shows: "vifmgr.bcastDomainPartition: Broadcast domain Default is partitioned into 2 groups on nod RELNAPCLUS01-01. The different groups are: {e0M}, {e0i}. LIFs hosted on the ports in this broadcast domain may be at the risk of seeing connectivity issues. "
  3. System Manager via the _old_ node IP shows a warning message: "OnCommand system manager is unable to identify if this cluster was set up successfully."
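
The partition event in item 2 lists the port groups that apparently cannot reach each other at layer 2 even though they sit in the same broadcast domain. As a rough sketch, the groups can be pulled out of such a message with a few lines of Python. The function name `partition_groups` is made up for illustration, and the comma-separated format for groups with more than one port is an assumption, since the message above only shows single-port groups.

```python
import re


def partition_groups(message: str) -> list[list[str]]:
    """Extract the port groups named in a vifmgr.bcastDomainPartition message."""
    # Only look at the text after the phrase that introduces the groups
    _, _, tail = message.partition("The different groups are:")
    groups = []
    for body in re.findall(r"\{([^}]*)\}", tail):
        # Assumption: ports inside one group are separated by commas
        ports = [p.strip() for p in body.split(",") if p.strip()]
        groups.append(ports)
    return groups


msg = (
    "vifmgr.bcastDomainPartition: Broadcast domain Default is partitioned "
    "into 2 groups on nod RELNAPCLUS01-01. The different groups are: "
    "{e0M}, {e0i}. LIFs hosted on the ports in this broadcast domain may "
    "be at the risk of seeing connectivity issues."
)
print(partition_groups(msg))
```

If e0M and e0i really are in separate groups, a LIF that works on e0i may lose connectivity when moved to e0M, which would be worth ruling out alongside the web services checks.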

Re: System Manager Access Stops After cluster_mgt LIF Migration

Can you post the output of broadcast-domain show?

Re: System Manager Access Stops After cluster_mgt LIF Migration

Thank you for your questions. They have helped me with my thinking.

We think we may have found the cause of this. It turns out the new nodes (N3 & N4) needed rebooting. I was able to reboot N4 because it has no CIFS connections, but N3 will have to wait until I get an outage window. I will post the outcome later in the week when I am back onsite.

Re: System Manager Access Stops After cluster_mgt LIF Migration

Can't you just take node 3 over with 4 and no downtime?    

Re: System Manager Access Stops After cluster_mgt LIF Migration


@SpindleNinja wrote:

Can't you just take node 3 over with 4 and no downtime?    


With CIFS? How do you avoid interruption?

Re: System Manager Access Stops After cluster_mgt LIF Migration

Customer has jobs running 24/7 that use the CIFS connection. Even the slight discontinuity that a CIFS migration causes can botch a job. So the reboot has to happen during an outage window.

Re: System Manager Access Stops After cluster_mgt LIF Migration

@aborzenkov I was reading it as he was just going to reboot the node, whereas a takeover is just a minimal blip.

 

Still odd that it needs a takeover to fix it.
