ONTAP Discussions

System Manager Access Stops After cluster_mgmt LIF Migration

nzmarkc
8,333 Views

I have a temporary switched cluster to enable the retirement of an AFF8060. The scenario is as follows:

AFF8060-N1 |__old nodes
AFF8060-N2 |


AFFA300-N3 |__new nodes
AFFA300-N4 |

 

The cluster management LIF is on e0i on the AFF8060. All data and LIFs except cluster_mgmt have been migrated. When I migrate the cluster_mgmt LIF, I can no longer get into the System Manager GUI. I took the following steps and had the following outcomes (the rough command form is sketched after the list):

  1. Migrated cluster_mgmt LIF from AFF8060-N1-e0i to AFFA300-N3-e0M
    • migration processed without issues
    • access to System Manager via the browser does not work, but I can still SSH to the cluster_mgmt LIF IP on the new node
  2. Migrated cluster_mgmt LIF from AFF8060-N1-e0i to AFFA300-N3-e0c
    • migration processed without issues
    • access to System Manager does not work, and I cannot SSH to the cluster_mgmt LIF IP on the new node
  3. Migrated the cluster_mgmt LIF back to its original home port, and all functionality returned.
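
For reference, the command form involved is roughly the following (a sketch only; <cluster_name> is a placeholder for the admin SVM name, and the node/port names are the ones from the list above):

  Check where the LIF currently lives:
    ::> network interface show -vserver <cluster_name> -lif cluster_mgmt -fields home-node,home-port,curr-node,curr-port,status-oper

  Migrate it to a port on the new node (step 1):
    ::> network interface migrate -vserver <cluster_name> -lif cluster_mgmt -destination-node AFFA300-N3 -destination-port e0M

  Send it back to its home port (step 3):
    ::> network interface revert -vserver <cluster_name> -lif cluster_mgmt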

Solutions tried (the corresponding checks are sketched in commands after this list):

  • disabled and re-enabled web services
  • ran Sysinternals 'psping' against the AFFA300-N3 e0M IP address; both ports 80 and 443 show as open
  • checked the cluster firewall; it is open (all policies allow 0.0.0.0/0)
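
A sketch of the command-level version of those checks (angle-bracket values are placeholders, and the policy/service names are the ONTAP defaults, not anything specific to this cluster):

  Toggle the external web services and confirm their status:
    ::> system services web modify -external false
    ::> system services web modify -external true
    ::> system services web show

  TCP test from a Windows client with Sysinternals psping:
    C:\> psping <cluster_mgmt_IP>:443

  Check the firewall policy applied to the cluster_mgmt LIF:
    ::> network interface show -vserver <cluster_name> -lif cluster_mgmt -fields firewall-policy
    ::> system services firewall policy show -policy mgmt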

I have a ticket open with NetApp, but thought I would reach out to the community for help too. Does anyone have suggestions about what might be wrong, or a possible solution? Thank you!

1 ACCEPTED SOLUTION

nzmarkc
7,611 Views

Thank you all for your help and input. Rebooting the nodes did not fix the problem. Turns out it was a certificate issue. Once we renewed the certificates the problem went away. Thanks to NetApp support!!


10 REPLIES

GidonMarcus
8,265 Views

Hi

 

What ONTAP version are you running?

What do you get when you access the HTTPS interface: a timeout or an error code?

Any messages in the event log?

Are System Manager and SSH accessible on the individual nodes' management IPs (not the cluster LIF)?

 

Thanks

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK
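
That information can be pulled together with roughly the following (a sketch; nothing here is specific to this cluster):

  Show the ONTAP release:
    ::> version

  Recent error-level events:
    ::> event log show -severity ERROR

  The per-node management LIFs to test System Manager/SSH against:
    ::> network interface show -role node-mgmt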

nzmarkc
8,241 Views
  1. ONTAP 9.5P4
  2. Event log shows: "vifmgr.bcastDomainPartition: Broadcast domain Default is partitioned into 2 groups on node RELNAPCLUS01-01. The different groups are: {e0M}, {e0i}. LIFs hosted on the ports in this broadcast domain may be at the risk of seeing connectivity issues."
  3. System Manager via the _old_ node IP shows a warning message: "OnCommand System Manager is unable to identify if this cluster was set up successfully."

SpindleNinja
8,183 Views

Can you post the output of broadcast-domain show?
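
For reference, the fuller form of that check (a sketch; the IPspace and domain names are the stock defaults):

  Ports and update status per broadcast domain:
    ::> network port broadcast-domain show -ipspace Default

  Port state on the new node the LIF was migrated to:
    ::> network port show -node AFFA300-N3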

nzmarkc
8,164 Views

Thank you for your questions. They have helped me with my thinking.

We think we may have found the cause. It turns out the new nodes (N3 & N4) needed rebooting. I was able to reboot N4 because it has no CIFS connections, but N3 will have to wait until I get an outage window. I will post the outcome later in the week when I am back onsite.

SpindleNinja
8,149 Views

Can't you just take node 3 over with 4 and no downtime?    
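
For context, the takeover route would look roughly like this (a sketch; node names as used above):

  Confirm the HA pair is healthy and takeover is possible:
    ::> storage failover show

  Have N4 take over N3, then give back once N3 has rebooted:
    ::> storage failover takeover -ofnode AFFA300-N3
    ::> storage failover giveback -ofnode AFFA300-N3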

aborzenkov
8,095 Views

@SpindleNinja wrote:

Can't you just take node 3 over with 4 and no downtime?    


With CIFS? How do you avoid interruption?

SpindleNinja
8,036 Views

@aborzenkov  I was reading it as he was just going to reboot the node, whereas a takeover is just a minimal blip.

Still, it's odd that a takeover would be needed to fix it.

nzmarkc
8,046 Views

The customer has jobs running 24/7 that use the CIFS connection. Even the brief discontinuity that a CIFS migration causes can break a job, so the reboot has to happen during an outage window.
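
A sketch of how that CIFS exposure can be confirmed before picking a window (standard commands; no SVM or share names assumed):

  Sessions currently served by the node that still needs the reboot:
    ::> vserver cifs session show -node AFFA300-N3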

Adrián
7,626 Views

Did rebooting nodes N3 and N4 solve the problem? Confirmation from your side would be much appreciated.

nzmarkc
7,612 Views

Thank you all for your help and input. Rebooting the nodes did not fix the problem. Turns out it was a certificate issue. Once we renewed the certificates the problem went away. Thanks to NetApp support!!
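
For anyone who hits the same symptom, the certificate check and renewal goes roughly like this (a sketch only; <cluster_name> is a placeholder, and exact parameters vary by ONTAP release, so confirm against the docs or your support case):

  List the cluster's server certificates and look for expired ones:
    ::> security certificate show -type server

  Create a replacement self-signed server certificate for the admin SVM:
    ::> security certificate create -vserver <cluster_name> -common-name <cluster_name> -type server

  Then enable the new certificate for HTTPS with "security ssl modify" (its parameters differ between releases, so check the man page for your version), and remove the expired certificate with "security certificate delete" once the new one is active.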
