ONTAP Hardware
I'm in the process of upgrading my two-node cluster, running ONTAP 9.6P5. I'm doing this by adding another two nodes to it, moving all the data and LIFs to that node pair, then decomming the old nodes. I've moved all volumes and data LIFs and everything's been working great...until now.
I'm at the point where I need to migrate the cluster mgmt. LIF to the new node pair. However, it won't work. This is the behavior:
- I do "network interface migrate ..." to either of the new nodes
- Command completes OK
- I can still ping the LIF and telnet to it on ports 80 & 443
- I can no longer reach the System Manager website
If I migrate the LIF back to either of the old nodes, access to the website is restored.
The cluster mgmt LIF is on the e0M ports, and if I do "network interface failover-groups show ...", all four nodes' e0M ports are listed as failover targets in the same broadcast domain.
Any ideas why this isn't working correctly?
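For reference, the commands I'm using are roughly the following (the names here are just placeholders; the destination node is one of the new pair):
::> network interface migrate -vserver mycluster -lif cluster_mgmt -destination-node mycluster-03 -destination-port e0M
::> network interface revert -vserver mycluster -lif cluster_mgmt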
Hi,
Do you mind sharing the output of these commands?
::> network interface show -role cluster-mgmt
::> network interface show -failover -lif cluster_mgmt
::> network interface show -fields firewall-policy -lif cluster_mgmt
::> event log show
Are you using the hostname or the cluster_mgmt LIF IP to access System Manager?
Does this work (from the other node)?
https://IP.xx.xx.xx
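If you have a shell on the client, a quick curl against the same address (add -k to skip certificate validation) will also show how far the HTTPS handshake gets:
$ curl -v https://IP.xx.xx.xx/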
Thanks!
The key question would be whether the wrench ports are all connected to the same VLAN/IP subnet from a switch/router perspective.
@andris wrote:
The key question would be whether the wrench ports are all connected to the same VLAN/IP subnet from a switch/router perspective.
They are: nodes 1, 2 and 4 are connected to the same switch. Node 3 is connected to a partner switch.
All four switchports have the following config: "switchport access vlan 198"
That's it. Super simple.
Where is the host/client - on the same subnet as VLAN 198 or not?
If you can ping/telnet/ssh from that host, it would seem to point to a potential firewall issue.
What exactly is the browser response? Not found - 404 or something else?
"Where is the host/client - on the same subnet as VLAN 198 or not?"
Different subnet.
"If you can ping/telnet/ssh from that localhost, it would seem to point to a potential firewall issue."
There aren't any firewalls in the path between my computer and the NetApp.
"What exactly is the browser response? Not found - 404 or something else?"
It depends on the browser, but the output from Fiddler is more useful:
"HTTPS handshake to 10.25.1.80 (for #7) failed. System.IO.IOException Authentication failed because the remote party has closed the transport stream."
Note that this isn't solely to do with the floating cluster mgmt. interface. I can't connect to System Manager using the mgmt. IP addresses for the new nodes either, whereas I can using those of the old nodes.
So the overall problem seems to be that the new nodes simply aren't permitting any HTTPS connections.
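In case it's useful, something along these lines (standard OpenSSL, run from any client; the IP is the cluster mgmt LIF) shows where the handshake stops and which certificate, if any, the node presents, including its validity dates:
$ openssl s_client -connect 10.25.1.80:443
$ echo | openssl s_client -connect 10.25.1.80:443 2>/dev/null | openssl x509 -noout -dates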
I would review the LIF service policies on the cluster (or at least on the new nodes). LIF service policies are the new method in ONTAP 9.6+ for controlling which functions, operations, and access a LIF supports; a quick check is sketched after the command list below.
Review:
LIFs and service policies in ONTAP 9.6 and later
Configuring LIF service policies
network interface service-policy
network interface service-policy show - Display existing service policies
network interface service-policy add-service - Add an additional service entry to an existing service policy (advanced)
network interface service-policy clone - Clone an existing network service policy (advanced)
network interface service-policy create - Create a new service policy (advanced)
network interface service-policy delete - Delete an existing service policy (advanced)
network interface service-policy modify-service - Modify a service entry in an existing service policy (advanced)
network interface service-policy remove-service - Remove a service entry from an existing service policy (advanced)
network interface service-policy rename - Rename an existing network service policy (advanced)
network interface service-policy restore-defaults - Restore default settings to a service policy (advanced)
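As a starting point, something like this (vserver and LIF names are examples; field names may vary slightly by release) shows which policy the cluster_mgmt LIF uses and whether management-https is included in it:
::> network interface show -vserver mycluster -lif cluster_mgmt -fields service-policy
::> network interface service-policy show -vserver mycluster -policy default-management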
Thanks for the info. As far as I can tell, this already looks correct:
::> network interface service-policy show -vserver [my cluster] -instance -policy default-management
Vserver: [my cluster]
Policy Name: default-management
Included Services: management-core, management-autosupport,
management-ssh, management-https
Service: Allowed Addresses: management-core: 0.0.0.0/0
management-autosupport: 0.0.0.0/0
management-ssh: 0.0.0.0/0
management-https: 0.0.0.0/0
However, this is a cluster-wide policy, whereas the problem is confined to two of the four nodes.
Is there a per-node setting somewhere that configures whether or not it will accept HTTPS? Or maybe a way to temporarily disable HTTPS cluster-wide and force HTTP, just to test?
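Would something like this be the right thing to check? I'm going from the command reference here, so the exact options may differ on 9.6:
::> system services web show
::> system services web node show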
Same problem here. I have checked everything, but it seems to be a network issue rather than a configuration mismatch.
Hello Adrian,
I have the exact same issue. I just added a new FAS8200 HA pair to a four-node cluster. The cluster expansion went well, but when I migrate the cluster mgmt LIF to the new FAS8200 e0M ports, no more HTTP!
Ping is OK, SSH is OK, but System Manager is not.
Did you find out why? Did you open a case with the support team? Might it be a bug with the new "network interface service-policy" functionality?
Thanks.
Br,
Jonathan
A few more details:
All e0M ports are connected to the same switch, no VLAN, no firewall in between. I've even connected my laptop directly to the same switch, but I still get the same issue. I rebooted the newly added nodes again, but no change.
The same thing happens if I try to reach System Manager through the node mgmt LIFs, so there must be a bug somewhere.
Br,
Jonathan
OK, the SSL certificate for the cluster was expired; that's the reason why.
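In case it helps anyone else hitting this, the check and fix look roughly like this (the serials and names are placeholders, and the exact parameters may differ slightly by release, so verify against the man pages before running them):
::> security certificate show -type server -fields common-name,expiration
::> security certificate delete -vserver <cluster> -common-name <cluster> -ca <cluster> -type server -serial <old serial>
::> security certificate create -vserver <cluster> -common-name <cluster> -type server -size 2048 -expire-days 365
::> security ssl modify -vserver <cluster> -common-name <cluster> -ca <cluster> -serial <new serial> -server-enabled true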
Br,
Jonathan
Same solution for me, Jontaquet. Thanks for your reply.
"Do you mind sharing this output"
I'm not at liberty to paste the event log, but the rest I can do:
mycluster::> network interface show -role cluster-mgmt
            Logical      Status     Network            Current       Current Is
Vserver     Interface    Admin/Oper Address/Mask       Node          Port    Home
----------- ------------ ---------- ------------------ ------------- ------- ----
mycluster
            cluster_mgmt up/up      10.25.1.80/24      mycluster-02  e0M     false

mycluster::> network interface show -failover -lif cluster_mgmt
         Logical         Home                  Failover              Failover
Vserver  Interface       Node:Port             Policy                Group
-------- --------------- --------------------- --------------------- -------------------
mycluster
         cluster_mgmt    mycluster-01:e0M      broadcast-domain-wide Management_VLAN_198
                         Failover Targets: mycluster-01:e0M,
                                           mycluster-02:e0M,
                                           mycluster-04:e0M,
                                           mycluster-03:e0M

mycluster::> network interface show -fields firewall-policy -lif cluster_mgmt
vserver    lif          firewall-policy
---------- ------------ ---------------
mycluster  cluster_mgmt mgmt
"Are you using hosntame or Clust_mgmt LIF IP for accessing web-system_manager"
Both. Same behavior either way.
"Does this works (From the other node): https://IP.xx.xx.xx"
After testing this, we may be getting somewhere:
- Before migrating the LIF to one of the new nodes, I can access System Manager via both of the old nodes (using the node mgmt hostname or IP), but I cannot access it via either of the new nodes
- After migrating it to one of the new nodes, I can still access it via both of the old nodes (using the hostname or IP), but I can't access it via either of the new nodes, and I can't access it via the cluster mgmt hostname or IP
So the bottom line, I think, is that for whatever reason, System Manager isn't accessible at all via the new nodes, and I don't know why.