Active IQ and AutoSupport Discussions
Need help urgently! A customer started getting these alerts to begin with:
Node: UKIPCLU02-02
Time: Wed, Jul 24 21:11:45 2019 +0100
Severity: ALERT
Message: cpeer.addr.warn.host Address 10.87.20.14 is not any of the addresses that peer cluster UKDRCLU02 considers valid for the cluster peer relationship; Details: An introductory RPC to the peer address "10.87.20.14" failed to connect: RPC: Remote system error [from mgwd on node "UKIPCLU02-02" (VSID: -3) to xcintro at 10.87.20.14]. Verify that the peer address is correct and try again.
Description: This message occurs when a stable IP address for the peer cluster is no longer valid, making the peer cluster unreachable and causing cross-cluster operations to fail.
Corrective Action: Correct the stable IP address using the "cluster peer modify" command. Valid addresses for the remote ("peer") cluster are given under "Active IP Addresses" when you use the "cluster peer show -instance" command. Verify connectivity by using the "cluster peer ping" command. If you need further assistance, contact NetApp technical support.
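For reference, a hedged sketch of that corrective action in full syntax, assuming the peer to fix is UKDRCLU02 and that 10.87.20.15 is the address listed under "Active IP Addresses" (both values come from the output posted further down this thread):

UKIPCLU02::> cluster peer show -instance
UKIPCLU02::> cluster peer modify -cluster UKDRCLU02 -peer-addrs 10.87.20.15
UKIPCLU02::> cluster peer ping

The modify only rewrites the stable address list for the relationship; once the down node is back, 10.87.20.14 can be added back with the same command.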
On trying to get the node back, he ran the commands below and got this error:
---
BMC UKDRCLU02-01> system console
Type Ctrl-D to exit.
boot_ontap
Could not load fat://boot0/X86_64/freebsd/image1/kernel:Device not found
ERROR: Error booting OS on: 'boot0' file: fat://boot0/X86_64/freebsd/image1/kernel (boot0,fat)
*** command status = Device not found(-6)
I'm looking for some help to get this customer back up ASAP.
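For what it's worth, "Device not found" at that stage suggests the LOADER cannot see the boot device (boot0) at all, rather than just a missing file. A hedged sketch of standard LOADER steps worth trying first (generic commands, not specific to this case):

LOADER> bye
LOADER> boot_backup

"bye" resets the node so the boot device gets re-probed; "boot_backup" attempts the backup boot image if the primary will not load. If the device still is not found, the boot media itself is suspect.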
Are they in takeover and the second node can't boot?
Yes, they are in takeover and it states that it was successful, but the weird thing is that it was first brought to our attention by the alert below:
Message: cpeer.addr.warn.host Address 10.87.20.14 is not any of the addresses that peer cluster UKDRCLU02 considers valid for the cluster peer relationship; Details: An introductory RPC to the peer address "10.87.20.14" failed to connect: RPC: Remote system error [from mgwd on node "UKIPCLU02-01" (VSID: -3) to xcintro at 10.87.20.14]. Verify that the peer address is correct and try again.
Description: This message occurs when a stable IP address for the peer cluster is no longer valid, making the peer cluster unreachable and causing cross-cluster operations to fail.
Corrective Action: Correct the stable IP address using the "cluster peer modify" command. Valid addresses for the remote ("peer") cluster are given under "Active IP Addresses" when you use the "cluster peer show -instance" command. Verify connectivity by using the "cluster peer ping" command. If you need further assistance, contact NetApp technical support.
They also followed this up with the below:
For your info,
UKIPCLU02::> cluster peer show -instance
Peer Cluster Name: UKDRCLU02
Remote Intercluster Addresses: 10.87.20.14, 10.87.20.15
Availability of the Remote Cluster: Partial
Remote Cluster Name: UKDRCLU02
Active IP Addresses: 10.87.20.15
Cluster Serial Number: 1-80-000011
Remote Cluster Nodes: UKDRCLU02-01, UKDRCLU02-02
Remote Cluster Health: true
Unreachable Local Nodes: -
Address Family of Relationship: ipv4
Authentication Status Administrative: use-authentication
Authentication Status Operational: ok
Last Update Time: 7/24/2019 15:08:51
IPspace for the Relationship: Default
Peer Cluster Name: UKIPCLU01
Remote Intercluster Addresses: 10.0.1.150, 10.0.1.151
Availability of the Remote Cluster: Unavailable
Remote Cluster Name: UKIPCLU01
Active IP Addresses: 10.0.1.151, 10.0.1.150
Cluster Serial Number: 1-80-012657
Remote Cluster Nodes: UKIPCLU01-01, UKIPCLU01-02
Remote Cluster Health: -
Unreachable Local Nodes: -
Address Family of Relationship: ipv4
Authentication Status Administrative: use-authentication
Authentication Status Operational: ok
Last Update Time: 7/24/2019 15:08:19
IPspace for the Relationship: Default
2 entries were displayed.
UKIPCLU02::> ping -node UKIPCLU02-01 10.87.20.14
UKIPCLU02::> ping -node UKIPCLU02-01 10.87.20.15
10.87.20.15 is alive
Just posted some screenshots of the system.
what's "cluster show" say on each cluster.
I got around to opening the docx (please post just the image next time).
They have a takeover, and from the error it sounds like the second node will not boot.
I would open a P1 ASAP.