Active IQ and AutoSupport Discussions
Need help urgently! A customer started getting these alerts to begin with:
Node: UKIPCLU02-02
Time: Wed, Jul 24 21:11:45 2019 +0100
Severity: ALERT
Message: cpeer.addr.warn.host Address 10.87.20.14 is not any of the addresses that peer cluster UKDRCLU02 considers valid for the cluster peer relationship; Details: An introductory RPC to the peer address "10.87.20.14" failed to connect: RPC: Remote system error [from mgwd on node "UKIPCLU02-02" (VSID: -3) to xcintro at 10.87.20.14]. Verify that the peer address is correct and try again.
Description: This message occurs when a stable IP address for the peer cluster is no longer valid, making the peer cluster unreachable and causing cross-cluster operations to fail.
Corrective Action: Correct the stable IP address using the "cluster peer modify" command. Valid addresses for the remote ("peer") cluster are given under "Active IP Addresses" when you use the "cluster peer show -instance" command. Verify connectivity by using the "cluster peer ping" command. If you need further assistance, contact NetApp technical support.
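For reference, a hedged sketch of that corrective action in full syntax, assuming the peer to fix is UKDRCLU02 and that 10.87.20.15 is the address listed under "Active IP Addresses" (both values come from the output posted further down this thread):

UKIPCLU02::> cluster peer show -instance
UKIPCLU02::> cluster peer modify -cluster UKDRCLU02 -peer-addrs 10.87.20.15
UKIPCLU02::> cluster peer ping

The modify only rewrites the stable address list for the relationship; once the down node is back, 10.87.20.14 can be added back with the same command.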
On trying to get the node back, he ran the commands below and got this error:
---
BMC UKDRCLU02-01> system console
Type Ctrl-D to exit.
boot_ontap
Could not load fat://boot0/X86_64/freebsd/image1/kernel:Device not found
ERROR: Error booting OS on: 'boot0' file: fat://boot0/X86_64/freebsd/image1/kernel (boot0,fat)
*** command status = Device not found(-6)
I'm looking for some help to get this customer back up ASAP.
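For what it's worth, "Device not found" at that stage suggests the LOADER cannot see the boot device (boot0) at all, rather than just a missing file. A hedged sketch of standard LOADER steps worth trying first (generic commands, not specific to this case):

LOADER> bye
LOADER> boot_backup

"bye" resets the node so the boot device gets re-probed; "boot_backup" attempts the backup boot image if the primary will not load. If the device still is not found, the boot media itself is suspect.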
Are they in takeover and the second node can't boot?
Yes, they are in takeover and it states that it was successful, but the weird thing is that it was first brought to our attention by the alert below:
Message: cpeer.addr.warn.host Address 10.87.20.14 is not any of the addresses that peer cluster UKDRCLU02 considers valid for the cluster peer relationship; Details: An introductory RPC to the peer address "10.87.20.14" failed to connect: RPC: Remote system error [from mgwd on node "UKIPCLU02-01" (VSID: -3) to xcintro at 10.87.20.14]. Verify that the peer address is correct and try again.
Description: This message occurs when a stable IP address for the peer cluster is no longer valid, making the peer cluster unreachable and causing cross-cluster operations to fail.
Corrective Action: Correct the stable IP address using the "cluster peer modify" command. Valid addresses for the remote ("peer") cluster are given under "Active IP Addresses" when you use the "cluster peer show -instance" command. Verify connectivity by using the "cluster peer ping" command. If you need further assistance, contact NetApp technical support.
They also followed this up with the below:
For your info,
UKIPCLU02::> cluster peer show -instance
Peer Cluster Name: UKDRCLU02
Remote Intercluster Addresses: 10.87.20.14, 10.87.20.15
Availability of the Remote Cluster: Partial
Remote Cluster Name: UKDRCLU02
Active IP Addresses: 10.87.20.15
Cluster Serial Number: 1-80-000011
Remote Cluster Nodes: UKDRCLU02-01, UKDRCLU02-02
Remote Cluster Health: true
Unreachable Local Nodes: -
Address Family of Relationship: ipv4
Authentication Status Administrative: use-authentication
Authentication Status Operational: ok
Last Update Time: 7/24/2019 15:08:51
IPspace for the Relationship: Default
Peer Cluster Name: UKIPCLU01
Remote Intercluster Addresses: 10.0.1.150, 10.0.1.151
Availability of the Remote Cluster: Unavailable
Remote Cluster Name: UKIPCLU01
Active IP Addresses: 10.0.1.151, 10.0.1.150
Cluster Serial Number: 1-80-012657
Remote Cluster Nodes: UKIPCLU01-01, UKIPCLU01-02
Remote Cluster Health: -
Unreachable Local Nodes: -
Address Family of Relationship: ipv4
Authentication Status Administrative: use-authentication
Authentication Status Operational: ok
Last Update Time: 7/24/2019 15:08:19
IPspace for the Relationship: Default
2 entries were displayed.
UKIPCLU02::> ping -node UKIPCLU02-01 10.87.20.14
UKIPCLU02::> ping -node UKIPCLU02-01 10.87.20.15
10.87.20.15 is alive
Just posted some screenshots of the system.
what's "cluster show" say on each cluster.
I got around to opening the docx (please post just the image next time).
They have a takeover, and from the error it sounds like the second node will not boot.
I would open a P1 ASAP.