ONTAP Discussions

Issue with HA when installing FAS2750

StorageAdm

Hello Team,

I am installing a FAS2750 two-node cluster. After assigning the IPs to both nodes, I tried to build the cluster from the first node's UI, but it does not discover the other node; it shows only one node.

Please let me know how to resolve this issue.

Log of node 1:

***** ZTL loaded
Pensando Offload Driver, ver 1.4.0-E-78
Pensando Ethernet NIC Driver, ver: 1.4.0-E-96
ionic_rdma ver 1.4.0-E-96 : Pensando RoCE HCA driver
***OS2SP configured successfully***
Oct 14 13:49:41 [localhost:cf.ic.sbb:notice]: HA interconnect: SBB Compatibility Event. Compatible partner node found. The interconnect device has been enabled.
Oct 14 13:49:50 [localhost:discover.6500.unsupported:notice]: FC-to-SAS bridge ATTO 6500N discovery is disabled.
Oct 14 13:49:50 [localhost:fal_nvme.partition.status:notice]: Partition 0-1 with capacity 894 GiB status: online.
pnso provider init started.
pnso init failed in pnso_init() : 35
hwo: Node is using hardware provider : 1.
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'KDF' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Oct 14 13:49:57 [localhost:fmmb.disk.notAccsble:notice]: All Local mailbox disks are inaccessible.
Restoring parity from NVRAM
Oct 14 13:50:00 [localhost:cf.fm.notkoverClusterDisable:error]: Failover monitor: takeover disabled (restart)
Replaying WAFL log
Oct 14 13:50:00 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reason replay, 00000000 for replaying=1,1 unmounting=0,0 total=2,1 volumes with a total of total=96 incoming=3 dirty buffers took 80ms with longest CP phases being CP_P2V_INO=37, CP_P2_FLUSH=17, CP_P2V_BM=13 on aggregate aggr0.
Oct 14 13:50:00 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reason none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of total=88 incoming=0 dirty buffers took 32ms with longest CP phases being CP_P2_FLUSH=15, CP_P2V_INO=9, CP_P3A_VOLINFO=1 on aggregate aggr0.
Oct 14 13:50:01 [localhost:kern.syslog.msg:notice]: The system was down for 825 seconds
Oct 14 13:50:01 [localhost:extCache.rw.terminated:notice]: WAFL external cache warming process terminated.
Oct 14 13:50:01 [localhost:extCache.rw.replay.canceled:notice]: WAFL external cache replay canceled for aggregate aggr0: Aggregate came online after timeout.
Oct 14 13:50:01 [localhost:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of partner disabled (Controller Failover takeover disabled).
Oct 14 13:50:01 [localhost:kern.syslog.msg:notice]: domain xing mode: off, domain xing interrupt: false
Oct 14 13:50:01 [localhost:clam.invalid.config:error]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
wrote key file "/tmp/rndc.key"
Oct 14 13:51:00 [localhost:monitor.globalStatus.critical:EMERGENCY]: Controller failover partner unknown. Controller failover not possible.

 

Log of node 2:

LOADER-B> boot_ontap
Loading X86_64/freebsd/image1/kernel:0x200000/1149120 0x319000/10815944 0xf6a000/3953872 0x132f4d0/4388088 0x200240/1016 Entry at 0xffffffff80319000
Loading X86_64/freebsd/image1/platform.ko:0x175f000/4068960 0x1b40660/628568
Starting program at 0xffffffff80319000
---<<BOOT>>---
NetApp Data ONTAP 9.9.1P7
IPMI device unit 0 rev. 1, firmware rev. 10.00, version 2.0, device support mask 0xbf
IPMI device unit 1 rev. 1, firmware rev. 10.00, version 2.0, device support mask 0xbf
Copyright (C) 1992-2022 NetApp.
All rights reserved.
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'KDF' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Fri Oct 14 13:59:54 2022 sp_get_oem_nv2f_event:Response (6 bytes) evt 3 timestamp 0x0
Fri Oct 14 13:59:54 2022 sp_clear_oem_nv2f_event:cleared
Fri Oct 14 13:59:54 2022 [nv2flash.restage.progress:NOTICE]: ReStage is going to restore non-volatile data from flash in approximately 21 seconds.
.....................
Fri Oct 14 14:00:08 2022 Going to clear pending data on mSata
Fri Oct 14 14:00:08 2022 [nv2flash.copy2NVMEM.succeed:INFO]:Copying nonvolatile data from the flash device to NVMEM succeeded in 14 seconds.
Attempting to use existing varfs on /dev/nvrd1


SUCCESS
Oct 14 14:00:47 Power outage protection flash de-staging: 31 cycles
***** ZTL loaded
Pensando Offload Driver, ver 1.4.0-E-78
Pensando Ethernet NIC Driver, ver: 1.4.0-E-96
ionic_rdma ver 1.4.0-E-96 : Pensando RoCE HCA driver
***OS2SP configured successfully***
Oct 14 14:01:47 [localhost:cf.ic.sbb:notice]: HA interconnect: SBB Compatibility Event. Compatible partner node found. The interconnect device has been enabled.
Oct 14 14:01:54 [localhost:discover.6500.unsupported:notice]: FC-to-SAS bridge ATTO 6500N discovery is disabled.
Oct 14 14:01:54 [localhost:fal_nvme.partition.status:notice]: Partition 0-1 with capacity 894 GiB status: online.
pnso provider init started.
pnso init failed in pnso_init() : 35
hwo: Node is using hardware provider : 1.
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'KDF' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Oct 14 14:02:01 [localhost:fmmb.disk.notAccsble:notice]: All Local mailbox disks are inaccessible.
Restoring parity from NVRAM
Oct 14 14:02:04 [localhost:cf.fm.notkoverClusterDisable:error]: Failover monitor: takeover disabled (restart)
Replaying WAFL log
Oct 14 14:02:05 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reason replay, 00000000 for replaying=1,1 unmounting=0,0 total=2,1 volumes with a total of total=119 incoming=21 dirty buffers took 81ms with longest CP phases being CP_P1_CLEAN=43, CP_P2_FLUSH=24, CP_P3A_VOLINFO=7 on aggregate aggr0.
Oct 14 14:02:05 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reason none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of total=87 incoming=0 dirty buffers took 21ms with longest CP phases being CP_P2_FLUSH=9, CP_P2V_INO=5, CP_P3A_VOLINFO=1 on aggregate aggr0.
Oct 14 14:02:05 [localhost:kern.syslog.msg:notice]: The system was down for 264 seconds
Oct 14 14:02:05 [localhost:extCache.rw.terminated:notice]: WAFL external cache warming process terminated.
Oct 14 14:02:05 [localhost:extCache.rw.replay.canceled:notice]: WAFL external cache replay canceled for aggregate aggr0: Aggregate came online after timeout.
Oct 14 14:02:05 [localhost:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of partner disabled (Controller Failover takeover disabled).
Oct 14 14:02:05 [localhost:kern.syslog.msg:notice]: domain xing mode: off, domain xing interrupt: false
Oct 14 14:02:05 [localhost:clam.invalid.config:error]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
wrote key file "/tmp/rndc.key"
Oct 14 14:03:00 [localhost:monitor.globalStatus.critical:EMERGENCY]: Controller failover partner unknown. Controller failover not possible.
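
With this much boot output, it can help to pull out only the messages logged at error severity or higher. Below is a minimal Python sketch, assuming the console lines follow the `[node:event.name:severity]: message` shape seen above. The function name `serious_events`, the severity set, and the regular expression are illustrative assumptions, not part of any NetApp tool.

```python
import re

# Matches the bracketed tag seen in the console output above, e.g.
# "[localhost:clam.invalid.config:error]: Local node ...".
# Assumption: three colon-separated fields (node, event name, severity).
EMS_PATTERN = re.compile(
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[A-Za-z]+)\]:\s*(?P<message>.*)"
)

# Assumption: which severities count as worth a closer look.
SERIOUS = {"error", "alert", "critical", "emergency"}


def serious_events(log_text):
    """Return (event, severity, message) for each line at a serious severity."""
    found = []
    for line in log_text.splitlines():
        match = EMS_PATTERN.search(line)
        if match and match.group("severity").lower() in SERIOUS:
            found.append(
                (match.group("event"), match.group("severity"), match.group("message"))
            )
    return found


# A few lines copied from the node 1 log above.
SAMPLE = """\
Oct 14 13:49:57 [localhost:fmmb.disk.notAccsble:notice]: All Local mailbox disks are inaccessible.
Oct 14 13:50:00 [localhost:cf.fm.notkoverClusterDisable:error]: Failover monitor: takeover disabled (restart)
Oct 14 13:50:01 [localhost:clam.invalid.config:error]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
Oct 14 13:51:00 [localhost:monitor.globalStatus.critical:EMERGENCY]: Controller failover partner unknown. Controller failover not possible.
"""

for event, severity, message in serious_events(SAMPLE):
    print(f"{severity.upper():9} {event}: {message}")
```

On the sample lines, this keeps the two `error` entries and the `EMERGENCY` entry and drops the `notice` line. It only filters text; it does not diagnose the cause.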

[Attached screenshot: StorageAdm_0-1665740083988.png]

Thank you,

3 REPLIES

paul_stejskal

Please open a case. I suspect there could be a hardware issue, but I'm not sure, and this requires more troubleshooting.

Perhaps reseating the controllers in the chassis might be in order?

Dobermanj

Hello, how did your case end?
I have the same problem.

Geo_19

I agree with @paulc. Have you tried reseating the node? Leave it out for around 1 or 2 minutes, then put it back in and boot it.
