ONTAP Discussions

Downgrade or revert new AFF A300 from 9.6 to 9.5

kelwin
11,178 Views

We have two new AFF A300 systems that came preinstalled with ONTAP 9.6P2.  Our existing FAS8040 cluster is running 9.5P5.  We'd like to downgrade the A300s to ONTAP 9.5.  What is the proper procedure for downgrading the nodes?  Our A300s are connected to the cluster switches (but are not part of the cluster); should we instead connect the A300 nodes together when downgrading ONTAP?  Our goal is to downgrade the A300s to ONTAP 9.5 and then initialize them with ADPv2.


10 REPLIES

aborzenkov
11,113 Views

If these are new systems, it is faster to simply install another version. Use special boot menu option 9a to remove the existing partitions, then option 7 to install the desired version (this needs an HTTP server to download ONTAP from), and then option 4 or 9b to initialize. After that, join the nodes to the existing cluster.

TMACMD
10,906 Views

You could do this:

1. Boot to Special Boot Menu

2. Option #7

3. Choose your interface (like e0M): TMAC's Hack: DO NOT say yes to reboot. Choose N

4. Option #7 (NEED a web server)

5. e0M is now selected (without a reboot)

6. Enter info (IP of interface, netmask, gateway if needed)

7. Enter URL of ONTAP package (same package you use to upgrade ONTAP; see the example web-server setup after this list)

8. Let it install and do its thing.

9. During reboot, be sure to catch the Special Boot Menu

10. Choose Option 9

11. Get BOTH controllers to this point

12. Choose Option 9a on Node 1. Let it finish

13. Choose Option 9a on Node 2. Let it finish

14. Choose Option 9a on Node 1. Let it finish (you should see all drives listed)

15. Choose Option 9a on Node 2. Let it finish (you should see all drives listed)

16. Choose Option 9b on Node 1. Let it reboot, partition and start up ONTAP cluster setup.

17. Choose Option 9b on Node 2. It will reboot, partition and get to the ONTAP cluster Setup.
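
For the web server in step 4 and the URL in step 7, here is a minimal sketch. The admin host, the port, and the package file name (95P5_q_image.tgz) are assumptions; substitute whatever host and ONTAP package you actually downloaded from the NetApp support site.

# On an admin host that the node's chosen interface can reach,
# serve the directory that holds the ONTAP package:
cd /path/to/ontap-images
python3 -m http.server 8000

# URL to enter at step 7 (replace <admin-host-ip> with the host's real address):
http://<admin-host-ip>:8000/95P5_q_image.tgz

Any reachable web server works just as well; the boot-menu installer only needs to be able to pull the package over HTTP.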

 

 

kelwin
10,828 Views

We've downgraded successfully, but now we cannot join the cluster because of two different ONTAP images.  The current cluster has image2 9.5P4 and image1 9.5P5; the new A300 nodes have image2 9.6P2 and image1 9.5P5.  Can we set both images the same across all nodes in the cluster without causing issues?
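
For reference, refreshing the alternate image would normally be done with something like the following from each A300 node's CLI; the web server URL and package file name are placeholders, and the exact parameters of system node image update can vary a little between releases, so check the man page on your version first.

::> system node image show
::> system node image update -node local -package http://<webserver>/95P5_q_image.tgz -setdefault false
::> system node image show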

aborzenkov
10,821 Views

I doubt very much that the problem is the different images. What version is currently active on the AFF? Please show the actual console output of the join attempt, including the error you get.

kelwin
10,747 Views

Enter the IP address of an interface on the private cluster network from the
cluster you want to join: xxx.xxx.xxx.xxx

 

Joining cluster at address xxx.xxx.xxx.xxx

 

System checks .Error: Cluster join operation cannot be performed at this time: All nodes in cluster must be at the same ONTAP version before node can be joined.

 

Resolve the issue, then try the command again.

 

Restarting Cluster Setup

 

 

 

CURRENT CLUSTER

Last login time: 11/26/2019 11:04:53

cluster1::> version

NetApp Release 9.5P5: Fri Jun 14 15:33:34 UTC 2019

 

 

 

A300 NODE A

::> version

NetApp Release 9.5P5: Fri Jun 14 15:33:34 UTC 2019

 

Notice: Showing the version for the local node; the cluster-wide version could
        not be determined.

TMACMD
10,704 Views

On the cluster, what does this show:

"system image show"

 

How about showing the full output of the serial session up to the failure on the A300?

Maybe there is something that was missed?

 

How did you "downgrade" to 9.5 on the A300s? 

If you did not wipe (like I indicated earlier), that may be causing issues.

kelwin
10,700 Views

Wiped per instructions.

 

cluster1::*> system image show
                 Is      Is                                Install
Node     Image   Default Current Version                   Date
-------- ------- ------- ------- ------------------------- -------------------
node-01
         image1  true    true    9.5P5                     7/9/2019 21:08:16
         image2  false   false   9.5P4                     6/12/2019 01:43:04
node-02
         image1  true    true    9.5P5                     7/9/2019 21:08:24
         image2  false   false   9.5P4                     6/12/2019 01:43:14
4 entries were displayed.

 

====================================================

Selection (9a-9e)?: 9b
9b


########## WARNING ##########

All configuration data will be deleted and the node will be
initialized with partitioned disks. Existing disk partitions must
be removed from all disks (9a) attached to this node and
its HA partner (and DR/DR-AUX partner nodes if applicable).
The HA partner (and DR/DR-AUX partner nodes if applicable) must
be waiting at the boot menu or already initialized with partitioned
disks (9b).
Do you still want to continue (yes/no)? yes
yes
AdpInit: This system will now reboot to perform wipeclean.
bootarg.bootmenu.selection is |wipeconfig|
Nov 27 10:29:46 [localhost:diskown.errorReadingOwnership:notice]: error 16 (disk does not exist) while reading ownership on disk 0a.00.21 (S/N S3SGNA0M801601)
Nov 27 10:29:46 [localhost:diskown.errorDuringIO:error]: error 16 (disk does not exist) on disk 0a.00.23 (S/N S3SGNA0M801789) while reading individual disk ownership area
Nov 27 10:29:46 [localhost:diskown.errorReadingOwnership:notice]: error 16 (disk does not exist) while reading ownership on disk 0a.00.19 (S/N S3SGNA0M704669)
Nov 27 10:29:46 [localhost:diskown.errorReadingOwnership:notice]: error 16 (disk does not exist) while reading ownership on disk 0d.00.22 (S/N S3SGNA0M802615)
Nov 27 10:29:46 [localhost:diskown.errorDuringIO:error]: error 16 (disk does not exist) on disk 0a.00.19 (S/N S3SGNA0M704669) while reading individual disk ownership area
Nov 27 10:29:46 [localhost:diskown.errorDuringIO:error]: error 16 (disk does not exist) on disk 0d.00.22 (S/N S3SGNA0M802615) while reading individual disk ownership area
Nov 27 10:29:46 [localhost:diskown.errorReadingOwnership:notice]: error 16 (disk does not exist) while reading ownership on disk 0d.00.20 (S/N S3SGNA0M802540)
Nov 27 10:29:46 [localhost:diskown.errorDuringIO:error]: error 16 (disk does not exist) on disk 0d.00.20 (S/N S3SGNA0M802540) while reading individual disk ownership area
Nov 27 10:29:46 [localhost:diskown.errorReadingOwnership:notice]: error 16 (disk does not exist) while reading ownership on disk 0d.00.18 (S/N S3SGNA0M704671)
Nov 27 10:29:46 [localhost:diskown.errorDuringIO:error]: error 16 (disk does not exist) on disk 0d.00.18 (S/N S3SGNA0M704671) while reading individual disk ownership area
.
Terminated
Skipped backing up /var file system to boot device.
Uptime: 15m0s
System rebooting...
BIOS Version: 11.5
Portions Copyright (C) 2014-2018 NetApp, Inc. All Rights Reserved.

Initializing System Memory ...
Loading Device Drivers ...
Configuring Devices ...

CPU = 1 Processor(s) Detected.
Intel(R) Xeon(R) CPU D-1587 @ 1.70GHz (CPU 0)
CPUID: 0x00050664. Cores per Processor = 16
131072 MB System RAM Installed.
SATA (AHCI) Device: ATP SATA III mSATA AF120GSMHI-NT2

Boot Loader version 6.0.6
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2018 NetApp, Inc. All Rights Reserved.

 

Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_64/freebsd/image2/kernel:0x200000/15719336 0x10fdba8/13881768 Entry at 0xffffffff802dc4b0
Loading X86_64/freebsd/image2/platform.ko:0x1e3b000/4076848 0x221e530/586840
Starting program at 0xffffffff802dc4b0
NetApp Data ONTAP 9.5P5
IPsec: Initialized Security Association Processing.
Copyright (C) 1992-2019 NetApp.
All rights reserved.
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Wed Nov 27 10:30:35 2019 [nv2flash.restage.progress:NOTICE]: ReStage is not needed because the flash has no data.
Wipe filer procedure requested.


Nov 27 10:30:53 Power outage protection flash de-staging: 19 cycles
***OS2SP configured successfully***
sk_allocate_memory: large allocation, bzero 7782 MB in 987 ms
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
9b
AdpInit: Root will be created with 6 disks with configuration as (3d+2p+1s) using disks of type (SSD).
bootarg.bootmenu.selection is |4a|
AdpInit: System will now perform initialization using option 4a
BOOTMGR: The system has 0 disks assigned whereas it needs 6 to boot, will try to assign the required number.
sanown_assign_X_disks: init boot assign with half shelf policy
Nov 27 10:31:53 [localhost:diskown.hlfShlf.assignStatus:notice]: Half shelf based automatic disk assignment is "enabled".
sanown_split_shelf_lock_disk_op: msg success op: RESERVE lock disk: 5002538B:09759850:00 0000:00000000:00000000:00000000 status: 0
sanown_split_shelf_lock_disk_op: msg success op: RELEASE lock disk: 5002538B:09759850:00 000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000 status: 0
sanown_dump_split_shelf_info: Time: 30502 Shelf count:1
sanown_dump_split_shelf_info: Shelf: 0 is_local: 1 is_internal: 0 flags 2c max_slot: 24 type: 0
sanown_dump_split_shelf_info: Shelf: 0 section: 0 owner_id: 538117741 state: 1
sanown_dump_split_shelf_info: Shelf: 0 section: 1 owner_id: 538117925 state: 1
sanown_dump_split_shelf_info: Shelf: 0 Lock index: 0 Lock valid: 1 Lock slot: 0 Lock disk: 5002538B:09759850:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000
sanown_dump_split_shelf_info: Shelf: 0 Lock index: 1 Lock valid: 1 Lock slot: 1 Lock disk: 5002538B:0983CB20:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000
sanown_assign_X_disks: assign disks from my unowned local site pool0 loop
sanown_assign_disk_helper: Assigned disk 0a.00.2
Cannot do remote rescan. Use 'run local disk show' on the console of ?? for it to scan the newly assigned disks
sanown_assign_disk_helper: Assigned disk 0a.00.4
Nov 27 10:31:53 [localhost:diskown.RescanMessageFailed:error]: Could not send rescan message to ??.
sanown_assign_disk_helper: Assigned disk 0d.00.1
sanown_assign_disk_helper: Assigned disk 0a.00.0
sanown_assign_disk_helper: Assigned disk 0d.00.3
sanown_assign_disk_helper: Assigned disk 0d.00.5
BOOTMGR: already_assigned=0, min_to_boot=6, num_assigned=6


Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0a.00.4 Shelf 0 Bay 4 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801738] UID [5002538B:098358D0:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde50996f7252d0).
Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0a.00.2 Shelf 0 Bay 2 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801787] UID [5002538B:09835BE0:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde50993a9b0ed9).
Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0d.00.5 Shelf 0 Bay 5 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802536] UID [5002538B:0983C540:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde509919ebe69e).
Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0d.00.3 Shelf 0 Bay 3 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802516] UID [5002538B:0983C400:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde50994e719c5d).
Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0a.00.0 Shelf 0 Bay 0 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M704674] UID [5002538B:09759850:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde509904d0c5e7).
Nov 27 10:31:53 [localhost:raid.disk.fast.zero.done:notice]: Disk 0d.00.1 Shelf 0 Bay 1 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802630] UID [5002538B:0983CB20:00000000:00 000000:00000000:00000000:00000000:00000000:00000000:00000000] : disk zeroing complete (0 x5dde50992670c329).
Nov 27 10:31:54 [localhost:raid.autoPart.start:notice]: System has started auto-partitioning 6 disks.
Nov 27 10:31:55 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0a.00.0 Shelf 0 Bay 0 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M704674] UID [50025 38B:09759850:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:31:56 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0d.00.1 Shelf 0 Bay 1 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802630] UID [50025 38B:0983CB20:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:31:58 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0a.00.2 Shelf 0 Bay 2 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801787] UID [50025 38B:09835BE0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:31:59 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0d.00.3 Shelf 0 Bay 3 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802516] UID [50025 38B:0983C400:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:32:01 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0a.00.4 Shelf 0 Bay 4 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801738] UID [50025 38B:098358D0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:32:02 [localhost:raid.partition.disk:notice]: Disk partition successful on Dis k 0d.00.5 Shelf 0 Bay 5 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802536] UID [50025 38B:0983C540:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], p artitions created 3, partition sizes specified 1, partition spec summary [3]=37660224.
Nov 27 10:32:02 [localhost:raid.autoPart.done:notice]: Successfully auto-partitioned 6 of 6 disks.
Nov 27 10:32:02 [localhost:raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0 /rg0/0a.00.4P3 Shelf 0 Bay 4 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801738NP003] UID [6002538B:098358D0:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00 000000] to aggregate aggr0 has completed successfully
Nov 27 10:32:02 [localhost:raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0 /rg0/0d.00.3P3 Shelf 0 Bay 3 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802516NP003] UID [6002538B:0983C400:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00 000000] to aggregate aggr0 has completed successfully
Nov 27 10:32:02 [localhost:raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0 /rg0/0a.00.2P3 Shelf 0 Bay 2 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M801787NP003] UID [6002538B:09835BE0:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00 000000] to aggregate aggr0 has completed successfully
Nov 27 10:32:02 [localhost:raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0 /rg0/0d.00.1P3 Shelf 0 Bay 1 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M802630NP003] UID [6002538B:0983CB20:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00 000000] to aggregate aggr0 has completed successfully
Nov 27 10:32:02 [localhost:raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0 /rg0/0a.00.0P3 Shelf 0 Bay 0 [NETAPP X357_S16433T8ATE NA53] S/N [S3SGNA0M704674NP003] UID [6002538B:09759850:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00 000000] to aggregate aggr0 has completed successfully
Nov 27 10:32:02 [localhost:wafl.data.compaction.event:notice]: WAFL volume data compaction state changed in aggregate "aggr0" to "enabled".
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=1,0 volumes with a total of tot al=35 incoming=10 dirty buffers took 12ms with longest CP phases being CP_P2_FLUSH=7, CP _P1_CLEAN=1, CP_PRE_P0=1 on aggregate aggr0.
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=1,0 volumes with a total of tot al=29 incoming=0 dirty buffers took 14ms with longest CP phases being CP_P2_FLUSH=11, CP _P5_FINISH=0, CP_P4_FINISH=0 on aggregate aggr0.
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=1,0 volumes with a total of tot al=51 incoming=10 dirty buffers took 15ms with longest CP phases being CP_P2_FLUSH=10, C P_P1_CLEAN=1, CP_PRE_P0=1 on aggregate aggr0.
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of tot al=105 incoming=21 dirty buffers took 15ms with longest CP phases being CP_P2_FLUSH=5, C P_PRE_P0=2, CP_P3A_VOLINFO=1 on aggregate aggr0.
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of tot al=90 incoming=5 dirty buffers took 20ms with longest CP phases being CP_P2_FLUSH=15, CP _P5_FINISH=0, CP_P4_FINISH=0 on aggregate aggr0.
Nov 27 10:32:03 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of tot al=89 incoming=3 dirty buffers took 8ms with longest CP phases being CP_P2_FLUSH=4, CP_P 5_FINISH=0, CP_P4_FINISH=0 on aggregate aggr0.
Nov 27 10:32:03 [localhost:cf.fm.notkoverClusterDisable:error]: Failover monitor: takeover disabled (restart)
Nov 27 10:32:03 [localhost:tar.csum.notFound:notice]: Stored checksum file does not exist, extracting local://mnt/prestage/mroot.tgz.
Nov 27 10:32:03 [localhost:tar.csum.mismatch:notice]: Stored checksum 0 does not match calculated checksum 3629085172, extracting local://mnt/prestage/mroot.tgz.
Nov 27 10:32:03 [localhost:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of partner disabled (Controller Failover takeover disabled).
Nov 27 10:32:04 [localhost:wafl.transition.cp.completed:notice]: Transition CP with reas on none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of tot al=2527 incoming=2382 dirty buffers took 37ms with longest CP phases being CP_P1_CLEAN=1 5, CP_P2_FLUSH=3, CP_P2V_INO=2 on aggregate aggr0.
Nov 27 10:32:05 [localhost:tar.csum.notFound:notice]: Stored checksum file does not exist, extracting local://mnt/prestage/pmroot.tgz.
Nov 27 10:32:05 [localhost:tar.csum.mismatch:notice]: Stored checksum 0 does not match calculated checksum 1569079177, extracting local://mnt/prestage/pmroot.tgz.
Nov 27 10:32:06 [localhost:kern.syslog.msg:notice]: Registry is being upgraded to improve storing of local changes.
Nov 27 10:32:06 [localhost:kern.syslog.msg:notice]: Registry upgrade successful.
Nov 27 10:32:06 [localhost:kern.syslog.msg:notice]: domain xing mode: off, domain xing interrupt: false
Nov 27 10:32:06 [localhost:clam.invalid.config:error]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
Kernel thread "perfmon poller thre" (pid 4711) exited prematurely.
System initialization has completed successfully.
Nov 27 10:32:07 [localhost:scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 1b.
Nov 27 10:32:07 [localhost:scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 1a.
Nov 27 10:32:07 [localhost:scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 1d.
Nov 27 10:32:07 [localhost:scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 1c.
Occupied cpu socket mask is 0x1
wrote key file "/tmp/rndc.key"
Nov 27 10:33:00 [localhost:monitor.globalStatus.critical:EMERGENCY]: Controller failover partner unknown. Controller failover not possible.


Welcome to the cluster setup wizard.

You can enter the following commands at any time:
"help" or "?" - if you want to have a question clarified,
"back" - if you want to change previously answered questions, and
"exit" or "quit" - if you want to quit the cluster setup wizard.
Any changes you made before quitting will be saved.

You can return to cluster setup at any time by typing "cluster setup".
To accept a default or omit a question, do not enter a value.

This system will send event messages and periodic reports to NetApp Technical
Support. To disable this feature, enter
autosupport modify -support disable
within 24 hours.

Enabling AutoSupport can significantly speed problem determination and
resolution should a problem occur on your system.
For further information on AutoSupport, see:
http://support.netapp.com/autosupport/

Type yes to confirm and continue {yes}: yes

Enter the node management interface port [e0M]:
Enter the node management interface IP address: xxx.xxx.xxx.xxx

Enter the node management interface netmask: xxx.xxx.xxx.xxx
Enter the node management interface default gateway: xxx.xxx.xxx.xxx
A node management interface on port e0M with IP address xxx.xxx.xxx.xxx has been created.

Use your web browser to complete cluster setup by accessing https://xxx.xxx.xxx.xxx

Otherwise, press Enter to complete cluster setup using the command line
interface:
Exiting the cluster setup wizard. Any changes you made have been saved.


The cluster administrator's account (username "admin") password is set to the system default.


Warning: You have exited the cluster setup wizard before completing all
of the tasks. The cluster is not configured. You can complete cluster setup by typing
"cluster setup" in the command line interface.

 


Wed Nov 27 10:38:54 UTC 2019
login: admin
******************************************************
* This is a serial console session. Output from this *
* session is mirrored on the SP console session. *
******************************************************
::> hostname
localhost

::> exit
Goodbye


login:

TMACMD
10,696 Views

That's a great start.

 

What about the rest?

Run the cluster setup from the CLI and show that output also.

 

I am wondering if the cluster is not communicating properly.

 

From the cluster (these are all auto-assigned 169 addresses, no real need to mask)

net port show -ipspace Cluster

net int show -vserver Cluster

net device-discovery show -ipspace Cluster

 

From the A300s CLI:

net port show -port e0a|e0b

net device-discovery show -port e0a|e0b

net i

kelwin
9,792 Views

FYI for those interested: we contacted support, and after some digging they found a stale smf table entry that was blocking the new nodes from joining the cluster.

 

Although nodes 3 and 4 had the same version of ONTAP as the existing cluster, they would not join because there was a stale entry in the smf tables.

 

xxxxxxx::*> debug smdb table cluster_version_replicated show
uuid generation major minor version-string date ontapi-major ontapi-minor is-image-same state
------------------------------------ ---------- ----- ----- -------------------------------------------------- ------------------------ ------------ ------------ ------------- -----
3101b6df-7cec-11e5-8e37-00a0985f3fc6 8 3 0 NetApp Release 8.3P1: Tue Apr 07 16:05:35 PDT 2015 Tue Apr 07 12:05:35 2015 1 30 true none
57c64277-7cec-11e5-8e37-00a0985f3fc6 9 5 0 NetApp Release 9.5P5: Fri Jun 14 15:33:34 UTC 2019 Fri Jun 14 11:33:34 2019 1 150 true none
833939e5-7cd5-11e5-b363-396932647d67 9 5 0 NetApp Release 9.5P5: Fri Jun 14 15:33:34 UTC 2019 Fri Jun 14 11:33:34 2019 1 150 true none
f4270dd4-7cd3-11e5-a735-a570cc7c464a 9 5 0 NetApp Release 9.5P5: Fri Jun 14 15:33:34 UTC 2019 Fri Jun 14 11:33:34 2019 1 150 true none
4 entries were displayed.

 

There should be one entry for each node in the cluster plus one entry for the cluster itself.

 

We removed the 8.3P1 entry:

::*> debug smdb table cluster_version_replicated delete -uuid 3101b6df-7cec-11e5-8e37-00a0985f3fc6

 

Note:  Do not edit the smf tables without guidance from NetApp Support. 
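
As a sanity check afterwards (again, only with Support guiding you), re-running the same diag-level command should now show one entry per node plus one for the cluster; for a 2-node cluster that means 3 entries, after which the join can be retried from the new node's cluster setup wizard. The cluster prompt below is masked just as above.

xxxxxxx::*> debug smdb table cluster_version_replicated show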

EWILTS_SAS
10,899 Views

Although you can downgrade your new A300s to 9.5, my recommendation would be to upgrade your 8040s to 9.6, which is already up to P4.  There are over 10,000 filers in the field running 9.6, and it's a stable and supported release (even-numbered releases used to be supported for only one year, but that changed, so they are now supported the same as odd-numbered releases).
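
If you do go the upgrade route, the automated non-disruptive upgrade on the 8040 cluster is roughly the following sequence; the web server URL and the 96P4_q_image.tgz file name are placeholders, and you would want to run Upgrade Advisor and confirm 9.6 support for your exact configuration first.

cluster1::> cluster image package get -url http://<webserver>/96P4_q_image.tgz
cluster1::> cluster image validate -version 9.6P4
cluster1::> cluster image update -version 9.6P4
cluster1::> cluster image show-update-progress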
