Just throwing an issue out to the community to see if anyone else has seen this before. We have two FAS systems running Data ONTAP 8.1.2.
When attempting to issue a takeover of a node (NODE-B), it fails and effectively just reboots the node being taken over, because the disk reservations can't be released. There are lots of messages in the messages log; I have picked out some of what I think are the most relevant ones to see if anyone has seen this issue before, and I have removed a lot of duplicated rows to make this post easier to read:
8:45:50 [NODE-A: disk.reserveFailed:error]: Disk reservation failed on 3a.25.10 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
WARNING!! fmdisk_reserve_disks unable to reserve any disks.
[NODE-B:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 3 has only 8 valid children, expected 17.
[NODE-A:raid.label.io.readError:error]: Label read on Disk 3a.23.4 Shelf 23 Bay 4 [NETAPP X306_WKf:0001 OJN02TSSM NA00] S/N [WD-WCC1P1173571] failed with storage error disk does not exist. The system will stop using the disk for I/O operations.
[NODE-A:raid.assim.mirror.noChild:ALERT]: Aggregate partner:aggr0, mirrorobj_verify: No operable plexes found.
[NODE-B:raid.fm.takeoverFail:error]: RAID takeover failed: Can't find partner root volume.
[NODE-A:monitor.globalStatus.ok:info]: This node is attempting to takeover NODE-B.
[NODE-A:cf.rsrc.takeoverFail:ALERT]: Failover monitor: takeover during raid failed; takeover cancelled
[NODE-A:cf.fm.givebackStarted:notice]: Failover monitor: giveback started.
[NODE-A:callhome.sfo.takeover.failed:ALERT]: Call home for CONTROLLER TAKEOVER FAILED
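For anyone who wants to see the full picture on their own system, here is a rough one-liner I've been using against a saved copy of /etc/messages to list every unique disk that hit a reservation failure (the sample lines are just the two from this post inlined for illustration):

```shell
# Sketch: extract the unique disk IDs from disk.reserveFailed lines.
# On the real system, replace the printf with: cat /path/to/saved/messages
printf '%s\n' \
  '[NODE-A: disk.reserveFailed:error]: Disk reservation failed on 3a.25.10 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)' \
  '[NODE-A: disk.reserveFailed:error]: Disk reservation failed on 3a.23.4 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)' |
  grep 'disk\.reserveFailed' |
  sed 's/.*failed on \([^ ]*\) .*/\1/' |
  sort -u
```

Comparing that list against the member disks of the affected RAID group should show whether the "8 of 17 children" shortfall lines up with the reservation failures.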
We have run an aggr scrub manually on the affected aggregate, but this hasn't found an issue. It would be reassuring to know how we could verify that the RAID objects are complete and valid, and that everything the system expects is present.
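In case it helps the discussion, these are the 7-Mode commands I'd normally reach for to check RAID group membership and disk state (from memory, so please double-check the syntax against the 8.1.2 manual pages):

```
NODE-A> aggr status -r aggr0    # RAID tree per aggregate; missing/failed children show up per rg
NODE-A> sysconfig -r            # same RAID view from the storage side, including spares
NODE-A> disk show -v            # disk ownership/reservation state (software-owned disks)
NODE-A> cf status               # failover state before retrying the takeover
```

Does anyone know whether the output of these is enough to confirm all 17 expected children of that RAID group are present and valid, or is there a better way?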