Really hoping someone can help here, as I am very new to FAS.
One of our arrays had a disk fail, so we replaced it last week, assigned ownership and assumed the rebuild had started. Since then:
1. We cannot get a CLI session on the filer (1A) hosting this LUN - it asks for credentials and then closes the session once we enter them
2. We can no longer get into the GUI for the clustered pair - previously it partially worked (we just could not access disks or aggregates for the filer with the dead disk) - now it sits at "authenticating to filer 1A" and never completes
3. Snapmirror between this filer (1A) and a partner unit (2A) has stopped
The disk that eventually died was in rebuild back in October. The filer started a repair of the disk and it ground the systems to a halt. We lost CLI access then as well, and clients very much noticed the rebuild because their data access was slow, but it recovered by itself eventually and we haven't had any reports since.
Apparently HA mode has been offline since that point, though, and we were never alerted to it.
1A Message: HA mode, but takeover of partner is disabled due to reason : status of backup mailbox is uncertain.
CLI from partner filer:
1B> cf monitor
  current time: 27Dec2017 09:54:56
  UP 68+07:35:13, partner '1A', CF monitor enabled
  VIA Interconnect is up (link up), takeover capability on-line
  partner may be down, last partner update TAKEOVER_ENABLED (20Oct2017 22:59:23)
  takeover scheduled 00:00:15
1B> cf status
  1A may be down, takeover will be initiated in 15 seconds.
  VIA Interconnect is up (link up).
1B> cf hw_assist status
  Local Node(1B) Status:
    Active: 1B monitoring alerts from partner(1A)
    port 4444 IP address 192.168.1.15
  Partner Node(1A) Status:
    Active: 1A monitoring alerts from partner(1B)
    port 4444 IP address 192.168.1.14
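For anyone reading the cf monitor output above: the telling detail is the gap between "current time" and "last partner update", which is what shows the partner has been silent since October. Below is a rough Python sketch that works out that gap from the pasted text. The function name and the regular expressions are my own and only assume the layout shown in this post; this is not a NetApp tool, and other ONTAP releases may format the output differently. The month parsing also assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Sample text copied from the `cf monitor` output in this post.
CF_MONITOR = """current time: 27Dec2017 09:54:56
UP 68+07:35:13, partner '1A', CF monitor enabled
VIA Interconnect is up (link up), takeover capability on-line
partner may be down, last partner update TAKEOVER_ENABLED (20Oct2017 22:59:23)
takeover scheduled 00:00:15"""

# Timestamp layout as it appears above, e.g. "27Dec2017 09:54:56".
STAMP_FORMAT = "%d%b%Y %H:%M:%S"
STAMP_PATTERN = r"\d{1,2}[A-Za-z]{3}\d{4}\s+\d{2}:\d{2}:\d{2}"


def partner_update_age(text: str) -> timedelta:
    """Return how long ago the partner last reported, per `cf monitor` text.

    Hypothetical helper: it assumes the two fields look exactly like the
    sample above and raises ValueError if either one is missing.
    """
    now = re.search(r"current time:\s*(" + STAMP_PATTERN + ")", text)
    last = re.search(r"last partner update\s+\w+\s+\((" + STAMP_PATTERN + r")\)", text)
    if now is None or last is None:
        raise ValueError("could not find both timestamps in cf monitor output")
    # Collapse runs of whitespace so the stamp matches STAMP_FORMAT exactly.
    now_ts = datetime.strptime(" ".join(now.group(1).split()), STAMP_FORMAT)
    last_ts = datetime.strptime(" ".join(last.group(1).split()), STAMP_FORMAT)
    return now_ts - last_ts


if __name__ == "__main__":
    age = partner_update_age(CF_MONITOR)
    print(f"Partner last updated {age.days} days ago")
```

Run against the output above, this reports a gap of just over two months, which lines up with the October rebuild. Something like this could be run from a scheduled job against saved cf monitor output to raise an alert when the gap grows, since in our case nothing told us HA had stopped updating.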
I am not sure where to go from here, as this is our first time managing a FAS unit, but I am almost at the point of moving all the data off this LUN to protect our clients' setup.
"cannot run that command" - you can find the output of the 'sysconfig -a' command in the daily AutoSupport (if the system is sending them). In there, you will find the IP address of the Service Processor (if configured).
SSH to the SP IP. Username: 'naroot' - the password is the root password. Once in, run the command 'system console'.
If that doesn't work, you'd have to use a console cable and manage the node from there.
As an update in case anyone comes across this thread - the controller eventually crashed. The LUNs went completely offline, yet over the direct console the controller responded perfectly and believed itself to be in perfect health.
Until I ran "vol status" - then I lost it completely.
We had to hard reset the controller, which finally forced the LUN to fail over to the partner filer.
On restart, the controller performed a filesystem boot repair and a mailbox disk repair.