2012-05-05 12:09 PM
FAS 3210 with dual controllers, configured as active/active. There is one quad-port NIC on each controller, and two ifgroups configured on each controller, each made up of two interfaces in LACP mode. Everything was working, so it was time to test transparent failover:
OS: Data ONTAP 8.0.2 7-Mode
a) A server had a LUN (created and configured inside a volume on Controller-1) mounted as a drive in Windows.
b) I opened a console session to Controller-1 and issued "halt".
c) All good: Controller-0 took over, and the copy job I was running from the command line carried on with no disruption.
d) Happy that this looked good, I issued "bye" on the Controller-1 console, and the system started rebooting (I think).
e) Controller-1 then came to a standstill at "waiting for giveback - press CTRL-C to abort".
f) I then issued "cf giveback" on Controller-0.
g) The process started, with lots of output on both consoles.
h) I lost the LUN I had mounted on the server, and the copy job started to fail.
i) I then pinged the ifgroup IP address on Controller-1 and got "request timed out".
j) I opened System Manager (MMC) > Controller-1 > Network Interfaces > ifgroup-1 > right-click > Edit > VIF tab, and it showed all 4 NICs instead of the member NICs I had assigned previously.
So it seemed to have lost the NIC configuration. I manually reassigned the appropriate interfaces to the ifgroups, and everything came back up with all LUNs in place.
Any thoughts? If someone could provide a brief step-by-step set of commands to test failover, that would be very helpful!
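Not ONTAP commands, but one way to make the test measurable: during takeover and giveback, log one timestamped ping result per second against the ifgroup IP (as in step i) and then look for gaps. A minimal Python sketch, assuming the samples have already been collected as (seconds, succeeded) pairs in time order; `Outage` and `find_outages` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outage:
    start: float  # time of the first failed ping in the run
    end: float    # time of the next successful ping (or the last sample)

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: List[Tuple[float, bool]]) -> List[Outage]:
    """Group consecutive failed pings into outages.

    samples: (seconds since test start, ping succeeded) pairs in time order.
    """
    outages: List[Outage] = []
    start: Optional[float] = None
    for t, ok in samples:
        if not ok and start is None:
            start = t                         # a run of failures begins
        elif ok and start is not None:
            outages.append(Outage(start, t))  # connectivity is back
            start = None
    if start is not None:
        # Still failing when the log ends: close the outage at the last sample.
        outages.append(Outage(start, samples[-1][0]))
    return outages


samples = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, True)]
for o in find_outages(samples):
    print(f"outage from {o.start}s to {o.end}s ({o.duration}s)")
```

A truly transparent takeover or giveback should produce no outages, or only very short ones; a run that never closes, like the one after step f), points at the network config rather than the storage.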
--One more thing--
If I have one console session to Controller-1 and one SSH session to Controller-0, and I issue "halt" on Controller-0 via SSH, I will lose the SSH session. Is there a way to issue the "bye" command to Controller-0 via Controller-1, or to initiate a giveback to Controller-0 from Controller-1?
Thanks for reading!
2012-05-05 01:06 PM
It sounds like your rc and hosts files were modified. Check them to make sure they match the config you had running.
Use the SP (service processor) console connection. Then run the takeover and giveback from the live node. I don't know a way to initiate it from the down node. Once the node is waiting for giveback, it reprocesses the rc and hosts files for networking, so your ifgrp problem must be there.
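Following the suggestion to check the rc file: on 7-Mode, `rdfile /etc/rc` prints it on the console, and its `ifgrp create` lines are what recreate the ifgroups when the node boots. A small Python sketch that pulls the member ports out of that text and compares them with what you intended. It assumes lines of the form `ifgrp create lacp ifgrp1 -b ip e1a e1b` (older configs may use `vif` instead of `ifgrp`) and ignores any later `ifgrp add` lines; the function names and port names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_ifgrp_members(rc_text: str) -> Dict[str, List[str]]:
    """Map each ifgrp name to its member ports, from 'ifgrp create' lines."""
    groups: Dict[str, List[str]] = {}
    for raw in rc_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        tokens = line.split()
        # Assumed shape: ifgrp create <mode> <name> [-b <policy>] <ports...>
        if len(tokens) < 4 or tokens[0] not in ("ifgrp", "vif") or tokens[1] != "create":
            continue
        name = tokens[3]
        members: List[str] = []
        skip_next = False
        for tok in tokens[4:]:
            if skip_next:
                skip_next = False            # this was the -b policy value
            elif tok == "-b":
                skip_next = True             # load-balancing policy follows
            else:
                members.append(tok)
        groups[name] = members
    return groups


def find_mismatches(rc_text: str, expected: Dict[str, List[str]]) -> Dict[str, Optional[List[str]]]:
    """Return {ifgrp: ports found in rc (None if missing)} for every group that differs."""
    actual = parse_ifgrp_members(rc_text)
    problems: Dict[str, Optional[List[str]]] = {}
    for name, ports in expected.items():
        found = actual.get(name)
        if found is None or sorted(found) != sorted(ports):
            problems[name] = found
    return problems
```

An empty result means the rc file matches the intended layout; a group listing all four ports, as System Manager showed in step j), would show up here.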
2012-05-05 02:29 PM
I have forgotten the SP port IP address :-). I have console access to both controllers; shall I just type the "bye" command on Controller-0 to let it reboot?
Also, why did those files get modified?
How can I verify and protect them in the future?
I remembered the original config and changed it back to what it should have been when the controller came back, so I guess the rc/hosts files would reflect the change I made after the failover?
Can I change the MTU of all ifgroups without actually breaking the VIF, or do I have to re-create them?
Thanks again for your prompt responses; this is very helpful!
2012-05-05 02:37 PM
Bye works. It simulates a power cycle, which is useful after an env or firmware upgrade. You could also type "boot_ontap" or "boot_primary".
The rc and hosts files are protected by the snapshot schedule. You could also use SnapMirror, a tape backup, or any other way to copy them. Someone either edited them directly or used System Manager to change them.
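Besides snapshots and SnapMirror, a plain copy plus a diff covers both "protect" and "verify". A minimal Python sketch, assuming the controller's /etc directory is reachable from an admin host (for example through a share or mount of the root volume); the paths and `backup_and_diff` are hypothetical. Each run copies rc and hosts into a new timestamped folder and returns a unified diff against the previous copy.

```python
import difflib
import shutil
import time
from pathlib import Path
from typing import List, Optional

FILES = ("rc", "hosts")


def backup_and_diff(etc_dir: Path, backup_root: Path, stamp: Optional[str] = None) -> List[str]:
    """Copy rc/hosts into backup_root/<stamp>/ and diff them against the previous backup."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    # Timestamped folder names sort chronologically, so the last one is the newest.
    previous = sorted(p for p in backup_root.iterdir() if p.is_dir()) if backup_root.exists() else []
    target = backup_root / stamp
    target.mkdir(parents=True)
    diff_lines: List[str] = []
    for name in FILES:
        src = etc_dir / name
        shutil.copy2(src, target / name)
        if previous and (previous[-1] / name).exists():
            old = (previous[-1] / name).read_text().splitlines(keepends=True)
            new = src.read_text().splitlines(keepends=True)
            diff_lines += difflib.unified_diff(
                old, new, fromfile=f"{previous[-1].name}/{name}", tofile=f"{stamp}/{name}"
            )
    return diff_lines
```

Running it before and after a takeover/giveback test would show whether anything rewrote the ifgrp lines: an unexpected edit appears as `-`/`+` lines in the returned diff.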
MTU is set in the ifconfig settings, not in the ifgrp command, so you can change it without recreating the ifgrp. It might cause a reset of the interface, though.
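To illustrate where the MTU lives in /etc/rc: the `ifgrp create` line only lists the member ports, while `mtusize` is an option on the matching `ifconfig` line, so that is the only line that should need to change for the setting to persist across reboots (the live change would be made with something like `ifconfig ifgrp1 mtusize 9000` on the console). A Python sketch that rewrites or appends `mtusize` on one interface's `ifconfig` line and leaves everything else untouched; the function name, interface names and values are hypothetical, and the line format is assumed.

```python
import re

MTU_RE = re.compile(r"\bmtusize\s+\d+")


def set_mtu(rc_text: str, ifname: str, mtu: int) -> str:
    """Return rc_text with 'mtusize' on the 'ifconfig <ifname>' line set to mtu."""
    out = []
    for line in rc_text.splitlines(keepends=True):
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "ifconfig" and tokens[1] == ifname:
            if MTU_RE.search(line):
                line = MTU_RE.sub(f"mtusize {mtu}", line)   # replace existing value
            else:
                ending = "\n" if line.endswith("\n") else ""
                line = line.rstrip("\n") + f" mtusize {mtu}" + ending  # append option
        out.append(line)  # 'ifgrp create' and all other lines pass through unchanged
    return "".join(out)
```

Because the `ifgrp create` line is never touched, the ifgroup membership stays exactly as it was.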