2011-10-18 12:35 AM
I have a NetApp FAS 3020 with dual controllers, and they are in cluster mode.
The issue I am facing is that when I issue the halt command on one controller, it gets stuck in CIFS termination. CIFS never finishes terminating, and because of that the other controller does not take over its partner.
But when I manually run the cf takeover command, the takeover happens.
Can any NetApp expert help me with this?
2011-10-19 12:31 AM
I have seen CIFS fail to terminate when there were hardware issues or the network was disconnected: once with an HBA that was flaky, and another time with no network, where CIFS hung in a process during termination. Are there any errors on the console or in the messages log showing CIFS stuck at a numbered step? There was also a BURT on an older version of ONTAP where you had to run halt -d dumpname to get CIFS to terminate and dump core. I don't have the BURT number now, but I am pretty sure it was in a release from a year or more ago.
2011-11-06 11:16 PM
Thanks for taking an interest in this discussion.
I have already tried to terminate CIFS with the cifs terminate command before halting, but without success.
I have even tried the plain "halt" command so that an automatic takeover would happen, but the filer was not able to terminate the CIFS sessions, so the takeover did not happen.
I have also tried "halt -t 0", but still without success.
Also, can anybody tell me whether we can use "halt -s" to halt the filer in an HA configuration? Will a takeover happen in this case?
The ONTAP version is 7.3.2P3.
2011-11-07 07:06 AM
First, I would not use "halt -f", as that does not trigger a partner takeover. If you need to do a failover, use "cf takeover/giveback", especially since that seems to work for you.
If you cannot successfully run "cifs terminate" manually, then I would focus my troubleshooting efforts there. Check the following:
- Do the options for "cf" and "cifs" match on both heads?
- If you run "cifs terminate", does the number of "cifs sessions" decrease?
- If you do see the number of "cifs sessions" decreasing, are there one or two in particular that don't seem to shut down?
- Is your networking okay? Can you ping your DCs by both name and IP?
- Does the output of "cifs testdc" look right for your environment?
- Have you tried "cifs testdc" successfully?
- Are any messages being generated that might be helpful?
- Is this behavior the same on both heads?
As Scott Gelb mentioned, this is often caused by the filer being unable to communicate on the network properly. Are there any other indicators of this type of behavior?
2011-11-07 09:01 PM
- When I run cifs terminate, the number of CIFS sessions does not decrease.
- The network is OK; I can ping the DCs by both IP and name.
- I have not checked "cifs testdc" yet. I will check it, but I don't think it is going to help.
- No error messages of any kind are generated.
I want to add a bit more information about what I saw when I was manually performing "cf takeover". After the takeover happened, and during "cf giveback", when I used the "cifs terminate" command it did terminate the CIFS sessions. This happened on both heads: during takeover the cifs terminate command works, but in normal mode it does not.
2011-11-07 09:14 PM
Here are the similar BURTs where "halt -d" is used to dump core and force CIFS to terminate. I also saw another case where a bad FC HBA caused issues, and CIFS gave an odd error on the process it was in during termination (step 120 of 200 or something), so this could be a hardware issue too.