
Failover Monitor: unable to transit - giveback process is hung (vfiler_low_level) in SK

CSTOCKDA

Hello,

 

We have a system still on 8.1.4 for which we replaced the motherboard (the full process was followed). However, when we attempt the giveback, we get the message below:

 

Panic string: Failover Monitor: unable to transit - giveback process is hung (vfiler_low_level) in SK process cf_main on release 8.1.4P1

We have attempted the following:

 

vfiler run * cifs terminate

 

partner vfiler run * cifs terminate

 

This still resulted in a filer panic when attempting the giveback.

Both cf giveback and cf giveback -f resulted in panics.

 

It seems impossible to clear. At the moment I cannot think of any option other than rebooting the node or, as a last-ditch effort, a forcegiveback. The data has been migrated off, so there is no risk of data loss.

 

Has anyone run into this error before, and does anyone have suggestions for clearing it short of drastic measures such as a hard reboot or forcegiveback?

 

Thanks

 


Reverett

Which node panics during the cf giveback -f?

 

Is it the node issuing the giveback that panics, or the node we are trying to boot after the motherboard replacement?

 

Can you also provide the cf monitor and cf status outputs, please?

JPoorboy

It is the up node which panics.

Here is what we tried this evening on this.

Terminated CIFS on Node1 as well as in the partner context
Stopped all vfilers
cf giveback showed the same message ("vfiler_low_level", 110 of 158 modules)
This time though the filer completed the giveback successfully
Restarted both CIFS and the vfilers
Attempted a new takeover for testing
Disabled just CIFS and the vfilers in partner context and attempted giveback
Giveback successful
Did a takeover from Node2 this time
Takeover successful
Attempting giveback without shutting down CIFS or vfilers
Node2 is showing the same message ("vfiler_low_level", 110 of 158 modules)
Giveback took over 30 minutes but did complete
Takeover from Node1 again
Terminated CIFS and the vfilers in partner context; giveback has hung again

Reverett

Is anything done differently when the giveback is successful and when it fails, or does it just seem to work on occasion? 

 

From the steps above, it appears the giveback succeeds or fails regardless of whether the vfilers are running.

 

 

 

JPoorboy

So, we got the giveback to work on Node1 twice, and both times the vfilers and CIFS had been terminated.
Yesterday, without stopping both, the giveback resulted in the up node (Node1) panicking.

We also tried a takeover from Node2 tonight for the first time, and that giveback took over 30 minutes to complete.

 

The one thing that has me stumped is that the only place I see any mention of this process ("vfiler_low_level") is in an ONTAP 9 doc for giveback veto.
I can't find anything for ONTAP 8...

Reverett

We do not currently appear to have any public KBs relating to this panic specifically for ONTAP 8 7-Mode.

 

Based on other panics we have seen with the same message, this panic has several different known causes. Which one applies here could be identified by analyzing the core file dumped with the panic.

ttran

Hello @CSTOCKDA,

 

There are some very old bugs involving a race condition under CIFS workload, where the vfiler is torn down too soon on the up node and rebuilt on the node that was given back. As @Reverett mentioned, the core dump will need to be analyzed, looking at the stack trace at the time of the panic, to concretely identify the cause. Unfortunately, ONTAP 8.1.x is End of Support, so please upgrade to ONTAP 8.2.5P5 if possible. Without the stack trace we also won't be able to link an exact bug or KB.

 

Data ONTAP 8.2.5P5 

 

 

Regards,

 

Team NetApp


Ethy

Try this as a workaround; we tested it and it works:

"Disabling HTTP/HTTPS on all the vfilers 20 minutes before initiating the takeover/giveback phases should prevent the same panics.

Once the protocols have been disabled, monitor the sessions open on the systems with the command "netstat -na" to confirm that no sessions are open on ports 80 and 443."
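
For anyone who wants to script that check rather than read the netstat output by eye, here is a minimal sketch in Python. It is only an illustration under some assumptions: the captured "netstat -na" text is fed in as a string, each TCP line has the usual six columns (protocol, two queue counters, local address, foreign address, state), the port is the last component of the address after either "." or ":", and "session open" is taken to mean the ESTABLISHED state. The names open_sessions, local_port and SAMPLE, and the sample addresses, are made up for the example and do not come from any NetApp tool.

```python
import re

WATCHED_PORTS = {80, 443}  # HTTP and HTTPS, per the workaround above


def local_port(addr):
    """Return the trailing port of 'host:port' or BSD-style 'host.port', else None."""
    match = re.search(r"[.:](\d+)$", addr)
    return int(match.group(1)) if match else None


def open_sessions(netstat_text, ports=WATCHED_PORTS):
    """Return the ESTABLISHED TCP lines whose local port is in `ports`."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP lines, and anything without a state column.
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if state == "ESTABLISHED" and local_port(local) in ports:
            hits.append(line.strip())
    return hits


# Hypothetical sample output; the column layout is an assumption.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp        0      0 10.0.0.5.443           10.0.0.9.51234         ESTABLISHED
tcp        0      0 10.0.0.5.445           10.0.0.9.51300         ESTABLISHED
tcp        0      0 *.80                   *.*                    LISTEN
"""

if __name__ == "__main__":
    remaining = open_sessions(SAMPLE)
    if remaining:
        print("Sessions still open on 80/443:")
        for entry in remaining:
            print("  " + entry)
    else:
        print("No sessions open on 80/443")
```

An empty result from open_sessions would correspond to the "no sessions are open" condition in the workaround. Listening sockets are deliberately ignored here; if your output uses different columns or state names, the field positions would need adjusting.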
