This is NetApp case # 2004504803.
The issue occurs when the NFSv4 client sends an old state-id to the NetApp filer and the filer responds that the state-id is not the one it expected. Instead of querying the filer for the most recent state-id, the client keeps resending the old one and the filer keeps rejecting it, so the two end up in an endless loop. If you capture traffic with tcpdump on the NFSv4 client, you will see this exchange repeating continuously. At this point the NFSv4 client hangs or becomes extremely slow. Scripts run from NFSv4 exports never complete, and their processes pile up on the client.
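To confirm the loop from a capture, something like the following rough sketch may help. It assumes you have already extracted the NFSv4 reply status codes from the tcpdump capture into a list, in order, and that the filer's rejection is NFS4ERR_OLD_STATEID (10024 in RFC 3530); the exact status seen in this case is not recorded here, so adjust the code if your capture shows a different one. The function names and the threshold of 100 are made up for illustration.

```python
from itertools import groupby

# Assumption: the rejection is NFS4ERR_OLD_STATEID (10024 in RFC 3530).
# Swap in the status your capture actually shows.
NFS4ERR_OLD_STATEID = 10024


def longest_run(statuses, code=NFS4ERR_OLD_STATEID):
    """Return the length of the longest consecutive run of `code`."""
    best = 0
    for value, group in groupby(statuses):
        if value == code:
            best = max(best, sum(1 for _ in group))
    return best


def looks_like_stateid_loop(statuses, threshold=100):
    """Heuristic: an unbroken run of rejections suggests the client is stuck.

    `threshold` is an arbitrary illustrative value, not a vendor figure.
    """
    return longest_run(statuses) >= threshold
```

A healthy client may see the error once or twice and then recover; a stuck client shows the same status back to back with no successful reply in between.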
NetApp's response was:
"I can't pin it down to any specific ONTAP issue. The client sends us a state-id that is for a previous locking state and doesn't ever seem to try and correct this. Without seeing the trigger, it hard to say how this state is caused (we would need a capture as the problem occurs, showing the transitioning state). It's possible that if the issue was caused by an OPEN_DOWNGRADE call, that an upgrade would provide some relief from ONTAP perspective, but the client should be able to recover from this error. So far, this appears to be a client side issue which the client should be able to recover from."
Oracle's response was:
"At this point, we may not have a solution that will work properly on this very aging kernel (which was created before NFSv4 was even a draft spec; *everything* about its NFSv4 code is backported piecemeal).
As a rule of thumb, we highly recommend that anyone using NFSv4 in production environments make use of the Unbreakable Enterprise Kernel (UEK).
If at all possible, please try UEKr2 on at least one of these boxes and see whether the issue can be reproduced. The current version is 2.6.39-400.109.6.el5uek. The NFSv4 code in that kernel is considerably newer than in RHCK, with much more active NFSv4 production workloads. This is likely to be the easiest (and most stable) resolution available for this issue.
We will continue to look into this in RHCK, but if a UEK upgrade is possible, that will likely solve this issue as well as many more well-known bugs."
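If you want to check quickly which boxes are already on UEK, a small sketch like this may do. It assumes UEK builds carry "uek" in the kernel release string, as the version quoted above does; `is_uek` is a made-up helper name, not an Oracle tool.

```python
import platform


def is_uek(release):
    """Return True if a kernel release string looks like a UEK build."""
    # Assumption: UEK releases contain "uek", e.g. 2.6.39-400.109.6.el5uek.
    return "uek" in release.lower()


if __name__ == "__main__":
    running = platform.release()
    print(running, "UEK" if is_uek(running) else "not UEK")
```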