Hi Mike -
Good news/bad news type of answer here. It isn't so much that the clients have a ton of open files on volumes - rather, it's that internal Data ONTAP applications do. Consider the text from "bug" 746936:
Each Data ONTAP application has a limit of 2,048 opened files. When an
application hits this limit, it will no longer be able to open new files, which
can prevent it from writing to disk or communicating on the network.
This can often happen because too many external applications are connecting to
a Data ONTAP application.
If you use scripts in your environment, you can reduce the load on opened files
by ensuring that the scripts reuse the same network connections, files, or
The "bug" isn't something that will get "fixed" - it just describes what happens when an environment tries to exceed a capability of the NetApp system, in this case by driving too much load to one of its applications. The "applications" in question are internal Data ONTAP processes, like the SSH daemon or the web server service. Remember that under the covers Data ONTAP is a BSD OS, so each individual process has per-process file and memory limits just like a process in any regular BSD installation would. Somewhere, one of those processes has run up against its open-file limit and is destabilizing your system. So that's the good part: there is an explanation for the underlying cause.
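To see the mechanism itself, here's a minimal sketch in plain Python - run on any Unix admin host, not on the filer - that reads a process's own open-file ceiling. The 2,048 figure in the bug text is exactly this kind of per-process limit, just as set for ONTAP's internal applications:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
# Every BSD/Unix process carries one; once a process hits its soft
# limit, open() and socket() start failing with EMFILE - which
# matches the symptoms described: no new files, no new connections.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

The exact numbers will differ per platform; the point is only that the limit is per process, so one misbehaving daemon can hit it while the rest of the system looks healthy.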
Now - the hard part - finding out what is triggering this condition. Obviously a cluster failover/giveback would clear the situation for a while, since it forces all processes to stop and restart. That's the quick "fix", but of course the situation is likely to come back. As the "bug" text suggests, I'd start with the environment. Is there some process or monitoring tool that is regularly opening connections to your nodes? Is it releasing those connections properly, or simply opening new ones over time while holding onto the old ones? I've certainly run into that type of situation before, where a monitoring tool kept opening SSH sessions to run command lines but, due to an implementation error, never formally released the old ones.
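As an illustration of the reuse pattern the bug text recommends, here's a hedged sketch in plain Python. The `Connection` class and its methods are hypothetical stand-ins for whatever session object your tool actually uses (an SSH library, an API client); the point is just that a leaky poller opens a fresh session per command and never closes it, while a well-behaved one caches and reuses a single session:

```python
class Connection:
    """Hypothetical stand-in for an SSH/API session to a filer."""
    open_count = 0          # class-wide tally of live sessions

    def __init__(self, host):
        self.host = host
        Connection.open_count += 1

    def run(self, command):
        return f"ran {command!r} on {self.host}"

    def close(self):
        Connection.open_count -= 1


def leaky_poll(host, commands):
    # Anti-pattern: a new session per command, never closed.
    return [Connection(host).run(c) for c in commands]


class ReusingPoller:
    # Better: one cached session, reused for every poll.
    def __init__(self, host):
        self.conn = Connection(host)

    def poll(self, command):
        return self.conn.run(command)

    def shutdown(self):
        self.conn.close()


leaky_poll("filer1", ["sysstat", "df"])   # leaves 2 sessions open
poller = ReusingPoller("filer1")
poller.poll("sysstat")
poller.poll("df")
poller.shutdown()                         # its one session is closed
print("leaked sessions:", Connection.open_count)  # prints 2
```

The monitoring-tool failure I described behaved like `leaky_poll`: each polling cycle left one more session alive on the filer side until the daemon ran out of descriptors.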
Failing that, at the diagnostic privilege level you can get close to command-line access to the underlying BSD - as I recall you can run a "ps" command as well as kill individual processes if needed. Unfortunately I've been away from 7-Mode for quite a while and don't trust that I have current enough info to guide you - hopefully someone else can. But like the cluster failover routine, killing (and hopefully restarting) an underlying process is only a temporary fix. You'll still need to determine why that specific process is being asked to open so many files.
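The generic version of that diagnosis - whether on the filer's diag shell or on the host a suspect tool runs from - is just counting open descriptors per process. Here's a minimal sketch assuming a Linux-style `/proc`; the exact commands available on the filer will differ, so treat this as the shape of the investigation, not the literal procedure:

```python
import os

def open_fd_count(pid):
    """Count open file descriptors for a process via /proc (Linux)."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except FileNotFoundError:
        return None  # no /proc/<pid>/fd here (e.g., non-Linux host)

# Checking our own process as a demonstration; against a suspect
# daemon you'd pass the PID you found with `ps` instead.
count = open_fd_count(os.getpid())
print("open descriptors:", count)
```

A process whose count climbs steadily between samples, without ever dropping back, is your leak candidate.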
Hope this helps you.
Lead Storage Engineer
Huron Legal | Huron Consulting Group
NCIE - SAN Clustered, Data Protection