"No buffer space available" during vFiler migrate

fletch2007 · ‎2010-11-11

Hi, I've opened a case on this, but wanted to get the community's feedback.

Summary:

Last night, just after initiating 2 vFiler migrations, we experienced an outage on our new 3170 running 7.3.3.
It dropped off the network around 10:50pm. I could not ssh to it, so I logged in via the RLM and found that cf status said up, but all the network and NFS services were down.
I ended up initiating a takeover from its partner, then a giveback after the problem head came up clean.

It was logging messages like these on the RLM console:

ping: wrote 17.6.6.1 64 chars, error=No buffer space available
na01> Wed Nov 10 23:00:07 PST [irt-na01: Java_Thread:info]: Lookup of time.school.edu failed with DNS server 17.6.7.77: No buffer space available.
syslogd: Could not forward message to host 17.2.65.2: No buffer space available
Nov 10 23:00:07 [na01: Java_Thread:info]: Lookup of time.school.edu failed with DNS server 17.6.7.77: No buffer space available.
syslogd: Could not forward message to host 17.2.65.2: No buffer space available
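
For anyone wanting to see which subsystems were hitting the error, here is a minimal Python sketch that tallies the "No buffer space available" lines from a saved console capture. It assumes the capture is plain text in the format shown above; the function names and the category labels (`ping`, `dns`, `syslogd`) are made up for illustration and are not part of Data ONTAP.

```python
from collections import Counter

# Error text exactly as it appears in the console messages above.
ENOBUFS_TEXT = "No buffer space available"

# The five console lines quoted in this post, used as sample input.
SAMPLE = [
    "ping: wrote 17.6.6.1 64 chars, error=No buffer space available",
    "na01> Wed Nov 10 23:00:07 PST [irt-na01: Java_Thread:info]: Lookup of "
    "time.school.edu failed with DNS server 17.6.7.77: No buffer space available.",
    "syslogd: Could not forward message to host 17.2.65.2: No buffer space available",
    "Nov 10 23:00:07 [na01: Java_Thread:info]: Lookup of time.school.edu failed "
    "with DNS server 17.6.7.77: No buffer space available.",
    "syslogd: Could not forward message to host 17.2.65.2: No buffer space available",
]


def classify(line):
    """Return a (hypothetical) category for a buffer-space error line, else None."""
    text = line.strip()
    if ENOBUFS_TEXT not in text:
        return None
    if text.startswith("ping:"):
        return "ping"
    if text.startswith("syslogd:"):
        return "syslogd"
    if "Lookup of" in text and "DNS server" in text:
        return "dns"
    return "other"


def tally(lines):
    """Count buffer-space errors per category across an iterable of log lines."""
    return Counter(c for c in map(classify, lines) if c is not None)


if __name__ == "__main__":
    for category, count in sorted(tally(SAMPLE).items()):
        print(category, count)
```

To run it against a real capture, pass an open file instead of `SAMPLE`, e.g. `tally(open("console.log"))`, where `console.log` is whatever file name the RLM console output was saved under. All three categories in the sample are outbound network operations (ping, DNS lookup, syslog forwarding).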

CPU was high before and after the outage

NFS ops were low at the outage time

Questions:

1) Is this network-related buffer space?

2) How can I track the buffer space usage? (Is there an SNMP-exposed metric?)

3) If it's network buffers, is it related to a 10GigE bug? (I thought NetApp had worked those out for the most part.)

Many thanks for any info / experience,

http://vmadmin.info

Fletcher

fletch2007 · ‎2010-11-12

Turns out it was related to a bug. Details of the workaround:

http://www.vmadmin.info/2010/11/vfiler-migrate-netapp-lockup.html

thanks