2010-06-30 07:14 AM
Hi - we've implemented a solution based around two ESXi servers (now live) and a FAS2020 (ONTAP 7.3.3), with NFS datastores presented to the ESXi servers.
Everything is generally working fine, but there are a couple of circumstances where we see problems on the ESXi servers. Deletion of snapshots fails, and upload / download of large files gives an 'I/O' error.
After a little digging, we found that whenever the I/O problem occurs, the TCP badcalls counter increments by 1, and the TCP xdrcall counter also increments by 1.
This is the only clue I've found so far on the NetApp.
Over on the ESXi servers, we see:
Jun 30 13:20:55 vmkernel: 6:18:26:03.136 cpu1:4196)BC: 3582: Failed to flush 128 buffers of size 8192 each for object '' b00f 36 0 40 f87c2c 20 59c0e600 5c98cf 1126c44 40 100f87c2c c231fbca00000001 4100 c231fa0800000000: I/O error
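For anyone wanting to pick these messages out of a larger log, here is a rough sketch that pulls the buffer count and size out of the vmkernel line above and works out how much data the failed flush covered. The regular expression and the function name `failed_flush_bytes` are my own illustration, based only on the wording of this one message, so other ESXi builds may phrase it differently.

```python
import re

# The vmkernel message quoted above, copied verbatim.
LOG_LINE = (
    "Jun 30 13:20:55 vmkernel: 6:18:26:03.136 cpu1:4196)BC: 3582: "
    "Failed to flush 128 buffers of size 8192 each for object '' "
    "b00f 36 0 40 f87c2c 20 59c0e600 5c98cf 1126c44 40 100f87c2c "
    "c231fbca00000001 4100 c231fa0800000000: I/O error"
)

# Assumption: the pattern is inferred from this single message only.
FLUSH_FAILURE = re.compile(
    r"Failed to flush (?P<count>\d+) buffers of size (?P<size>\d+) each"
)


def failed_flush_bytes(line):
    """Return how many bytes a failed flush covered, or None if no match."""
    match = FLUSH_FAILURE.search(line)
    if match is None:
        return None
    return int(match.group("count")) * int(match.group("size"))


print(failed_flush_bytes(LOG_LINE))  # 128 * 8192 = 1048576 bytes (1 MiB)
```

So each failure here corresponds to 1 MiB of buffered writes that could not be flushed to the datastore.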
NFS over TCP is enabled and NFS over UDP is disabled. NFS v3 is enabled, and from the stats that appears to be what the ESXi servers are using.
I do have one question: could this be related to flow control? Jumbo frames are disabled all round (hosts, filer, switch).
Any help or directives for further investigation would be hugely appreciated - the snapshot problem in particular is causing us a big headache.
2010-06-30 01:44 PM
I am not saying this will immediately solve your problem, but can you enable jumbo frames by any chance?
Using them is best practice, as they significantly reduce CPU utilisation on both the host and the filer side.
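To give a feel for why fewer, larger frames mean less per-packet work, here is a back-of-the-envelope sketch counting how many Ethernet frames it takes to carry one 8192-byte buffer at each MTU. It assumes 20-byte IPv4 and 20-byte TCP headers with no options, and ignores the RPC/NFS headers, so treat the numbers as approximate; the function name is purely illustrative.

```python
import math

IP_HEADER_BYTES = 20   # assumption: IPv4 header without options
TCP_HEADER_BYTES = 20  # assumption: TCP header without options


def frames_needed(payload_bytes, mtu):
    """Approximate number of frames needed to carry payload_bytes over TCP."""
    per_frame = mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES
    return math.ceil(payload_bytes / per_frame)


print(frames_needed(8192, 1500))  # standard MTU: 6 frames
print(frames_needed(8192, 9000))  # jumbo frames: 1 frame
```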
2010-06-30 03:33 PM
I've seen issues here where we have very long CP (consistency point) times that impact snapshots. The reason behind our problems is that we have misaligned VMware hosts/guests.
In your case I'd start by checking for misalignment, as well as looking at the jumbo frames issue.
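As a quick sanity check, a guest partition is aligned when its starting byte offset is a multiple of the filer's 4 KB block size. Here is a minimal sketch of that arithmetic, assuming 512-byte sectors; the function name and the example start sectors are illustrative and not taken from your system.

```python
SECTOR_BYTES = 512       # assumption: 512-byte logical sectors
FILER_BLOCK_BYTES = 4096  # 4 KB block size on the filer


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES,
               block_bytes=FILER_BLOCK_BYTES):
    """True if the partition's starting offset falls on a 4 KB boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


print(is_aligned(63))    # 63 * 512 = 32256, not a multiple of 4096 -> False
print(is_aligned(64))    # 64 * 512 = 32768 -> True
print(is_aligned(2048))  # 1 MiB offset -> True
```

A start sector of 63 is the classic default on older guest operating systems, which is why misalignment turns up so often.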
2010-07-01 01:11 AM
Both of these I can look into - but I'll have to wait until the customer gives us a window to try it!
I assume that what we're talking about here is the mbralign tool? That might be tricky - it's ESXi, and there's not a Linux station in sight!
2010-07-29 06:06 AM
Well, I finally traced my problem - it was the network card! Switched over to a different card - problem gone immediately! The card that had the problem? HP's NC375T - every other form of traffic was fine; only NFS traffic was troublesome.