VMWARE ESX 3.5 & NETAPP NFS

Our current setup consists of 12 ESX servers connecting to 27 NFS shares on the NetApp. We are currently hosting about 400 virtual machines and are starting to notice NFS disconnects in the vmkwarning logs.

Does anybody have a similar setup, or has anyone experienced these NFS disconnects? If so, could you please share some tips?

Many thanks in advance!

Re: VMWARE ESX 3.5 & NETAPP NFS

Hi,

Have you read the NetApp white paper TR-3428?

Re: VMWARE ESX 3.5 & NETAPP NFS

Yes.

We also have 7 NICs in each ESX server, trunked, with the various port groups assigned to specific VLANs. See below:

NET.jpg

Regards

Gordon

Re: VMWARE ESX 3.5 & NETAPP NFS

Would you mind posting the errors in question?

And...I'm curious as to why you have 27 NFS datastores?

Re: VMWARE ESX 3.5 & NETAPP NFS

There are two errors which appear.

First Error:

May 31 06:20:03 ukvirt07 vmkernel: 1:20:06:07.459 cpu8:1049)WARNING: NFS: 1736: Failed to get attributes (I/O error)
May 31 06:20:03 ukvirt07 vmkernel: 1:20:06:07.459 cpu11:1047)WARNING: NFS: 1736: Failed to get attributes (I/O error)

Second Error:

May 31 14:24:56 esxdmz vmkernel: 1:03:09:05.312 cpu4:1032)WARNING: NFS: 257: Mount: (ESX_CIFS_01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/ESX_CIFS_01/Q_ESX_CIFS_01) not responding
May 31 14:25:43 esxdmz vmkernel: 1:03:09:52.776 cpu7:1077)WARNING: NFS: 281: Mount: (ESX_CIFS_01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/ESX_CIFS_01/Q_ESX_CIFS_01) OK
May 31 21:35:06 esxdmz vmkernel: 1:10:19:14.563 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (I/O error)
May 31 21:35:36 esxdmz vmkernel: 1:10:19:44.606 cpu7:1039)WARNING: NFS: 1736: Failed to get attributes (I/O error)
May 31 21:35:56 esxdmz vmkernel: 1:10:20:04.629 cpu5:1032)WARNING: NFS: 257: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) not responding
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.643 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (I/O error)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.643 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.659 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.659 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.659 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.704 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.721 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.723 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.723 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.745 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.745 cpu5:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.748 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.773 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:36:06 esxdmz vmkernel: 1:10:20:14.806 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:10 esxdmz vmkernel: 1:10:21:18.340 cpu6:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:10 esxdmz vmkernel: 1:10:21:18.341 cpu6:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:10 esxdmz vmkernel: 1:10:21:18.357 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:10 esxdmz vmkernel: 1:10:21:18.357 cpu6:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:41 esxdmz vmkernel: 1:10:21:49.224 cpu6:1101)WARNING: VSCSI: 5236: READ_CAPACITY : Could not get capacity for virtual device
May 31 21:37:41 esxdmz vmkernel: 1:10:21:49.231 cpu6:1101)WARNING: VSCSI: 5236: READ_CAPACITY : Could not get capacity for virtual device
May 31 21:37:41 esxdmz vmkernel: 1:10:21:49.231 cpu6:1101)WARNING: VSCSI: 5236: READ_CAPACITY : Could not get capacity for virtual device
May 31 21:37:42 esxdmz vmkernel: 1:10:21:50.864 cpu7:1101)WARNING: VSCSIFs: 426: scatter-gather says length 0, op says 4096
May 31 21:37:44 esxdmz vmkernel: 1:10:21:52.581 cpu5:1101)WARNING: VSCSIFs: 426: scatter-gather says length 0, op says 4096
May 31 21:37:45 esxdmz vmkernel: 1:10:21:53.852 cpu5:1101)WARNING: VSCSIFs: 426: scatter-gather says length 0, op says 4096
May 31 21:37:45 esxdmz vmkernel: 1:10:21:53.894 cpu5:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:37:45 esxdmz vmkernel: 1:10:21:53.894 cpu5:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
May 31 21:39:15 esxdmz vmkernel: 1:10:23:23.384 cpu3:1122)WARNING: NFS: 281: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) OK
Jun  1 17:01:27 esxdmz vmkernel: 2:05:45:32.258 cpu3:1039)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 17:01:57 esxdmz vmkernel: 2:05:46:02.282 cpu0:1041)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 17:02:27 esxdmz vmkernel: 2:05:46:32.300 cpu3:1039)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 23:53:26 esxdmz vmkernel: 2:12:37:29.706 cpu3:1041)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 23:53:56 esxdmz vmkernel: 2:12:37:59.725 cpu2:1039)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 23:54:13 esxdmz vmkernel: 2:12:38:16.735 cpu1:1032)WARNING: NFS: 257: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) not responding
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.747 cpu0:1040)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.761 cpu0:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.778 cpu3:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.792 cpu3:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.793 cpu3:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.823 cpu1:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.839 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.842 cpu3:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.843 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.871 cpu3:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.887 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.919 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:54:26 esxdmz vmkernel: 2:12:38:29.919 cpu2:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  1 23:55:31 esxdmz vmkernel: 2:12:39:35.385 cpu3:1027)WARNING: NFS: 281: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) OK
Jun  2 00:06:29 esxdmz vmkernel: 2:12:50:33.346 cpu2:1041)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  2 00:06:59 esxdmz vmkernel: 2:12:51:03.372 cpu1:1039)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  2 00:07:24 esxdmz vmkernel: 2:12:51:28.404 cpu1:1032)WARNING: NFS: 257: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) not responding
Jun  2 00:07:29 esxdmz vmkernel: 2:12:51:33.411 cpu0:1040)WARNING: NFS: 1736: Failed to get attributes (I/O error)
Jun  2 00:07:29 esxdmz vmkernel: 2:12:51:33.445 cpu2:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:29 esxdmz vmkernel: 2:12:51:33.445 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:29 esxdmz vmkernel: 2:12:51:33.445 cpu2:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:30 esxdmz vmkernel: 2:12:51:33.475 cpu2:1041)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:30 esxdmz vmkernel: 2:12:51:33.520 cpu1:1040)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:30 esxdmz vmkernel: 2:12:51:33.530 cpu1:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:30 esxdmz vmkernel: 2:12:51:33.544 cpu1:1039)WARNING: NFS: 1736: Failed to get attributes (No connection)
Jun  2 00:07:55 esxdmz vmkernel: 2:12:51:59.386 cpu5:1123)WARNING: NFS: 281: Mount: (VM_NFS_DMZ01) Server (10.119.204.10) 10.119.204.10 Volume: (/vol/VMS_NFS_DMZ01/Q_VMS_NFS_DMZ01) OK

We have 27 volumes to host the different types of systems and services we offer our end users. These volumes also have different snapshot schedules.
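To see how long each datastore was actually unavailable, the "not responding" / "OK" pairs in a log like the one above can be matched up. What follows is only a rough sketch in Python: the function name `mount_outages` and the `year` default are my own assumptions (syslog timestamps carry no year), and the regex is written against the message layout shown in this excerpt, not against any documented format.

```python
import re
from datetime import datetime

# Matches the NFS mount state warnings as they appear in the excerpt above.
MOUNT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) vmkernel: .*"
    r"NFS: \d+: Mount: \((?P<mount>[^)]+)\).*\) (?P<state>not responding|OK)\s*$"
)

def mount_outages(lines, year=2009):
    """Return (host, mount, start, seconds) for each not responding/OK pair.

    `year` is an assumption: the log lines themselves do not include one.
    """
    down = {}      # (host, mount) -> time the mount stopped responding
    outages = []
    for line in lines:
        m = MOUNT_RE.match(line)
        if not m:
            continue  # e.g. "Failed to get attributes" lines are skipped
        stamp = " ".join(m.group("ts").split())  # collapse "Jun  1" padding
        ts = datetime.strptime(str(year) + " " + stamp, "%Y %b %d %H:%M:%S")
        key = (m.group("host"), m.group("mount"))
        if m.group("state") == "not responding":
            down.setdefault(key, ts)  # keep the first report of an outage
        elif key in down:
            start = down.pop(key)
            outages.append((key[0], key[1], start, (ts - start).total_seconds()))
    return outages
```

Fed the lines above, this would report the ESX_CIFS_01 outage on May 31 (14:24:56 to 14:25:43) as 47 seconds. Comparing outage start times across all 12 hosts may help show whether the disconnects hit every host at once or only some of them.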

Re: VMWARE ESX 3.5 & NETAPP NFS

Hi,

What is the "load balancing" policy of the vSwitch?

The default value is "Route based on the originating virtual port ID". According to TR-3428, page 50, it should be "Route based on ip hash" when used with multiswitch trunking.

I would also recommend separating your "normal" server traffic from the NFS network traffic by defining multiple vSwitches.
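To illustrate why the policy matters, here is a simplified model of how an IP hash policy picks an uplink: XOR the source and destination addresses and take the result modulo the number of uplinks. This is a sketch of the general idea only, not the exact vmkernel implementation. The function name `ip_hash_uplink`, the vmkernel address 10.119.204.21, and the second filer address 10.119.204.11 are all made up for the example.

```python
import ipaddress

def ip_hash_uplink(src, dst, n_uplinks):
    """Simplified model: XOR both IPv4 addresses, then modulo the uplink count."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks

# Hypothetical vmkernel port talking to the filer address seen in the logs.
print(ip_hash_uplink("10.119.204.21", "10.119.204.10", 2))  # uplink 1
# Same host, hypothetical second filer address: a different uplink.
print(ip_hash_uplink("10.119.204.21", "10.119.204.11", 2))  # uplink 0
```

Under this model a given source/destination pair always lands on the same uplink, so traffic between one vmkernel address and one filer address would only ever use one link of the trunk. Spreading NFS traffic over several links would need more than one address pair.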

Re: VMWARE ESX 3.5 & NETAPP NFS

Gordon,

How much service console memory do you have allocated to your host servers? If you haven't already, you should definitely increase your service console memory to the 800 MB maximum.

Andy

Re: VMWARE ESX 3.5 & NETAPP NFS

They all have 800MB.

Re: VMWARE ESX 3.5 & NETAPP NFS

It is set to "Route based on the originating virtual port ID". We have since had a VMware consultant come in, who recommended separating the NFS traffic onto a separate vSwitch.

We will make this change and see how it goes.

Re: VMWARE ESX 3.5 & NETAPP NFS

Gordon,

With that many mounts, we have sometimes had to increase "tcpheapsizemax" and "tcpheapsize" to 60 MB. We would usually see other tcpheap error messages in the logs, though. Are you seeing any other errors?
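One quick way to check for other errors is to tally the vmkernel warnings by subsystem and look at anything that is not NFS. The sketch below is an assumption-laden example: `tally_warnings` is a made-up name, and the regex only covers the `)WARNING: <subsystem>: <number>:` layout seen in the log lines posted earlier in this thread. It does not look for any particular heap message text, since none has been posted here.

```python
import re
from collections import Counter

# Layout assumed from the posted log lines: ")WARNING: NFS: 1736: ..."
WARN_RE = re.compile(r"\)WARNING: (?P<subsys>\w+): \d+:")

def tally_warnings(lines):
    """Count vmkernel WARNING lines per subsystem (NFS, VSCSI, ...)."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            counts[m.group("subsys")] += 1
    return counts

def non_nfs(counts):
    """Return only the subsystems other than NFS, for a closer look."""
    return {name: n for name, n in counts.items() if name != "NFS"}
```

Run against the second log excerpt above, `non_nfs` would surface the VSCSI and VSCSIFs warnings that appear between the NFS messages.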

Andy