ONTAP Hardware

VMware 6.5 and FAS2240-4 latency issue

parisvi

Can I just start off by saying I only have a basic understanding of NetApp, so please bear with me. Unfortunately our NetApp admin has been signed off work sick for a few weeks.

 

 

We have 8 ESXi hosts: 2 for management (mgt) and 6 for development (dev). All are ProLiant BL460c Gen8 with 2 x Intel Xeon CPU E5-2670 0 @ 2.60GHz and 262 GB RAM.

 

Both dev and mgt connect to the same NetApp. We upgraded our hosts from ESXi 5.5 to 6.5, and since then we've been having latency issues with the dev cluster, which eventually bring everything to a halt.

 

We have 2 NetApp FAS2240-4 controllers:

Type: HA Pair

Version: 8.2.3P6 7-Mode

 

 

netapp1

Under Aggregates I see 1 aggregate:

hybrid0

RAID Type:
mixed_raid_type, hybrid

 

 

netapp2 has 2 aggregates:

aggr0

sata0

RAID Type:
RAID-DP

 

The 2 ESXi mgt hosts only use datastores from volumes on netapp2.

 

The 6 ESXi dev hosts connect to both netapp1 and netapp2.

 

We are using 2 x HP FlexFabric 10Gb 2-port 554FLB adapters (Emulex Corporation) for iSCSI on each host.

 

Sometimes we see latency of up to 0.5 seconds for some VMs. Every now and then a host fails and its VMs start to be moved around, which causes heavier disk usage, and one by one the other hosts fail.

 

I see this in the vmkernel logs:

 

 

suing command 0x439d40b94cc0
/var/run/log/vmkernel.log:2018-09-04T16:39:52.688Z cpu27:66195)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60a9800042394542503f49426b6f724a" - issuing command 0x439d5238a4c0
/var/run/log/vmkernel.log:2018-09-04T16:39:52.689Z cpu22:66371)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.60a9800042394542503f49426b6f7242"
/var/run/log/vmkernel.log:2018-09-04T16:39:53.335Z cpu12:66370)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x8a (0x43954c8049c0, 76476) to dev "naa.60a9800042394542503f49426b6f7246" on path "vmhba64:C0:T7:L5" Failed: H:0x1 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:FAILOVER
/var/run/log/vmkernel.log:2018-09-04T16:39:53.335Z cpu12:66370)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60a9800042394542503f49426b6f7246": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
/var/run/log/vmkernel.log:2018-09-04T16:39:53.693Z cpu22:66195)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60a9800042394542503f49426b6f7246" - issuing command 0x43954c8049c0
/var/run/log/vmkernel.log:2018-09-04T16:39:53.694Z cpu12:66370)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.60a9800042394542503f49426b6f7246"
/var/run/log/vmkernel.log:2018-09-04T16:39:53.819Z cpu22:66371)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x439d40bab540, 65575) to dev "naa.60a9800042394542503f49426b6f7236" on path "vmhba64:C0:T7:L0" Failed: H:0x1 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:FAILOVER
/var/run/log/vmkernel.log:2018-09-04T16:39:53.819Z cpu22:66371)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.60a9800042394542503f49426b6f7236": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
/var/run/log/vmkernel.log:2018-09-04T16:39:54.687Z cpu3:112734)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.60a9800042394542503f49426b6f7236" - issuing command 0x439d40bab540
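To see how long I/O to a device stays blocked during one of these failovers, the timestamps in the log can be paired up. The following is only a minimal sketch, assuming the line format shown in the excerpt above; the function name `failover_durations` is hypothetical and not part of any VMware tool. It measures the time from an `Act:FAILOVER` line to the next `Retry world recovered device` line for the same device.

```python
import re
from datetime import datetime

# Timestamp and device ID patterns, based only on the lines quoted above.
TS_RE = re.compile(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z')
DEV_RE = re.compile(r'"(naa\.[0-9a-f]+)"')

def failover_durations(lines):
    """Return (device, seconds) pairs from Act:FAILOVER to recovery."""
    started = {}      # device -> time of first unrecovered failover
    durations = []
    for line in lines:
        ts = TS_RE.search(line)
        dev = DEV_RE.search(line)
        if not ts or not dev:
            continue  # skip lines without a timestamp or device ID
        when = datetime.strptime(ts.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        name = dev.group(1)
        if "Act:FAILOVER" in line:
            started.setdefault(name, when)
        elif "Retry world recovered device" in line and name in started:
            delta = when - started.pop(name)
            durations.append((name, delta.total_seconds()))
    return durations

if __name__ == "__main__":
    # Assumes the log path shown in the excerpt.
    with open("/var/run/log/vmkernel.log") as f:
        for device, seconds in failover_durations(f):
            print(f"{device}: {seconds:.3f}s")
```

Only failovers that have a matching recovery line in the same file are reported; a device that never recovers within the log produces no entry.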

 

 

or this:

 

/var/run/log/vmkernel.log:2018-09-04T17:06:56.949Z cpu16:72849)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a9800042394542503f49426b6f724a" state in doubt; requested fast path state update...
/var/run/log/vmkernel.log:2018-09-04T17:06:56.949Z cpu16:72849)ScsiDeviceIO: 2968: Cmd(0x439d40bf0ac0) 0xfe, CmdSN 0x88e from world 65575 to dev "naa.60a9800042394542503f49426b6f724a" failed H:0x5 D:0x40 P:0x0 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:07:03.078Z cpu18:66394)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x439d4528e0c0, 67172) to dev "naa.60a98000424155764b5d493836386659" on path "vmhba1:C0:T2:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
/var/run/log/vmkernel.log:2018-09-04T17:07:03.078Z cpu18:66394)ScsiDeviceIO: 2933: Cmd(0x439d4521a8c0) 0xfe, CmdSN 0xb54 from world 67172 to dev "naa.60a98000424155764b5d493836386659" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:07:20.614Z cpu27:72852)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x439d4cff09c0, 65575) to dev "naa.60a9800042394542503f49426b6f7244" on path "vmhba1:C0:T5:L4" Failed: H:0x5 D:0x40 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
/var/run/log/vmkernel.log:2018-09-04T17:07:20.614Z cpu27:72852)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a9800042394542503f49426b6f7244" state in doubt; requested fast path state update...
/var/run/log/vmkernel.log:2018-09-04T17:07:20.614Z cpu27:72852)ScsiDeviceIO: 2968: Cmd(0x439d40bc7fc0) 0xfe, CmdSN 0x65e from world 65575 to dev "naa.60a9800042394542503f49426b6f7244" failed H:0x5 D:0x40 P:0x0 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:07:20.620Z cpu15:66393)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x43954115bac0, 72866) to dev "naa.60a9800042394542503f49426b6f7246" on path "vmhba1:C0:T5:L5" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
/var/run/log/vmkernel.log:2018-09-04T17:07:20.620Z cpu15:66393)ScsiDeviceIO: 2933: Cmd(0x43954cff4900) 0xfe, CmdSN 0xa9e from world 72866 to dev "naa.60a9800042394542503f49426b6f7246" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:07:30.676Z cpu27:65935)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x439d45294640, 65575) to dev "naa.60a9800042394542503f49426b6f7248" on path "vmhba1:C0:T6:L6" Failed: H:0x5 D:0x40 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
/var/run/log/vmkernel.log:2018-09-04T17:07:30.676Z cpu27:65935)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a9800042394542503f49426b6f7248" state in doubt; requested fast path state update...
/var/run/log/vmkernel.log:2018-09-04T17:07:30.676Z cpu27:65935)ScsiDeviceIO: 2968: Cmd(0x439d40bb7240) 0xfe, CmdSN 0x9be from world 65575 to dev "naa.60a9800042394542503f49426b6f7248" failed H:0x5 D:0x40 P:0x0 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:08:05.677Z cpu8:66393)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x439541001640, 73158) to dev "naa.60a9800042394542503f49426b6f724a" on path "vmhba1:C0:T7:L7" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
/var/run/log/vmkernel.log:2018-09-04T17:08:05.677Z cpu8:66393)ScsiDeviceIO: 2933: Cmd(0x43954101a540) 0xfe, CmdSN 0x921 from world 73158 to dev "naa.60a9800042394542503f49426b6f724a" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
/var/run/log/vmkernel.log:2018-09-04T17:14:08.592Z cpu0:66393)NMP: nmp_ThrottleLogForDevice:3647: Cmd 0x89 (0x43954ce46800, 73833) to dev "naa.60a9800042394542503f49426b6f7248" on path "vmhba1:C0:T5:L6" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
/var/run/log/vmkernel.log:2018-09-04T17:14:08.592Z cpu0:66393)ScsiDeviceIO: 2933: Cmd(0x439d40a954c0) 0xfe, CmdSN 0xc1a from world 73833 to dev "naa.60a9800042394542503f49426b6f7248" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
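To get an overview of which adapters and status codes dominate, the `nmp_ThrottleLogForDevice` lines can be tallied. This is a minimal sketch, assuming the line format in the excerpts above; the helper names are hypothetical. The status labels in the two dictionaries are my reading of VMware's published SCSI host and device status codes and cover only the values that appear in this thread, so please verify them before relying on them.

```python
import re
from collections import Counter

# Pattern for the NMP throttle-log lines quoted in this thread.
NMP_RE = re.compile(
    r'Cmd (?P<cmd>0x[0-9a-f]+) \([^)]*\) to dev "(?P<dev>[^"]+)" '
    r'on path "(?P<path>[^"]+)" Failed: '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+)'
)

# Assumed labels; only codes seen in the excerpts are listed.
HOST_STATUS = {"0x0": "OK", "0x1": "NO_CONNECT", "0x5": "ABORT"}
DEVICE_STATUS = {"0x0": "GOOD", "0x2": "CHECK CONDITION", "0x40": "TASK ABORTED"}

def parse_nmp_failure(line):
    """Return a dict of fields for an NMP failure line, or None."""
    m = NMP_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["adapter"] = rec["path"].split(":")[0]  # e.g. "vmhba1"
    return rec

def summarize(lines):
    """Count failures per (adapter, host status, device status)."""
    counts = Counter()
    for line in lines:
        rec = parse_nmp_failure(line)
        if rec is None:
            continue
        host = HOST_STATUS.get(rec["h"], rec["h"])      # fall back to raw code
        device = DEVICE_STATUS.get(rec["d"], rec["d"])
        counts[(rec["adapter"], host, device)] += 1
    return counts

if __name__ == "__main__":
    # Assumes the log path shown in the excerpt.
    with open("/var/run/log/vmkernel.log") as f:
        for key, n in summarize(f).most_common():
            print(n, *key)
```

The two excerpts involve two different adapters (vmhba64 and vmhba1), so grouping by adapter shows whether the failures are concentrated on one of them.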

 

 

It only happens to the dev cluster, which uses the SSD hybrid volumes.

 

We reverted to 5.5 once before and everything was OK. We then updated all device firmware, upgraded to 6.5 again, and installed the drivers according to the VMware compatibility lists.

4 REPLIES

AlexDawson

Hi there,

 

Can you please verify whether you have followed the June 2018 HPE FlexFabric Cookbook - http://vibsdepot.hpe.com/hpq/recipes/HPE-VMware-Recipe.pdf - and also, what does your FlexFabric connect to?

parisvi

Yes, the drivers are the same as in that cookbook.

 

It's connecting to a Cisco C3750X-48T-S.

 

 

Finnzi

Did you also verify the firmware versions? These are just as important as the drivers.

 

 

Bgrds,

Finnur

th63

Hi,

Did you find a fix for this problem?

We are seeing the same error messages and we are running 6.5.

Thx
