ESX 4.0 hosts causing SCSI Hung Commands on Adapter

I have two new ESX 4.0u1 hosts that I am using to storage-migrate VMs from an old HP EVA environment to a new NetApp environment (3170s running 7.3.2). During migration to the new NetApp datastores and to the new hosts (thus transferring I/O from the 3.5 world to the 4.0 world), we get a hung command on one of the target adapters that requires a "config down" and then a "config up" to clear. This hung command blocks all I/O to that target and makes one of the paths dead on both ESX 4.0 hosts. Other hosts accessing that path also lose connectivity to their LUNs via that adapter port. What I am trying to figure out is why this is happening, why no ASUPs are being generated, and why there is no way to clear the hung command other than an adapter reset.

I have ALUA enabled on the igroup and am using the VMW_PSP_RR path selection policy.

I am zoned from my hosts to all front-end ports on the NetApp across two fabrics, so I have 8 paths to each LUN (4 Active (I/O) and 4 Active).

I have a ticket open with NetApp but am wondering if this is really an ESX issue.

Errors from the ESX logs:

Feb  8 10:28:55 caustdsh0610 vmkernel: 3:00:26:01.671 cpu2:4844)ScsiCore: 95: Starting taskmgmt handler world 4844/2
Feb  8 10:29:03 caustdsh0610 vmkernel: 3:00:26:09.677 cpu10:4845)ScsiCore: 95: Starting taskmgmt handler world 4845/3
Feb  8 10:29:05 caustdsh0610 vmkernel: 3:00:26:11.679 cpu7:4846)ScsiCore: 95: Starting taskmgmt handler world 4846/4
Feb  8 10:29:07 caustdsh0610 vmkernel: 3:00:26:13.681 cpu0:4847)ScsiCore: 95: Starting taskmgmt handler world 4847/5
Feb  8 10:29:09 caustdsh0610 vmkernel: 3:00:26:15.683 cpu6:4848)ScsiCore: 95: Starting taskmgmt handler world 4848/6
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)BFA[0000:49:00.0][error] BFA_AEN_RPORT_DISCONNECT: Remote port (WWN = 50:0a:09:82:89:6b:32:0b) connectivity lost for logical port (WWN = 10:00:00:05:1e:9c:c9:fd).
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)BFA[0000:49:00.0][error] BFA_AEN_ITNIM_DISCONNECT: Target (WWN = 50:0a:09:82:89:6b:32:0b) connectivity lost for initiator (WWN = 10:00:00:05:1e:9c:c9:fd).
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0c3e80) to NMP device "naa.60a98000572d4c74564a552f4437574e" failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c74564a552f4437574e" state in doubt; requested fast path state update...
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f4437574e" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a128980) to NMP device "naa.60a98000572d4c74564a552f44386444" failed on physical path "vmhba2:C0:T3:L11" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c74564a552f44386444" state in doubt; requested fast path state update...
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f44386444" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0dd2c0) to NMP device "naa.60a98000572d4c74564a552f4437574e" failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f4437574e" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0a10c0) to NMP device "naa.60a98000572d4c74564a552f4437574e" failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f4437574e" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a059cc0) to NMP device "naa.60a98000572d4c74564a552f4437574e" failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f4437574e" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a02c100) to NMP device "naa.60a98000572d4c74564a552f44386444" failed on physical path "vmhba2:C0:T3:L11" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f44386444" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a200f40) to NMP device "naa.60a98000572d4c74564a552f4437574e" failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  8 10:29:11 caustdsh0610 vmkernel: 3:00:26:17.661 cpu12:4459)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f4437574e" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
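
For anyone triaging something similar: it helps to confirm that every failure lands on the same physical path, as it does above (everything is on vmhba2:C0:T3 with H:0x7). A rough Python sketch that tallies the NMP errors by path and host-status code; run it against a saved copy of /var/log/vmkernel (the sample lines below are abbreviated stand-ins, not a NetApp or VMware tool):

```python
import re
from collections import Counter

# Matches the NMP completion errors seen above, e.g.:
#   ... failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 ...
PAT = re.compile(r'failed on physical path "([^"]+)" H:(0x[0-9a-fA-F]+)')

def tally(lines):
    """Count (path, host-status) pairs across vmkernel log lines."""
    counts = Counter()
    for line in lines:
        m = PAT.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Abbreviated sample lines; feed real vmkernel lines in practice.
sample = [
    '... failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '... failed on physical path "vmhba2:C0:T3:L11" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '... failed on physical path "vmhba2:C0:T3:L10" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]
for (path, h), n in tally(sample).most_common():
    print(path, h, n)
```

If every LUN behind one target (T3 here) shows the same host status, the problem is almost certainly the target port or fabric path, not the individual LUNs.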

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

I am seeing this with my Emulex HBAs too; in fact, it seems to cause my entire ESX environment to die while it is timing out...

<LR d="09Feb2010 16:27:15" n="napp2" pn="napp1" t="1265754435" id="1263246700/6116" p="7" s="C=1U" o="isp2400fct_main_2" vf="">
  <fcp_io_status_1
    event="Adapter:0a, found hung cmd:0x00000000189d1980 (state=259, flags=0x1000085/0x0, ctio_sent=1/1, RecvExAddr=0x12bae0, OX_ID=0x27d, RX_ID=0xffff, SID=0x45400)"/>
</LR>

Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.745 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a029e00) to NMP device "naa.60a98000572d4c73614a552f43546149" failed on physical path "vmhba2:C0:T0:L5" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.745 cpu15:6028)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c73614a552f43546149" state in doubt; requested fast path state update...
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.745 cpu15:6028)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43546149" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0e9840) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c73614a552f43554644" state in doubt; requested fast path state update...
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43554644" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000a0abe40) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)ScsiDeviceIO: 747: Command 0x28 to device "naa.60a98000572d4c73614a552f43554644" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a15d5c0) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43554644" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0e1780) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43554644" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a23bcc0) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:21 caustdsh0610 vmkernel: 0:22:13:51.746 cpu15:6028)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43554644" failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a15a6c0) to NMP device "naa.60a98000572d4c74564a552f44394641" failed on physical path "vmhba2:C0:T3:L12" H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c74564a552f44394641" state in doubt; requested fast path state update...
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f44394641" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a1f4600) to NMP device "naa.60a98000572d4c73614a552f43546149" failed on physical path "vmhba2:C0:T0:L5" H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c73614a552f43546149" state in doubt; requested fast path state update...
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43546149" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a1ff280) to NMP device "naa.60a98000572d4c74564a552f44386444" failed on physical path "vmhba2:C0:T1:L11" H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c74564a552f44386444" state in doubt; requested fast path state update...
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f44386444" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a1f5800) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c73614a552f43554644" state in doubt; requested fast path state update...
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c73614a552f43554644" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0bff40) to NMP device "naa.60a98000572d4c74564a552f442d3848" failed on physical path "vmhba2:C0:T1:L13" H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1.
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c74564a552f442d3848" state in doubt; requested fast path state update...
Feb  9 16:24:27 caustdsh0610 vmkernel: 0:22:13:58.104 cpu15:4228)ScsiDeviceIO: 747: Command 0x2a to device "naa.60a98000572d4c74564a552f442d3848" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu4:4232)<3> rport-3:0-2: blocked FC remote port time out: saving binding
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu4:4232)<3> rport-3:0-4: blocked FC remote port time out: saving binding
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu4:4232)<3> rport-3:0-5: blocked FC remote port time out: saving binding
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu12:4251)<3>lpfc820 0000:49:00.0: 0:(0):0203 Devloss timeout on WWPN 50:0a:09:84:99:6b:32:0b NPort x049600 Data: x0 x7 x0
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu12:4251)<3>lpfc820 0000:49:00.0: 0:(0):0203 Devloss timeout on WWPN 50:0a:09:82:99:6b:32:0b NPort x04fc00 Data: x0 x7 x0
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.746 cpu12:4251)<3>lpfc820 0000:49:00.0: 0:(0):0203 Devloss timeout on WWPN 50:0a:09:82:89:6b:32:0b NPort x04ec00 Data: x0 x7 x0
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0e9840) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60a98000572d4c73614a552f43554644": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a029e00) to NMP device "naa.60a98000572d4c73614a552f43546149" failed on physical path "vmhba2:C0:T0:L5" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60a98000572d4c73614a552f43546149": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000a0abe40) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a15d5c0) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a0e1780) to NMP device "naa.60a98000572d4c73614a552f43554644" failed on physical path "vmhba2:C0:T2:L6" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb  9 16:24:31 caustdsh0610 vmkernel: 0:22:14:01.747 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000a23bcc0) to NMP device "naa.60a98000572
Feb  9 16:26:18 caustdsh0610 vmkernel: 0:22:15:49.151 cpu14:4228)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a98000572d4c73614a552f43447869" state in doubt; requested fast path state update...
Feb  9 16:26:21 caustdsh0610 vmkernel: 0:22:15:52.351 cpu12:4231)<3> rport-4:0-4: blocked FC remote port time out: saving binding
Feb  9 16:26:21 caustdsh0610 vmkernel: 0:22:15:52.351 cpu6:4236)<3> rport-4:0-3: blocked FC remote port time out: saving binding
Feb  9 16:26:21 caustdsh0610 vmkernel: 0:22:15:52.351 cpu1:4235)<3> rport-4:0-5: blocked FC remote port time out: saving binding
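
To see exactly which NetApp target ports dropped during the event, it can help to pull the WWPNs out of the lpfc "Devloss timeout" messages and compare them against the controller's FC port WWPNs. A small, hypothetical Python sketch (not a vendor tool; sample lines below are abbreviated):

```python
import re

# lpfc devloss lines look like:
#   lpfc820 0000:49:00.0: 0:(0):0203 Devloss timeout on WWPN 50:0a:09:84:99:6b:32:0b NPort x049600 ...
DEVLOSS = re.compile(r'Devloss timeout on WWPN ([0-9a-fA-F:]+)')

def lost_ports(lines):
    """Return the unique target WWPNs that hit a devloss timeout, in first-seen order."""
    seen = []
    for line in lines:
        m = DEVLOSS.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

sample = [
    '... Devloss timeout on WWPN 50:0a:09:84:99:6b:32:0b NPort x049600 Data: x0 x7 x0',
    '... Devloss timeout on WWPN 50:0a:09:82:99:6b:32:0b NPort x04fc00 Data: x0 x7 x0',
    '... Devloss timeout on WWPN 50:0a:09:84:99:6b:32:0b NPort x049600 Data: x0 x7 x0',
]
print(lost_ports(sample))
```

If all the lost WWPNs belong to the same filer head, that points at one controller's target adapter (matching the hung-command EMS event above) rather than a host-side problem.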

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

Try using VMW_PSP_MRU for path selection.

In ESX 4.x, if you list the SATP rules, you'll see:

Name           Vendor   Model   Driver   Transport   Options   Claim Options   Description
VMW_SATP_ALUA  NETAPP                                          tpgs_on         NetApp with ALUA

And if you then check the default PSP for the VMW_SATP_ALUA SATP, it is:


Name                             Default PSP            Description

VMW_SATP_ALUA        VMW_PSP_MRU    Supports non-specific arrays that use the ALUA protocol

Thanks,

-A William

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

We encounter the same problems in our environment.

Two FAS3040s (active/active), and our ESX hosts show these messages throughout the day. Unlike you, we use QLogic HBAs.

Are those messages critical, or can we ignore these?

Whether we use VMW_PSP_MRU or VMW_PSP_RR makes no difference. Should we open a ticket for this with VMware, or is this still a NetApp problem?

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

There's a good chance you'll want to update to 7.3.2P6, especially if what you're running into is related to this bug:

http://now.netapp.com/NOW/cgi-bin/bol?Type=Plain&Display=387313

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

I don't see how the fix applies. It seems specific to the 31xx, and I have a similar issue with a 6040.

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

James,

Did you ever get a fix for this issue? We are doing almost the same migration from HP EVA to V-Series 3240s, but our ONTAP version is 8.0.1P3.

At the moment, write I/O to the NetApp doesn't happen, and we are seeing the same sense code messages: "failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0".

Many thanks

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

Hello,

Did anyone get a fix for this issue?

Regards,

Pedro Rocha.

Re: ESX 4.0 hosts causing SCSI Hung Commands on Adapter

Hey All,

I'm not sure I can "fix" all of these issues through the support forums, but I'll provide what information I can to help. Feel free to direct-message me with your NetApp case numbers, and we can go through the host and controller data to find the "real" RCA for all of this.

From the vmkernel logs in the first part of the post, you're receiving the following error: failed on physical path "vmhba2:C0:T3:L11" H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Using the VMware NMP host-side debug guide (because the error is on H: - host, not D: - device), http://kb.vmware.com/kb/1029039 , we can decode this to mean:

This status is returned when a device has been reset due to a Storage Initiator Error. This typically occurs due to an outdated Emulex HBA firmware or possibly (though rarely) as the result of a bad HBA. 

For more information on the bad HBA scenario, see One host shows a Storage Initiator Error while all other hosts show SCSI Reservation Conflicts (1021187).

In addition to the bad-HBA possibility, when this is occurring on all paths on all hosts, I'd recommend verifying the firmware on the interconnects and switches connecting the hosts or blades to the fabric. We have also seen this when there is a high fan-in ratio from the blades through the interconnect (creating severe congestion in the interconnect), and when there is a bad connection or faulty SFP within the fabric or on an ISL.

From the second chunk of vmkernel logs, you're receiving: failed on physical path "vmhba2:C0:T2:L6" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

You can use the same VMware KB above to diagnose this, and find it points to "No Connect", which we typically see when you still have a physical connection to the fabric but the controller isn't visible to the host because of dropped frames or congestion issues. If this is happening on more than one host, there is likely a fabric communication problem (often a firmware compatibility issue).
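
For quick triage, the H: values called out in this thread correspond to the SCSI midlayer's standard host-status (DID_*) codes. A small lookup sketch; the short descriptions are my own paraphrase, so treat the VMware KB above as authoritative:

```python
# Host-status (H:) byte values as defined by the Linux/VMkernel SCSI midlayer.
HOST_STATUS = {
    0x0: "OK - no host-side error",
    0x1: "NO_CONNECT - no connection to the target (dropped frames, devloss)",
    0x2: "BUS_BUSY - transport busy; command should be retried",
    0x3: "TIME_OUT - command timed out",
    0x5: "ABORT - command aborted (e.g. by task management)",
    0x7: "ERROR - internal/initiator error (the H:0x7 case in this thread)",
    0x8: "RESET - bus or device was reset",
}

def decode(h):
    """Translate an H:0xN value from a vmkernel NMP error line."""
    return HOST_STATUS.get(h, "unknown host status 0x%x" % h)

print(decode(0x7))
print(decode(0x1))
```

Seeing the codes shift from 0x2 (busy) to 0x5 (abort) to 0x1 (no connect) in the second log chunk above matches the sequence of a target port going away: retries, then aborts, then devloss.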