Solved: "Task Set Aborted by the initiator at port Id"

MarvinN · ‎2016-10-14

Hello experts, I have a FAS3240 MetroCluster (7-Mode) extended between two sites. Recently, two VIOS (IBM PowerVM) servers connected to FAS3240A (the filer at the primary site) have been logging "PATH HAS FAILED" errors followed by "PATH HAS RECOVERED". While the error is present, our core application's performance drops for a few seconds.

rdfile /etc/messages on FAS3240A does not show any errors, but rdfile /etc/messages on FAS3240B (secondary site) shows the following errors:

Thu Oct 13 13:05:57 CST [FAS3240B:scsitarget.ispfct.abortTaskSet:debug]: FCP Target 2a: LUN 18, Task Set Aborted by the Initiator at Port Id: 0x51800 (WWPN 10000090fa07e0e8)

Thu Oct 13 13:06:04 CST [FAS3240B:scsitarget.ispfct.abortTaskSet:debug]: FCP Target 2a: LUN 19, Task Set Aborted by the Initiator at Port Id: 0x50800 (WWPN 10000090fa0817d0)
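For anyone who wants to tally these entries instead of reading them by eye, here is a minimal Python sketch that pulls the filer, target port, LUN, port ID, and WWPN out of abortTaskSet lines and counts events per initiator. The regular expression is inferred only from the sample lines quoted in this thread, and the names ABORT_RE, parse_aborts, and count_by_wwpn are hypothetical helpers, not part of any NetApp tool.

```python
import re
from collections import Counter

# Field layout inferred from the sample lines in this thread only (assumption).
ABORT_RE = re.compile(
    r"\[(?P<filer>[^:\]]+):\s*scsitarget\.ispfct\.abortTaskSet:(?P<severity>\w+)\]:\s*"
    r"FCP Target (?P<target>\w+): LUN (?P<lun>\d+), "
    r"Task Set Aborted by the Initiator at Port Id: (?P<port_id>0x[0-9a-fA-F]+) "
    r"\(WWPN (?P<wwpn>[0-9a-fA-F]{16})\)"
)


def parse_aborts(lines):
    """Return one dict per abortTaskSet line; other lines are skipped."""
    events = []
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def count_by_wwpn(events):
    """Count abort events per initiator WWPN (lower-cased)."""
    return Counter(event["wwpn"].lower() for event in events)


# The two lines posted above, copied verbatim from /etc/messages.
SAMPLE = [
    "Thu Oct 13 13:05:57 CST [FAS3240B:scsitarget.ispfct.abortTaskSet:debug]: "
    "FCP Target 2a: LUN 18, Task Set Aborted by the Initiator at Port Id: "
    "0x51800 (WWPN 10000090fa07e0e8)",
    "Thu Oct 13 13:06:04 CST [FAS3240B:scsitarget.ispfct.abortTaskSet:debug]: "
    "FCP Target 2a: LUN 19, Task Set Aborted by the Initiator at Port Id: "
    "0x50800 (WWPN 10000090fa0817d0)",
]

if __name__ == "__main__":
    for wwpn, count in count_by_wwpn(parse_aborts(SAMPLE)).items():
        print(wwpn, count)
```

Feeding it the full output of rdfile /etc/messages (saved to a text file and read line by line) would show whether the aborts cluster on one initiator, one target port, or one LUN.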

I can see through fcp show initiator that both WWPNs correspond to the HBAs of the servers at the primary site (VIOS), as you can see below:

10:00:00:90:fa:08:17:d0 SRV_APP_hba1 == 10000090fa0817d0

WWPN Alias(es): SRV_APP_hba1

10:00:00:90:fa:07:e0:e8 SRV_DB_hba1 == 10000090fa07e0e8

WWPN Alias(es): SRV_DB_hba1
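The log prints WWPNs without colons while fcp show initiator prints them with colons, so the match above was done by hand. A small Python sketch of the same comparison follows; the ALIASES table only contains the two entries shown in this post, and normalize_wwpn, colon_form, and alias_for are hypothetical helper names.

```python
def normalize_wwpn(wwpn: str) -> str:
    """Strip colons and lower-case a WWPN; reject anything that is not 16 hex digits."""
    hex_digits = wwpn.replace(":", "").strip().lower()
    if len(hex_digits) != 16 or any(c not in "0123456789abcdef" for c in hex_digits):
        raise ValueError(f"not a valid WWPN: {wwpn!r}")
    return hex_digits


def colon_form(wwpn: str) -> str:
    """Render a WWPN as colon-separated byte pairs."""
    digits = normalize_wwpn(wwpn)
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


# Aliases as listed by fcp show initiator in this post.
ALIASES = {
    "10:00:00:90:fa:08:17:d0": "SRV_APP_hba1",
    "10:00:00:90:fa:07:e0:e8": "SRV_DB_hba1",
}


def alias_for(wwpn: str, aliases=ALIASES):
    """Look up the alias for a WWPN given in either notation; None if unknown."""
    lookup = {normalize_wwpn(key): value for key, value in aliases.items()}
    return lookup.get(normalize_wwpn(wwpn))


if __name__ == "__main__":
    for logged in ("10000090fa07e0e8", "10000090fa0817d0"):
        print(colon_form(logged), alias_for(logged))
```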

Any idea how to troubleshoot this? Thank you.

Jeff_Yao · ‎2016-10-17

Not so sure about the issue, but you probably need to start with the connections, e.g. check the logs on the switches or on any devices on the links between the 2 sites.

Or were there any recent changes that could have affected it?

MarvinN · ‎2016-10-17

Thank you Yaoguang.

We already checked the connections between sites, and the Brocades seem to be without errors. It seems to be something between the servers and the filers. We already have a ticket open with NetApp technical support, but it is taking more time than expected.

xiawiz · ‎2016-11-22

Any update on this?

Looks like we just had a very similar event. Many entries (not all shown) for paths down on the IBM VIO and server side; the event lasted about 2 minutes.

Paths failed, then paths recovered.

I am going to point to the NetApp disk failure and health trigger as the culprit.

-- From the web: a thread on "aborted task set" / "task set aborted" that discusses SCSI protocol methodologies and whatnot

http://www.pdl.cmu.edu/mailinglists/ips/mail/msg08086.html

Tue Nov 22 09:29:09 EST [filer02: scsitarget.ispfct.abortTaskSet:notice]: FCP Target 0a: LUN 21, Task Set Aborted by the Initiator at Port Id: 0x7eff00 (WWPN 10000090fa0ba72a)
Tue Nov 22 09:29:09 EST [filer02: scsitarget.ispfct.abortTaskSet:notice]: FCP Target 0a: LUN 22, Task Set Aborted by the Initiator at Port Id: 0x7eff00 (WWPN 10000090fa0ba72a)
Tue Nov 22 09:29:09 EST [filer02: scsitarget.ispfct.abortTaskSet:notice]: FCP Target 0a: LUN 30, Task Set Aborted by the Initiator at Port Id: 0x7eff00 (WWPN 10000090fa0ba72a)
Tue Nov 22 09:29:12 EST [filer02: scsitarget.ispfct.abortTaskSet:notice]: FCP Target 0d: LUN 22, Task Set Aborted by the Initiator at Port Id: 0x33ff00 (WWPN 10000090fa0ba612)
Tue Nov 22 09:32:17 EST [filer02: disk.healthTrigger:warning]: Disk 5b.32 received NHT health trigger (0x1 0xb 0x5d 0x10)

MarvinN · ‎2016-11-29

Finally we fixed the problem.

We found that the frontend Brocade switches (where the servers are connected) each had an ISL between the primary site and the remote site. Those ISLs were configured incorrectly for short distance; the ports were not set to long distance like the ones on the backend switches.

So, using the Brocade command line portcfglongdistance 4 LS 1 33, I set the ISL ports to LS mode with a distance of 33 km (to do this we had to ask Brocade support for Extended Fabrics licenses and install them on the frontend switches).

Since we made that change on our 4 frontend switches, the errors have stopped logging and performance has been fine.

I hope this solution can help others.