Unusual error (scsi.cmd.checkCondition:Unknown device)

NetApp_Journeyman · ‎2020-05-08

Hello

This morning we were being inundated with an unusual error message:

Event:

scsi.cmd.checkCondition: Unknown device essf1bsan:9-5.159: Check Condition: CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0).

Message Name:

scsi.cmd.checkCondition

Sequence Number:

1703416

Description:

A Check Condition is the mechanism whereby a target device reports an informational condition or error status to the requesting host. Either the condition that generated this event is an error that occurred during execution of the command and was not cleared by retrying the request or an informational condition reporting status of the present operation or media state.

Action:

A target status of Check Condition normally indicates an error at the device during execution of the requested command. Such cases are often the result of an intermittent device hardware or firmware problem that is automatically handled by the Data ONTAP drivers through command retries. In cases of repeated events, the specified device should be evaluated for proper operation and possible repair or replacement. A Check Condition with a SenseKey of "no sense" indicates an informational condition that is automatically handled by the Data ONTAP driver associated with the target device. These are normally not reported as error events. However, on occasion an unexpected informational condition may be reported by a target device. These cases should not be interpreted as a failure of the target device.

This issue has persisted for 2 days now. What should we do?

Ontapforrum · ‎2020-05-08

Hi,

For errors like this, interpreting the SCSI sense key and sense codes is usually helpful.

SCSI sense codes follow an industry standard maintained by Technical Committee T10; see
http://www.t10.org/lists/2asc.htm for details.

The 0x5 - 0x25 0x0 fields are interpreted as follows:
sense=5 (ILLEGAL REQUEST), ASC=25 ASCQ=0 (LOGICAL UNIT NOT SUPPORTED)
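
As an illustration of that decoding, here is a minimal Python sketch that pulls the CDB opcode and sense fields out of the event text and looks them up. The lookup tables are tiny hand-copied excerpts of the T10 lists (only the codes seen in this thread, plus opcode 0xa0, which T10 assigns to REPORT LUNS). The function name and the regular expression are my own assumptions and are not part of any NetApp tool.

```python
import re

# Hand-copied excerpts of the T10 tables; only codes relevant to this thread.
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED"}
OPCODES = {0xA0: "REPORT LUNS"}

# Matches "CDB 0xa0: ... (0x5 - 0x25 0x0 0x0)" in the EMS event text.
# Assumed field order inside the parentheses: sense key, ASC, ASCQ, one more byte.
SENSE_RE = re.compile(
    r"CDB (0x[0-9a-fA-F]+):.*?"
    r"\((0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)\)"
)


def decode_check_condition(event: str) -> dict:
    """Return the decoded opcode, sense key and ASC/ASCQ text for one event."""
    match = SENSE_RE.search(event)
    if match is None:
        raise ValueError("no CDB/sense data found in event text")
    opcode, key, asc, ascq, _last = (int(g, 16) for g in match.groups())
    return {
        "opcode": OPCODES.get(opcode, f"unknown opcode {opcode:#04x}"),
        "sense_key": SENSE_KEYS.get(key, f"unknown sense key {key:#x}"),
        "additional": ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ {asc:#04x}/{ascq:#04x}"),
    }


EVENT = (
    "scsi.cmd.checkCondition: Unknown device essf1bsan:9-5.159: Check Condition: "
    "CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0)."
)

if __name__ == "__main__":
    for field, text in decode_check_condition(EVENT).items():
        print(f"{field}: {text}")
```

For the event in the original post this reports REPORT LUNS, ILLEGAL REQUEST and LOGICAL UNIT NOT SUPPORTED, which would suggest the target is rejecting a LUN discovery request rather than a read or write.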

Again, according to T10, these messages indicate that the server successfully submitted the I/O to the target, but the target REJECTED the I/O request with an error.

Areas to look at: This may indicate that the LUNs have been unpresented from the storage system without the server being aware of the change.

I think 'essf1bsan' is the device for you to investigate: check that the SCSI path from host to target is in place and that no changes have been made.

Did anyone change the igroup or try to unmap the LUN while the host was serving I/O? Just follow the basic troubleshooting steps.

Thanks!

NetApp_Journeyman · ‎2020-05-11

Thank you for such a detailed answer

>Areas to look at: This may indicate that the LUNS have been unpresented from the storage system, but the server was not aware of this change.

Unfortunately, the only thing that comes to mind regarding the issue was when we recently unassigned disks and had them reclaimed by another team. Could that be causing the issue? essf1bsan appears to be a disk.

Unfortunately, the issue appears to be ongoing and hasn't abated in the slightest. We're still getting hammered by the alert. I've created a case with NetApp but haven't gotten a response. We're still not sure how to proceed from here

Any help would be appreciated

NetApp_Journeyman · ‎2020-05-11

Having checked again, the "unknown device" being represented in the alert is different every time. Something to consider I suppose

Ontapforrum · ‎2020-05-11

OK. If errors are still being reported, that means one or more SAN hosts are sending I/O to the NetApp but aren't communicating properly.

I don't remember whether you mentioned the ONTAP version or model; I am guessing 'cDOT'.

Storage will complain if I/Os are being sent down from hosts but it somehow cannot make sense of the SCSI request.

It could be the igroup (which maps the LUN to the host): does it still have the WWPN/IQN mapped to the old LUNs?

::> lun show -mapped mapped -state offline

and

::> lun show -mapped unmapped

Can you share this ?

::> event log show -message-name scsi.cmd.checkCondition -instance

Also, in the meantime, raise a ticket with NetApp support.

NetApp_Journeyman · ‎2020-05-11

We're ONTAP version 9.3P3, cluster yes

::> lun show -mapped mapped -state offline

and

::> lun show -mapped unmapped

Running both of these commands returns "There are no entries matching your query."

>Can you share this ?

::> event log show -message-name scsi.cmd.checkCondition -instance

Gladly:

stcffn::> event log show -node * -message-name scsi.cmd.checkCondition -instance

Node: stcffn-02
Sequence#: 1327706
Time: 5/11/2020 11:18:38
Severity: ERROR
Source: isp2400_intrd
Message Name: scsi.cmd.checkCondition
Event: scsi.cmd.checkCondition: Unknown device essf1bsan:8-8.159: Check Condition: CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0).
Corrective Action: A target status of Check Condition normally indicates an error at the device during execution of the requested command. Such cases are often the result of an intermittent device hardware or firmware problem that is automatically handled by the Data ONTAP drivers through command retries. In cases of repeated events, the specified device should be evaluated for proper operation and possible repair or replacement. A Check Condition with a SenseKey of "no sense" indicates an informational condition that is automatically handled by the Data ONTAP driver associated with the target device. These are normally not reported as error events. However, on occasion an unexpected informational condition may be reported by a target device. These cases should not be interpreted as a failure of the target device.
Description: A Check Condition is the mechanism whereby a target device reports an informational condition or error status to the requesting host. Either the condition that generated this event is an error that occurred during execution of the command and was not cleared by retrying the request or an informational condition reporting status of the present operation or media state.

Node: stcffn-01
Sequence#: 1714817
Time: 5/11/2020 11:18:01
Severity: ERROR
Source: isp2400_intrd
Message Name: scsi.cmd.checkCondition
Event: scsi.cmd.checkCondition: Unknown device essf1bsan:9-5.159: Check Condition: CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0).
Corrective Action: A target status of Check Condition normally indicates an error at the device during execution of the requested command. Such cases are often the result of an intermittent device hardware or firmware problem that is automatically handled by the Data ONTAP drivers through command retries. In cases of repeated events, the specified device should be evaluated for proper operation and possible repair or replacement. A Check Condition with a SenseKey of "no sense" indicates an informational condition that is automatically handled by the Data ONTAP driver associated with the target device. These are normally not reported as error events. However, on occasion an unexpected informational condition may be reported by a target device. These cases should not be interpreted as a failure of the target device.

I've opened up a case with NetApp already.
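
Since the device named in the alert differs between events, a tally per node and device may make the pattern easier to see. Below is a rough Python sketch that counts the "Unknown device" names in pasted `event log show -instance` text. The function name, the regular expression and the sample text are illustrative assumptions; the sample only reuses the node and device names already quoted in this thread.

```python
import re
from collections import Counter

# Device names such as "essf1bsan:9-5.159" contain a colon themselves,
# so anchor on the ": Check Condition" text that follows the name.
DEVICE_RE = re.compile(r"Unknown device (\S+): Check Condition")


def tally_devices(log_text: str) -> Counter:
    """Count (node, device) pairs found in event log text."""
    counts: Counter = Counter()
    node = "unknown"  # used if an Event line appears before any Node line
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Node:"):
            node = line.split(":", 1)[1].strip()
        match = DEVICE_RE.search(line)
        if match:
            counts[(node, match.group(1))] += 1
    return counts


SAMPLE = """\
Node: stcffn-02
Event: scsi.cmd.checkCondition: Unknown device essf1bsan:8-8.159: Check Condition: CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0).
Node: stcffn-01
Event: scsi.cmd.checkCondition: Unknown device essf1bsan:9-5.159: Check Condition: CDB 0xa0: Sense Data SCSI:illegal request - (0x5 - 0x25 0x0 0x0)(0).
"""

if __name__ == "__main__":
    for (node, device), count in sorted(tally_devices(SAMPLE).items()):
        print(f"{node}  {device}  {count}")
```

Names that were split by terminal line wrapping in a paste would need to be rejoined first, otherwise the pattern will not match them.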

paul_stejskal · ‎2020-05-11

I'd open a ticket with Hitachi.

Ontapforrum · ‎2020-05-11

OK. I think that, as a next step, we can at least rule out old disk firmware.

In the meantime, could you check the disk firmware on the nodes where the errors are reported?

::> storage disk show -fields firmware-revision,model

Compare the output with the latest firmware for the listed disk models here:
https://mysupport.netapp.com/site/downloads/firmware/disk-drive-firmware

If newer versions are available, go ahead with the update.

Since you have already raised a ticket, I believe NetApp will guide you on this soon.

Thanks!

paul_stejskal · ‎2020-05-11

I checked ASUP; those essf1bsan:8-8.159 drives are on a Hitachi FlexArray back end. Disk firmware won't matter here (though it should be up to date anyway).

NetApp_Journeyman · ‎2020-05-12

Thank you

Does anyone know the quickest way of getting in touch with Hitachi and resolving this issue? I plan on calling their number, but if there was anyone I could email, I would prefer that

Ontapforrum · ‎2020-05-12

I believe this is what Google shows:

https://support.hitachivantara.com/en/contact-support.html
Email: [email protected]

NetApp_Journeyman · ‎2020-05-15

Just thought I'd let you know that the issue has been resolved. Deleting the zones seems to have done the trick.

Thank you for your support!