A sensor reported a fault on 2d shelf 11 (with error: 20)

Hi,

We are seeing an error on SAS shelf 11 (channel 2d), bay 20: the disk is reported as missing.

Shelf Status:

Loop 2d announced error in header
A sensor reported a fault on 2d shelf 11 (with error: 20)
Environment for channel 2d
	Number of shelves monitored: 1	enabled: yes
	Environmental failure on shelves on this channel? yes
Shelf bays with disk devices installed:
	  23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
	  with error: 20
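The environment output above can also be checked programmatically, which helps when comparing several channels. This is a minimal Python sketch that assumes the exact layout shown here (bay numbers printed on the line after the "Shelf bays with disk devices installed:" header, faulted bays on a "with error:" line); `parse_shelf_status` is a hypothetical helper, not a NetApp tool.

```python
import re

# Environment output for channel 2d, copied from the post (tabs kept as \t).
SHELF_STATUS = (
    "Environment for channel 2d\n"
    "\tNumber of shelves monitored: 1\tenabled: yes\n"
    "\tEnvironmental failure on shelves on this channel? yes\n"
    "Shelf bays with disk devices installed:\n"
    "\t  23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0\n"
    "\t  with error: 20\n"
)


def parse_shelf_status(text: str) -> dict:
    """Extract the failure flag, installed bays and errored bays (assumed layout)."""
    lines = text.splitlines()
    result = {"failure": False, "installed": [], "errored": []}
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("Environmental failure"):
            result["failure"] = stripped.endswith("yes")
        elif stripped.startswith("Shelf bays with disk devices installed") and i + 1 < len(lines):
            # The bay numbers are printed on the line that follows the header.
            result["installed"] = [int(n) for n in re.findall(r"\d+", lines[i + 1])]
        elif stripped.startswith("with error:"):
            result["errored"] = [int(n) for n in re.findall(r"\d+", stripped)]
    return result


status = parse_shelf_status(SHELF_STATUS)
print("environmental failure:", status["failure"])
print("bays installed:", len(status["installed"]))
print("bays with error:", status["errored"])
```

Note that bay 20 shows up in both lists: the shelf still sees a device in the slot while flagging it, which matches the "present and powered up" wording in the log events.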

We see the following log events:

 params: {'debug_string': 'BAD/WRONG destination on OPEN (0x17) -- delaying: dev 2b.11.20, cdb 0x2f:40931580:0480 (0/0/23521291), NDU 0x0', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:25 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Device 2d.11.20 invalidate debounce - 40', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:25 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done -- SATA 0/0, SATA reserved 0/0.', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:25 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 2b.11.20 invalidate debounce - 40', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:25 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done -- SATA 0/0, SATA reserved 0/0.', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'Device 2d.11.20 is present and powered up but is about to be invalidated (20) -- power cycling.', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Starting powercycle on device 2d.11.20', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'PHY POWER CYCLE already in progress (WWN 5:0050cc:103378c:3f, phy 28) -- aborting', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Powercycle on device 2d.11.20 complete: status 0', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 2b.11.20 is present and powered up but is about to be invalidated (20) -- power cycling.', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Starting powercycle on device 2b.11.20', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'PHY POWER CYCLE already in progress (WWN 5:0050cc:103379e:3f, phy 28) -- aborting', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Powercycle on device 2b.11.20 complete: status 0', 'adapterName': '2a'}
[?] Sat Aug 05 02:41:47 IST [napmaidp01: ses_admin: ses.status.driveWarning:debug]: A non-critical event has been detected on drive 20 on DS4243 shelf 2d.11; non-critical.
[?] Sat Aug 05 02:41:57 IST [napmaidp01: ses_admin: ses.status.driveOk:info]: The error on drive 20 on DS4243 shelf 2d.11 has been corrected.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'Device 2d.11.20 invalidated.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'Invalidating device 2d.11.20. ', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_0: scsi.cmd.selectionTimeout:error]: Disk device 2d.11.20: Adapter/target error: HA status 0x7: cdb 0x2f:40931580:0480. Targeted device did not respond to requested I/O. I/O will be retried.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 2b.11.20 invalidated.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'Invalidating device 2b.11.20. ', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: scsi.cmd.selectionTimeout:error]: Disk device 2b.11.20: Adapter/target error: HA status 0x7: cdb 0x2f:40931580:0480. Targeted device did not respond to requested I/O. I/O will be retried.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: disk.ioFailed:error]: params: {'deviceName': '2b.11.20', 'returnCode': '2', 'pathRetryCount': '0', 'adapterStatus': '0xd', 'cdb': '0x5e:01', 'basicTimeout': '10', 'iASCQ': '0x0', 'iSenseKey': '0x0', 'sSenseCode': '', 'ETime': '127', 'iASC': '0x0', 'victimRetryCount': '0', 'sSenseKey': 'SCSI:no sense', 'targetStatus': '0x0', 'retryCount': '0', 'pathsTried': '0', 'timeoutRetryCount': '0'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: ems.engine.event.tooBig:warning]: params: {'lastSize': '1128', 'averageSize': '1122', 'emsId': 'disk.ioFailed'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: disk.reserveFailed:error]: Disk reservation failed on 2b.11.20 CDB 0x5e:01 - SCSI:no sense (0 0 0)
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_send_admin: scsi.cmd.selectionTimeout:error]: Disk device ?:?.?: Adapter/target error: HA status 0x7: cdb 0x2f:40931580:0480. Targeted device did not respond to requested I/O. I/O will be retried.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_send_admin: disk.ioFailed:error]: params: {'deviceName': '2b.11.20', 'returnCode': '2', 'pathRetryCount': '0', 'adapterStatus': '0xd', 'cdb': '0x2f:40931580:0480', 'basicTimeout': '10', 'iASCQ': '0x0', 'iSenseKey': '0x0', 'sSenseCode': '', 'ETime': '41123', 'iASC': '0x0', 'victimRetryCount': '79', 'sSenseKey': 'SCSI:no sense', 'targetStatus': '0x0', 'retryCount': '2', 'pathsTried': '1', 'timeoutRetryCount': '0'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_send_admin: ems.engine.event.tooBig:warning]: params: {'lastSize': '1144', 'averageSize': '1122', 'emsId': 'disk.ioFailed'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: sanown_io: diskown.errorDuringIO:error]: error 25 (no valid path to disk) on disk 2b.11.20 (S/N WD-WCAW31217364) while setting disk reservation
[?] Sat Aug 05 02:42:05 IST [napmaidp01: raidio_thread: raid.spares.media_scrub.suspend:notice]: params: {'disk_rpm': '7200', 'vendor': 'NETAPP  ', 'dbn': '120375552', 'firmware_revision': 'NA04', 'shelf': '11', 'disk_info': 'Disk 2b.11.20 Shelf 11 Bay 20 [NETAPP   X302_WVULC01TSSM NA04] S/N [WD-WCAW31217364]', 'bay': '20', 'serialno': 'WD-WCAW31217364', 'owner': '', 'percentage': '55', 'disk_type': '8', 'model': 'X302_WVULC01TSSM'}
[?] Sat Aug 05 02:42:05 IST [napmaidp01: config_thread: raid.config.spare.disk.missing:info]: Spare Disk 2b.11.20 Shelf 11 Bay 20 [NETAPP   X302_WVULC01TSSM NA04] S/N [WD-WCAW31217364] is missing.
[?] Sat Aug 05 02:42:12 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'phy 28 on expander 5:0050cc:103378c:3f is in state 0 but dongle is present and powered up -- initiating PHY reset.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:12 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'One or more (1) PHYs on expander 5:0050cc:103378c:3f are in a bad state.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:12 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'phy 28 on expander 5:0050cc:103379e:3f is in state 0 but dongle is present and powered up -- initiating PHY reset.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:12 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'One or more (1) PHYs on expander 5:0050cc:103379e:3f are in a bad state.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:15 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done -- SATA 0/0, SATA reserved 0/0.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:15 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done -- SATA 0/0, SATA reserved 0/0.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'phy 28 on expander 5:0050cc:103378c:3f is in state 0 but dongle is present and powered up -- initiating powercycle.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'One or more (1) PHYs on expander 5:0050cc:103378c:3f are in a bad state.', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'PHY POWER CYCLE already in progress (WWN 5:0050cc:103378c:3f, phy 28) -- aborting', 'adapterName': '2c'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'phy 28 on expander 5:0050cc:103379e:3f is in state 0 but dongle is present and powered up -- initiating powercycle.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'One or more (1) PHYs on expander 5:0050cc:103379e:3f are in a bad state.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:20 IST [napmaidp01: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'PHY POWER CYCLE already in progress (WWN 5:0050cc:103379e:3f, phy 28) -- aborting', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:28 IST [napmaidp01: pmcsas_timeout_1: sas.adapter.debug:info]: params: {'debug_string': 'One or more (1) PHYs on expander 5:0050cc:103379e:3f are in a bad state.', 'adapterName': '2a'}
[?] Sat Aug 05 02:42:45 IST [napmaidp01: emslog_main: ems.log.duplicate:info]: params: {'id': '1478361542/513545', 'numDups': '1'}
[?] Sat Aug 05 02:42:52 IST [napmaidp01: asup_main: cmds.sysconf.validDebug:debug]: sysconfig: Validating configuration.
[?] Sat Aug 05 02:43:40 IST [napmaidp01: ses_admin: ses.status.driveWarning:debug]: A non-critical event has been detected on drive 20 on DS4243 shelf 2d.11; non-critical.
[?] Sat Aug 05 02:45:58 IST [napmaidp01: asup_main: api.fileio:warning]: params: {'errorDescription': 'Removing iterator file that is too old', 'errorCode': '22', 'errorDetail': 'Invalid argument', 'targetDescription': '/etc/.zapi/56442973935054481.next'}
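To read the events for bay 20 as one timeline across both paths (2b via adapter 2a, 2d via adapter 2c), the log can be filtered by device. This Python sketch assumes every line follows the `[?] <timestamp> <tz> [<node>: <thread>: <event>:<severity>]: <message>` layout seen above and that `params:` payloads are Python-style dict literals; `bay_timeline`, `summarize` and `KEY_FIELDS` are hypothetical names chosen for illustration.

```python
import ast
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout of one EMS line, based on the lines in the post.
LINE_RE = re.compile(
    r"^\[\?\] (?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ "
    r"\[(?P<node>[^:\]]+): (?P<thread>[^:\]]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)
# Fields worth keeping from a params: {...} payload (hypothetical selection).
KEY_FIELDS = ("debug_string", "deviceName", "cdb", "adapterStatus", "disk_info", "adapterName")


class Event(NamedTuple):
    ts: str
    thread: str
    event: str
    severity: str
    text: str


def summarize(msg: str) -> str:
    """Reduce a params: {...} payload to its key fields; other messages pass through."""
    if not msg.startswith("params: "):
        return msg
    try:
        params = ast.literal_eval(msg[len("params: "):])
    except (ValueError, SyntaxError):
        return msg
    summary = "; ".join(f"{k}={params[k]}" for k in KEY_FIELDS if k in params)
    return summary or msg


def bay_timeline(lines: Iterable[str], shelf: int, bay: int) -> Iterator[Event]:
    """Yield events that mention <channel>.<shelf>.<bay> on any path, or the SES drive slot."""
    device = re.compile(
        rf"\b\d+[a-z]\.{shelf}\.{bay}\b|drive {bay} on \S+ shelf \d+[a-z]\.{shelf}\b"
    )
    for line in lines:
        m = LINE_RE.match(line)
        if m and device.search(m["msg"]):
            yield Event(m["ts"], m["thread"], m["event"], m["sev"], summarize(m["msg"]))


SAMPLE = """\
[?] Sat Aug 05 02:41:45 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'Device 2d.11.20 is present and powered up but is about to be invalidated (20) -- power cycling.', 'adapterName': '2c'}
[?] Sat Aug 05 02:41:57 IST [napmaidp01: ses_admin: ses.status.driveOk:info]: The error on drive 20 on DS4243 shelf 2d.11 has been corrected.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_0: scsi.cmd.selectionTimeout:error]: Disk device 2d.11.20: Adapter/target error: HA status 0x7: cdb 0x2f:40931580:0480. Targeted device did not respond to requested I/O. I/O will be retried.
[?] Sat Aug 05 02:42:05 IST [napmaidp01: pmcsas_timeout_1: disk.reserveFailed:error]: Disk reservation failed on 2b.11.20 CDB 0x5e:01 - SCSI:no sense (0 0 0)
[?] Sat Aug 05 02:42:05 IST [napmaidp01: sanown_io: diskown.errorDuringIO:error]: error 25 (no valid path to disk) on disk 2b.11.20 (S/N WD-WCAW31217364) while setting disk reservation
[?] Sat Aug 05 02:42:05 IST [napmaidp01: config_thread: raid.config.spare.disk.missing:info]: Spare Disk 2b.11.20 Shelf 11 Bay 20 [NETAPP   X302_WVULC01TSSM NA04] S/N [WD-WCAW31217364] is missing.
[?] Sat Aug 05 02:42:12 IST [napmaidp01: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'phy 28 on expander 5:0050cc:103378c:3f is in state 0 but dongle is present and powered up -- initiating PHY reset.', 'adapterName': '2c'}
"""

if __name__ == "__main__":
    for ev in bay_timeline(SAMPLE.splitlines(), shelf=11, bay=20):
        print(f"{ev.ts}  {ev.severity:<7} {ev.event:<32} {ev.text}")
```

The expander/PHY lines (phy 28 on expanders `5:0050cc:103378c:3f` and `5:0050cc:103379e:3f`) do not name the device, so this filter skips them; they are worth reading alongside the timeline, for example by searching for the expander WWN separately.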

The disk model is X302_WVULC01TSSM, and the installed DQP (Disk Qualification Package) has datecode 20160701.
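The failing commands in the log events carry two CDBs: `0x2f:40931580:0480` (the selection timeouts) and `0x5e:01` (the reservation failure). A small decoder can name them. It assumes the first colon-separated field is the SCSI operation code and, for opcode 0x5e, that the second field is the PERSISTENT RESERVE IN service action; it makes no assumption about the remaining fields in this log format and prints them undecoded. The opcode table is only a small subset of the SCSI command set, and `describe_cdb` is a hypothetical helper.

```python
# Small subset of SCSI operation codes (SPC/SBC command sets).
OPCODES = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x12: "INQUIRY",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x2F: "VERIFY(10)",
    0x5E: "PERSISTENT RESERVE IN",
    0x5F: "PERSISTENT RESERVE OUT",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}
# Service actions for PERSISTENT RESERVE IN.
PR_IN_ACTIONS = {
    0x00: "READ KEYS",
    0x01: "READ RESERVATION",
    0x02: "REPORT CAPABILITIES",
    0x03: "READ FULL STATUS",
}


def describe_cdb(cdb: str) -> str:
    """Name the opcode (and PR IN service action); leave other fields undecoded."""
    fields = cdb.split(":")
    opcode = int(fields[0], 16)  # int() accepts the "0x" prefix when base is 16
    name = OPCODES.get(opcode, f"opcode 0x{opcode:02x}")
    rest = fields[1:]
    if opcode == 0x5E and rest:
        # Assumption: second field is the service action for this opcode.
        action = int(rest[0], 16)
        name += " / " + PR_IN_ACTIONS.get(action, f"service action 0x{action:02x}")
        rest = rest[1:]
    if rest:
        name += " (undecoded fields: " + ":".join(rest) + ")"
    return name


for cdb in ("0x2f:40931580:0480", "0x5e:01"):
    print(cdb, "->", describe_cdb(cdb))
```

Read this way, the timed-out I/O was a VERIFY, which would fit the spare-disk media scrub that was suspended at 02:42:05, and the failed reservation was a read of the current reservation. Treat this as an interpretation of the log, not a diagnosis.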

Any input on the above would be a great help.