About FAS and V-Series Storage Systems Discussions
Talk and ask questions about NetApp FAS series unified storage systems and V-Series storage virtualization controllers. Discuss with other members how to optimize these powerful data storage systems.
Dear Community,
since we updated some of our systems (FAS82xx, AFF A300) to ONTAP 9.3P7, we have been seeing the following errors in the messages log for our (UTA2) 10 Gbit LAN ports, which were not there before the update.
10/16/2018 11:04:14 <Filer>-<node> ALERT vifmgr.cluscheck.crcerrors: Port e0g on node <Filer>-<Node> is reporting a high number of observed hardware errors, possibly CRC errors.
ifstat shows Total errors (increasing) and Errors/minute, but no CRC errors:
-- interface e0g (1 hour, 42 minutes, 32 seconds) --
RECEIVE
 Total frames:    56798k | Frames/second:     9233 | Total bytes:       178g
 Bytes/second:    28949k | Total errors:      1337 | Errors/minute:       13
 Total discards:       2 | Discards/minute:      0 | Multi/broadcast:  31503
 Non-primary u/c:      0 | CRC errors:           0 | Runt frames:         18
 Fragment:             0 | Long frames:       1319 | Alignment errors:     0
 No buffer:            2 | Pause:                0 | Jumbo:                0
 Noproto:            105 | Bus overruns:         0 | LRO segments:    50798k
 LRO bytes:         174g | LRO6 segments:        0 | LRO6 bytes:           0
 Bad UDP cksum:        0 | Bad UDP6 cksum:       0 | Bad TCP cksum:        0
 Bad TCP6 cksum:       0 | Mcast v6 solicit:     0
TRANSMIT
 Total frames:    16298k | Frames/second:     2649 | Total bytes:     11749m
 Bytes/second:     1909k | Total errors:         0 | Errors/minute:        0
 Multi/broadcast:    605 | Pause:                0 | Jumbo:            6655k
 Cfg Up to Downs:      0 | TSO non-TCP drop:     0 | Split hdr drop:       0
 Timeout:              0 | TSO segments:       840k | TSO bytes:        9910m
 TSO6 segments:        0 | TSO6 bytes:           0 | HW UDP cksums:        0
 HW UDP6 cksums:       0 | HW TCP cksums:        0 | HW TCP6 cksums:       0
 Mcast v6 solicit:     0
DEVICE
 Mcast addresses:      4 | Rx MBuf Sz:        4096
LINK INFO
 Speed:           10000m | Duplex:            full | Flowcontrol:       none
 Media state:     active | Up to downs:          2
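For completeness: the output above comes from the nodeshell ifstat command. To watch whether the errors keep growing, the counters can be zeroed and read again after a while, roughly like this (node name as in the alert above; the exact nodeshell syntax may differ slightly between ONTAP versions):
system node run -node <Filer>-<node> -command "ifstat -z e0g"   (reset the counters for e0g)
system node run -node <Filer>-<node> -command "ifstat e0g"      (read them again later)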
To me this looks like a bug in Data ONTAP 9.3P7 (an error in its port statistics), as we cannot find any matching errors in our network infrastructure, and we see no impact on the systems.
I have already opened a support case, but so far they cannot match this to an existing bug, since 9.3P7 is supposed to have fixed all known issues in this area.
So time-consuming debugging now has to be done on the customer site to find the root cause 😞
So the question to the community: has anybody else seen these errors on ONTAP 9.3P7?
Best Regards,
Klaus
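PS: If anyone wants to check quickly whether their systems log the same event, something like this should list any matching EMS entries (message name taken from the alert above):
event log show -message-name vifmgr.cluscheck.crcerrors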
Hello. When I assign ownership to a new X477 disk that does not yet have ownership assigned, the 7.2K RPM disk appears as 15K. Is there a technical document I can check that explains this behavior?
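For reference, once ownership is assigned, the model and RPM that the disk itself reports can be checked roughly like this from the cluster shell (the disk name is a placeholder; field names may differ slightly by ONTAP version):
storage disk show -disk <disk_name> -fields model,rpm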
Hello, we have a FAS2720 box with Base-T ports. I would like to use the Base-T ports e0c/e0d/e0e/e0f for the cluster interconnect and the e0a/e0b SFP+ ports for data traffic. I see this is a supported configuration on Hardware Universe (HWU); however, during cluster setup it does not create the cluster interface LIFs, and I am unable to join the second node to the switchless cluster. Thanks, Ragesh
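In case it helps, one approach would be to let cluster setup create the cluster LIFs on its default ports first and then move the interconnect to the Base-T ports afterwards, roughly as sketched below. The node and LIF names are placeholders, only two of the four Base-T ports are shown, and the exact procedure should be confirmed against the ONTAP documentation for your release:
network port broadcast-domain show -ipspace Cluster
network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports <node1>:e0c,<node1>:e0d
network port broadcast-domain add-ports -ipspace Cluster -broadcast-domain Cluster -ports <node1>:e0c,<node1>:e0d
network interface modify -vserver Cluster -lif <node1>_clus1 -home-port e0c
network interface revert -vserver Cluster -lif *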
Hi, tonight one of our FAS systems (a 2554 with 20x 4TB + 4x 400GB SSD) sent us a SPARES_LOW alert. Investigating, we found that a disk (0b.00.13) had accumulated media errors until it crossed the threshold, so ONTAP (9.3) started the copy/recovery to spare disk 0b.00.17. Digging deeper, we found that no disk is marked as broken and that, apparently, only the data partition of disk 13 was failed and copied to disk 17:

cluster3::> storage disk show -broken
There are no entries matching your query.

cluster3::> storage aggregate show-spare-disks

Original Owner: cluster3-node01
 Pool0
  Root-Data Partitioned Spares
                                                             Local    Local
                                                              Data     Root Physical
 Disk             Type   Class          RPM Checksum        Usable   Usable     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- -------- --------
 1.0.17           FSAS   capacity      7200 block                0B  61.58GB   3.64TB zeroed

cluster3::> storage disk show
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------
Info: This cluster has partitioned disks. To get a complete list of spare disk
      capacity use "storage aggregate show-spare-disks".
      This cluster has storage pools. To view available capacity in the storage
      pools use "storage pool show-available-capacity".
1.0.0     372.4GB   0   0  SSD   shared  sp_NFS                  cluster3-node02
1.0.1     372.4GB   0   1  SSD   shared  sp_NFS                  cluster3-node01
1.0.2     372.4GB   0   2  SSD   shared  sp_NFS                  cluster3-node02
1.0.3     372.4GB   0   3  SSD   spare   Pool0                   cluster3-node01
1.0.4      3.63TB   0   4  FSAS  shared  -                       cluster3-node02
1.0.5      3.63TB   0   5  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.6      3.63TB   0   6  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.7      3.63TB   0   7  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.8      3.63TB   0   8  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.9      3.63TB   0   9  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.10     3.63TB   0  10  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.11     3.63TB   0  11  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.12     3.63TB   0  12  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.13     3.63TB   0  13  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.14     3.63TB   0  14  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.15     3.63TB   0  15  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.16     3.63TB   0  16  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.17     3.63TB   0  17  FSAS  shared  sata_data_1             cluster3-node01
1.0.18     3.63TB   0  18  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.19     3.63TB   0  19  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.20     3.63TB   0  20  FSAS  shared  aggr0_02, sata_data_2
1.0.21     3.63TB   0  21  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
1.0.22     3.63TB   0  22  FSAS  shared  aggr0_02, sata_data_2   cluster3-node02
1.0.23     3.63TB   0  23  FSAS  shared  aggr0_01, sata_data_1   cluster3-node01
24 entries were displayed.

So, disk 13 is not failed and is still in the RAID group, disk 17 is still a spare but only for its root partition, and no disk is "broken". Yet roughly every 2 minutes we see a raid.shared.disk.exchange event for disk 13. Support will probably send us a new disk to swap with disk 13, but how can we mark it as failed before pulling it from the shelf? (See the sketch after the event log below.)
Thanks. Here are some logs from Events:

01:20:16 disk.ioMediumError: Medium error on disk 0b.00.13: op 0x28:070b6800:0200 sector 118188507 SCSI:medium error - Unrecovered read error - If the disk is in a RAID group, the subsystem will attempt to reconstruct unreadable data (3 11 0 0) (377) Disk 0b.00.13 Shelf 0 Bay 13 [NETAPP X477_WVRDX04TA07 NA02] S/N [ XXXXXX ] UID [50000C0F:01FEFD04:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
01:27:56 sas.adapter.debug: adapterName="0a", debug_string="Starting powercycle on device 0b.00.13"
01:28:35 raid.disk.timeout.recovery.read.err: Read error on Disk /sata_data_1/plex0/rg0/0b.00.13P1 Shelf 0 Bay 13 [NETAPP X477_WVRDX04TA07 NA02] S/N [ XXXXXX ] UID [60000C0F:01FEFD04:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000], block #19214411 during aggressive timeout recovery
01:28:35 shm.threshold.allMediaErrors: shm: Disk 0b.00.13 has crossed the combination media error threshold in a 10 minute window.
01:29:21 raid.rg.diskcopy.start: /sata_data_1/plex0/rg0: starting disk copy from 0b.00.13P1 to 0b.00.17P1. Reason: Disk replace was started.
01:31:00 raid.rg.spares.low: /sata_data_1/plex0/rg0
01:31:00 callhome.spares.low: Call home for SPARES_LOW
01:31:01 monitor.globalStatus.nonCritical: There are not enough spare disks.

====== Events that show up about every 2 minutes ======

raid.shared.disk.exchange: Received shared disk state exchange Disk 0b.00.13 Shelf 0 Bay 13 [NETAPP X477_WVRDX04TA07 NA02] S/N [ XXXXXX ] UID [50000C0F:01FEFD04:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], event NONE, state prefailing, substate 0x8000, partner state prefailing, partner substate 0x8000, failure reason testing, sick reason RAID_FAIL, offline reason NONE, online reason NONE, partner dblade ID 81c49d52-fa30-11e4-a69c-951e08312b64, host 1 persistent 0, spare on unfail 0, awaiting done 0, awaiting prefail abort 0, awaiting offline abort 0, pool partitioning 0
raid.shared.disk.exchange: Received shared disk state exchange Disk 0b.00.13 Shelf 0 Bay 13 [NETAPP X477_WVRDX04TA07 NA02] S/N [ XXXXXX ] UID [50000C0F:01FEFD04:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000], event PREFAIL_DONE, state prefailing, substate 0x8000, partner state prefailing, partner substate 0x10000, failure reason testing, sick reason RAID_FAIL, offline reason NONE, online reason NONE, partner dblade ID 81c49d52-fa30-11e4-a69c-951e08312b64, host 1 persistent 0, spare on unfail 0, awaiting done 1, awaiting prefail abort 0, awaiting offline abort 0, pool partitioning 0
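Assuming support agrees, a rough sketch of how the disk could be failed manually once the copy to disk 17 has finished (disk and aggregate names taken from the storage disk show output above; exact options may vary by ONTAP version):
storage aggregate show-status -aggregate sata_data_1   (watch the disk-copy progress)
storage disk fail -disk 1.0.13                         (fail the disk explicitly once the copy is complete, so it can be pulled)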
I have a NetApp FAS3250 with a DS2246 shelf. The hard drives in the shelf are Seagate 10K 6 Gb/s SAS (model # X417A-R6). One of the hard drives has failed. I have a spare (unused) FAS2650 that has Seagate 10K 12 Gb/s SAS drives (model # X341A-R6). Can I replace the defective hard drive with a known-good one from the FAS2650? The working one is 12 Gb/s SAS and the defective one is 6 Gb/s SAS. The firmware versions will probably be different as well.
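Whether that combination is supported is really a Hardware Universe / support question, but if you do swap the drive, this is roughly how the replacement can be verified and given ownership, assuming clustered ONTAP (the disk and node names are placeholders):
storage disk show -container-type unassigned   (the new drive should appear here if auto-assignment is off)
storage disk assign -disk <new_disk_name> -owner <node_name>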