ONTAP Hardware

CRC error messages seen since ONTAP upgrade to 9.3P7

klmi

Dear Community,

 

Since we updated some of our systems (FAS82xx, AFF A300) to ONTAP 9.3P7, we see the following errors in the messages log for our (UTA2) 10 Gbit LAN ports. They did not appear before the update.

10/16/2018 11:04:14 <Filer>-<node>   ALERT         vifmgr.cluscheck.crcerrors: Port e0g on node <Filer>-<Node> is reporting a high number of observed hardware errors, possibly CRC errors.

 

ifstat shows an increasing "Total errors" count and a non-zero "Errors/minute", but no CRC errors:

-- interface  e0g  (1 hour, 42 minutes, 32 seconds) --

RECEIVE
 Total frames:    56798k | Frames/second:    9233  | Total bytes:       178g
 Bytes/second:    28949k | Total errors:     1337  | Errors/minute:      13
 Total discards:      2  | Discards/minute:     0  | Multi/broadcast: 31503
 Non-primary u/c:     0  | CRC errors:          0  | Runt frames:        18
 Fragment:            0  | Long frames:      1319  | Alignment errors:    0
 No buffer:           2  | Pause:               0  | Jumbo:               0
 Noproto:           105  | Bus overruns:        0  | LRO segments:    50798k
 LRO bytes:         174g | LRO6 segments:       0  | LRO6 bytes:          0
 Bad UDP cksum:       0  | Bad UDP6 cksum:      0  | Bad TCP cksum:       0
 Bad TCP6 cksum:      0  | Mcast v6 solicit:    0
TRANSMIT
 Total frames:    16298k | Frames/second:    2649  | Total bytes:     11749m
 Bytes/second:     1909k | Total errors:        0  | Errors/minute:       0
 Multi/broadcast:   605  | Pause:               0  | Jumbo:            6655k
 Cfg Up to Downs:     0  | TSO non-TCP drop:    0  | Split hdr drop:      0
 Timeout:             0  | TSO segments:      840k | TSO bytes:        9910m
 TSO6 segments:       0  | TSO6 bytes:          0  | HW UDP cksums:       0
 HW UDP6 cksums:      0  | HW TCP cksums:       0  | HW TCP6 cksums:      0
 Mcast v6 solicit:    0
DEVICE
 Mcast addresses:     4  | Rx MBuf Sz:       4096
LINK INFO
 Speed:           10000m | Duplex:            full | Flowcontrol:       none
 Media state:     active | Up to downs:          2
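
To see which individual counters account for the "Total errors" value in the output above, the RECEIVE block can be parsed and compared field by field. The following is only a minimal sketch: the names `parse_counters`, `SUFFIX` and `ERROR_FIELDS` are made up for illustration, the k/m/g suffixes are assumed to be decimal multipliers, and whether ONTAP actually computes "Total errors" as a sum of these counters is an assumption that this thread does not confirm.

```python
import re

# Excerpt of the RECEIVE block from the ifstat output above.
IFSTAT_RECEIVE = """\
 Total frames:    56798k | Frames/second:    9233  | Total bytes:       178g
 Bytes/second:    28949k | Total errors:     1337  | Errors/minute:      13
 Total discards:      2  | Discards/minute:     0  | Multi/broadcast: 31503
 Non-primary u/c:     0  | CRC errors:          0  | Runt frames:        18
 Fragment:            0  | Long frames:      1319  | Alignment errors:    0
"""

# Assumption: the suffixes are decimal multipliers (not confirmed here).
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_counters(text):
    """Turn 'Name: value | Name: value' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            match = re.fullmatch(r"(\d+)([kmg]?)", value.strip())
            if sep and match:
                scale = SUFFIX.get(match.group(2), 1)
                counters[name.strip()] = int(match.group(1)) * scale
    return counters


rx = parse_counters(IFSTAT_RECEIVE)

# Hypothetical choice of fields to treat as individual error counters.
ERROR_FIELDS = ["CRC errors", "Runt frames", "Fragment",
                "Long frames", "Alignment errors"]
breakdown = {f: rx[f] for f in ERROR_FIELDS if rx.get(f)}

print(breakdown)                                       # {'Runt frames': 18, 'Long frames': 1319}
print(sum(breakdown.values()) == rx["Total errors"])   # True
```

In this sample, long frames (1319) plus runt frames (18) add up exactly to the 1337 total errors, while the CRC counter stays at 0. That suggests the alert is driven by frame-length errors rather than real CRC errors.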

 

My impression is that this is a bug in ONTAP 9.3P7 (an error in its port statistics, ...), as we don't find any matching errors in our network infrastructure. We also see no impact on the systems.

I have already opened a support case, but so far they cannot match this to an existing bug, as 9.3P7 should have fixed all issues regarding this problem.

 

So time-consuming debugging at the customer site will be needed to find the root cause 😞

 

So my question to the community: has anybody seen these errors on ONTAP 9.3P7?

 

Best Regards,

Klaus

1 ACCEPTED SOLUTION

klmi

Hello,

 

A short update on this topic:

It seems the issue really is caused by the port receiving packets larger than 1500 bytes while the port MTU is set to 1500.

Starting with ONTAP 9.3, this is reported as "long frames" and in the events and alerts.

 

Our permanent fix is to set the MTU to 9000 on those LAN ports of the filer that are connected to a switch with jumbo frames enabled.
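
As a rough illustration of why a port at MTU 1500 counts jumbo frames as "long frames", the largest valid frame for a given MTU can be worked out from standard Ethernet framing (14-byte header, 4-byte FCS, optional 4-byte 802.1Q tag, 64-byte minimum frame). This is a sketch of that arithmetic only. The function names are hypothetical, and the exact thresholds ONTAP applies are not stated in this thread.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence
VLAN_TAG = 4      # optional 802.1Q tag
MIN_FRAME = 64    # minimum Ethernet frame size, including FCS


def max_frame_size(mtu, vlan_tagged=False):
    """Largest on-wire frame (without preamble) that fits the given MTU."""
    return mtu + ETH_HEADER + ETH_FCS + (VLAN_TAG if vlan_tagged else 0)


def classify(frame_len, mtu, vlan_tagged=False):
    """Label a received frame by length only: 'runt', 'long' or 'ok'."""
    if frame_len < MIN_FRAME:
        return "runt"
    if frame_len > max_frame_size(mtu, vlan_tagged):
        return "long"
    return "ok"


print(max_frame_size(1500))        # 1518
print(classify(9018, mtu=1500))    # long
print(classify(9018, mtu=9000))    # ok
print(classify(60, mtu=1500))      # runt
```

A full-size jumbo frame sent by a switch port with jumbo frames enabled is "long" for a filer port at MTU 1500 but valid once that port is set to MTU 9000, which matches the fix described above.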

 

Thanks, Gidi, for your feedback, which helped a lot in solving the issue.

 

Best Regards,

Klaus


28 REPLIES

buraksenturk

Hi All,

 

We had the same issue at a customer site. They changed the jumbo frame settings on the switch and on the NetApp, and after that the issue was fixed.

 

(FAS9000, ONTAP 9.5P5)

Hi @buraksenturk , could you please explain in more detail? What exactly do you mean by "changed jumbo frames"?
Thank you

Hi @LORENZO_CONTI 

 

The ports were set to MTU 9000 on the switches and also on the NetApp.

Here is an example from the customer:

Switch:

  TX
    1916130391 unicast packets  2666708 multicast packets  20843127 broadcast packets
    1939643557 output packets  136406204523 bytes
    3331 jumbo packets
    3331 output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  0 output discard
    0 Tx pause

 

NetApp:

 

RECEIVE
 Total frames:      551m | Frames/second:     197  | Total bytes:     36893m
 Bytes/second:    13145  | Total errors:     3372  | Errors/minute:       0
 Total discards:      0  | Discards/minute:     0  | Multi/broadcast: 23737k
 Non-primary u/c:     0  | CRC errors:       3370  | Runt frames:         2
 Long frames:         0  | Alignment errors:    0  | No buffer:           0
 Pause:               0  | Jumbo:              42  | Noproto:             0
 Bus overruns:        0  | LRO segments:      100  | LRO bytes:       11072
 LRO6 segments:       0  | LRO6 bytes:          0  | Bad UDP cksum:       0
 Bad UDP6 cksum:      0  | Bad TCP cksum:       0  | Bad TCP6 cksum:      0
 Mcast v6 solicit:    0

PKoza

We have had the same error for 5 months, and NetApp has not found a solution.
The port we have problems with is e0M (management).
e0M cannot be set higher than MTU 1500; the switch is also configured with MTU 1500 and 1000 Mbps.


PeterPie

I'm getting the same errors, with long frames and length errors, on our e0M ports, and we have upgraded to ONTAP 9.7P10.
Did you find any solution to this issue?

paul_stejskal

Did you follow the action plan in this KB? https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Ifstat_output_reports_long_frames

 

Your port is receiving frames larger than the configured MTU, and ONTAP is correctly discarding them because they are invalid; something is possibly misconfigured.

PeterPie

I went over the ifcommands with support. Our networking team also confirmed that the e0M management ports are set to an MTU of 1500, on the filer and also on the switches.

network port show (below) reports the ports as healthy, but we are still getting those errors. We even replaced the network cables on all 4 e0M ports.
                                                  Speed(Mbps) Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status
--------- ------------ ---------------- ---- ---- ----------- --------
e0M       Default      mgmt             up   1500 auto/1000   healthy
e0M       Default      mgmt             up   1500 auto/1000   healthy
e0M       Default      mgmt             up   1500 auto/1000   healthy
e0M       Default      mgmt             up   1500 auto/1000   healthy

 

Still trying, though.

paul_stejskal

PKoza can you post your ifstat output for that port?

 

PKoza

Here it is attached.

LORENZO_CONTI

Hello,
while the support case is open, can I ask the people who are experiencing the same issue which switch model and OS version you are using?

Ours are:
Brocade VDX6740
Network Operating System Version: 7.3.0

Thank you

LORENZO_CONTI

Hello,
we're experiencing the same with ONTAP 9.5P3, accessing volumes from 2 different hypervisors:
VMware ESXi, 6.7.0, 10302608
Proxmox VE 5.3-11

I think we will engage support

miller

Hi,

 

I just upgraded to ONTAP 9.5P4 and got the same issue:

: vifmgr.cluscheck.crcerrors: Port e0b on node is reporting a high number of observed hardware errors, possibly CRC errors.

 

I opened a support case and will update.

 

maffo

Hello @miller 

Since I am working on the support case for @LORENZO_CONTI, could you please share your support case with me?

A PM will be fine as well!

 

Thanks in advance.

LORENZO_CONTI

Hello @miller ,

Thanks for your message. We already have a support case open. In our environment, the CRC errors seem to be the cause of link flapping on multimode_lacp interface groups.

net.ifgrp.lacp.link.inactive: ifgrp a0a, port e0c has transitioned to an inactive state. The interface group is in a degraded state.

This also happens when there is almost no traffic (our 9.5P3 systems are not yet in production), and it does not happen on high-traffic systems running 9.1, where the number of CRC errors is higher...

FBoettger

We have the same issue with a brand new FAS8200/AFF 300 MetroCluster.

We got the following message every hour from every node (from every ethernet port):

 

vifmgr.cluscheck.crcerrors: Port e0h on node xxxxx is reporting a high number of observed hardware errors, possibly CRC errors.
 
vifmgr.cluscheck.crcerrors
 
36519
 
This message occurs when a network device reports a high number of observed hardware errors, such as CRC errors, length errors, alignment errors, or dropped frames.
 
The errors could be originating from the specified port, a remote port, or a port on another component of the network. Check the statistics for both the port and the switch. Contact NetApp technical support for assistance and specific instructions.
 

 

 

ONTAP:

NetApp Release 9.4P1

 

Does anybody have a solution?

lafoucrier

Hello,

 

We observe the same errors on our 10 Gbit/s ports with MTU size 1500 (NetApp FAS2620). We have had a case open for months now to try to identify the cause of those warning messages, but without success so far, despite the network tcpdump we sent to NetApp.

 

The NetApp FAS systems are connected to HPE switches, and the ports on the switch side have jumbo frames enabled. We tried reducing the frame size to the minimum allowed on the HPE switch (1536), but the messages persist (at an almost fixed time every day, shifting by about 10 minutes from day to day).

 

Have you opened a case with NetApp Support? Have they confirmed a KB or known issue related to those messages?

 

Thanks in advance

 

Yann

 

paul_stejskal

I want to say this is a false positive. What is your case number?

lafoucrier

In the end, it is not a false positive: we identified the frames by setting the MTU to 9000 on the NetApp ports and taking another tcpdump.

 

The frames are broadcast by our VMware servers, which send them once a day; VMware uses them for a network check.

 

For the moment, we have tried to avoid receiving those frames on our NetApp ports by disabling jumbo frames on the switch ports the NetApp is connected to, but with no success.

Raviere

Hello

In your vCenter, deactivate the dvSwitch health check for VLAN and MTU, and you should no longer see CRC complaints on the NetApp side.
