Dear Community,
Since we updated some of our systems (FAS82xx, AFF A300) to ONTAP 9.3P7,
we see the following errors in the messages log for our (UTA2) 10 Gbit LAN ports, which did not appear before the update.
10/16/2018 11:04:14 <Filer>-<node> ALERT vifmgr.cluscheck.crcerrors: Port e0g on node <Filer>-<Node> is reporting a high number of observed hardware errors, possibly CRC errors.
ifstat shows "Total errors" (increasing) and "Errors/minute", but no CRC errors:
-- interface e0g (1 hour, 42 minutes, 32 seconds) --
RECEIVE
Total frames: 56798k | Frames/second: 9233 | Total bytes: 178g
Bytes/second: 28949k | Total errors: 1337 | Errors/minute: 13
Total discards: 2 | Discards/minute: 0 | Multi/broadcast: 31503
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 18
Fragment: 0 | Long frames: 1319 | Alignment errors: 0
No buffer: 2 | Pause: 0 | Jumbo: 0
Noproto: 105 | Bus overruns: 0 | LRO segments: 50798k
LRO bytes: 174g | LRO6 segments: 0 | LRO6 bytes: 0
Bad UDP cksum: 0 | Bad UDP6 cksum: 0 | Bad TCP cksum: 0
Bad TCP6 cksum: 0 | Mcast v6 solicit: 0
TRANSMIT
Total frames: 16298k | Frames/second: 2649 | Total bytes: 11749m
Bytes/second: 1909k | Total errors: 0 | Errors/minute: 0
Multi/broadcast: 605 | Pause: 0 | Jumbo: 6655k
Cfg Up to Downs: 0 | TSO non-TCP drop: 0 | Split hdr drop: 0
Timeout: 0 | TSO segments: 840k | TSO bytes: 9910m
TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 0
HW UDP6 cksums: 0 | HW TCP cksums: 0 | HW TCP6 cksums: 0
Mcast v6 solicit: 0
DEVICE
Mcast addresses: 4 | Rx MBuf Sz: 4096
LINK INFO
Speed: 10000m | Duplex: full | Flowcontrol: none
Media state: active | Up to downs: 2
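As a quick cross-check of output like the above, the RECEIVE counters can be parsed and the per-type error counters compared against "Total errors". This is a minimal Python sketch. The function name `parse_counters` and the choice of which counters count as error types are assumptions for illustration, not documented ONTAP behavior. For this sample, runt frames plus long frames add up exactly to the reported total.

```python
# RECEIVE section copied from the ifstat output above (first lines only).
RECEIVE = """\
Total frames: 56798k | Frames/second: 9233 | Total bytes: 178g
Bytes/second: 28949k | Total errors: 1337 | Errors/minute: 13
Total discards: 2 | Discards/minute: 0 | Multi/broadcast: 31503
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 18
Fragment: 0 | Long frames: 1319 | Alignment errors: 0
"""

def parse_counters(text):
    """Split 'Name: value | Name: value' lines into a dict of strings."""
    counters = {}
    for line in text.splitlines():
        for field in line.split("|"):
            name, sep, value = field.partition(":")
            if sep:
                counters[name.strip()] = value.strip()
    return counters

# Assumption: these counters are the per-type receive errors.
ERROR_TYPES = ["CRC errors", "Runt frames", "Fragment",
               "Long frames", "Alignment errors"]

counters = parse_counters(RECEIVE)
breakdown = {name: int(counters[name]) for name in ERROR_TYPES}
total = int(counters["Total errors"])

for name, count in breakdown.items():
    print(f"{name}: {count}")
# Prints: sum of error types: 1337 reported total: 1337
print("sum of error types:", sum(breakdown.values()), "reported total:", total)
```

Since the CRC counter is zero and the long-frame counter accounts for nearly all errors, the alert text ("possibly CRC errors") is likely misleading for this port.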
My feeling is that this is a bug in ONTAP 9.3P7 (an error in its port statistics, ...), as we do not find any matching errors in our network infrastructure. We also see no impact on the systems.
I have already opened a support case, but so far support cannot match this to an existing bug, as 9.3P7 should have fixed all known issues regarding this problem.
So time-consuming debugging at the customer site is needed to find the root cause 😞
So the question to the community: has anybody seen these errors on ONTAP 9.3P7?
Best Regards,
Klaus
1 ACCEPTED SOLUTION
Hello,
short update on this topic:
It seems the issue really is caused by the port receiving packets larger than 1500 bytes while the port is set to MTU 1500.
Starting with ONTAP 9.3, this is reported as "long frames" and also shows up in the events and alerts.
Our permanent fix is to set the MTU to 9000 on the filer's LAN ports that are connected to a switch with jumbo frames enabled.
Thanks, Gidi, for your feedback, which helped a lot in solving the issue.
Best Regards,
Klaus
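The mismatch described in this post can be illustrated with a small consistency check: a port whose MTU is smaller than that of its neighbours will see their larger frames as oversized. This is a minimal Python sketch, assuming a hand-maintained inventory. The function name, the component names, and the MTU values are hypothetical examples, not taken from a real system.

```python
def find_mtu_mismatches(path_mtus):
    """Return components whose MTU is below the largest MTU on the path.

    Simplified model: anything smaller than the largest MTU may receive
    frames it considers too long.
    """
    largest = max(path_mtus.values())
    return {name: mtu for name, mtu in path_mtus.items() if mtu < largest}

# Hypothetical inventory: the switch allows jumbo frames, the filer does not.
before_fix = {"switch port": 9000, "filer port e0g": 1500, "broadcast domain": 1500}
# Hypothetical inventory after raising the MTU on the filer side.
after_fix = {"switch port": 9000, "filer port e0g": 9000, "broadcast domain": 9000}

print(find_mtu_mismatches(before_fix))  # {'filer port e0g': 1500, 'broadcast domain': 1500}
print(find_mtu_mismatches(after_fix))   # {}
```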
28 REPLIES
Hi,
I have the same issue on 9.3P2 on a FAS2554:
- CRC events every hour
- ifstat shows errors only on receive, but no CRC errors
Have you solved this issue?
Thanks
Hi
The increase is in the RECEIVE counter "Long frames", which translates to: "Number of received frames that were greater than the maximum size and had a valid CRC."
That suggests you have a larger MTU set somewhere on the LAN, but not on the broadcast domain this interface is assigned to in ONTAP.
Another option I can think of: your switch is configured as a trunk with a native VLAN (which works fine and is a common configuration), but it also allows VLANs that are not configured in ONTAP. When the switch then sends VLAN-tagged traffic, the tag adds to the frame size (regardless of whether the traffic was meant for the filer or was just a broadcast).
If you want to see the traffic with your own eyes, you can collect a pktt trace.
https://kb.netapp.com/app/answers/answer_view/a_id/1029847
Then sort by frame size or use a filter like "frame.len > 1514 && ip.dst == 10.1.1.1" (1514 bytes is the normal maximum frame size for an MTU of 1500, and 10.1.1.1 stands for the filer IP, so you see only the received frames).
Gidi
Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK
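The 1514 in the filter above is the 1500-byte MTU plus the 14-byte Ethernet header (the 4-byte frame check sequence is usually not part of a capture's frame length). An 802.1Q VLAN tag adds another 4 bytes. This is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and the exact threshold ONTAP applies is not stated in this thread, so the check is a simplified model.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
VLAN_TAG = 4      # 802.1Q tag, present only on tagged frames

def max_frame_len(mtu, vlan_tagged=False):
    """Largest frame length (without FCS) that fits the given MTU."""
    return mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0)

def is_long_frame(frame_len, port_mtu):
    """True if the frame exceeds what an untagged port with this MTU expects."""
    return frame_len > max_frame_len(port_mtu)

print(max_frame_len(1500))                    # 1514
print(max_frame_len(1500, vlan_tagged=True))  # 1518
print(max_frame_len(9000))                    # 9014
print(is_long_frame(1518, port_mtu=1500))     # True
print(is_long_frame(1518, port_mtu=9000))     # False
```

Under this model, a full-size VLAN-tagged frame is already 4 bytes too long for an untagged port at MTU 1500, which matches the trunk scenario described above.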
Hello Klaus,
maybe our issue is different, because we are using jumbo frames on all components (switch interfaces, filer ports, interface group, broadcast domain).
Hi Lorenzo,
each case may be different, as ONTAP 9.3 is now more chatty about unexpected packets.
Opening a support ticket is a good idea anyway. We did too, but did not finish with support, as debugging was not straightforward :-(.
If you want to share your error message and ifstat counters, perhaps somebody from the community can help you.
Best Regards,
Klaus
We have the same issue with a brand-new FAS8200/AFF A300 MetroCluster.
We get the following message every hour from every node (for every Ethernet port):
vifmgr.cluscheck.crcerrors: Port e0h on node xxxxx is reporting a high number of observed hardware errors, possibly CRC errors.
vifmgr.cluscheck.crcerrors
36519
This message occurs when a network device reports a high number of observed hardware errors, such as CRC errors, length errors, alignment errors, or dropped frames.
The errors could be originating from the specified port, a remote port, or a port on another component of the network. Check the statistics for both the port and the switch. Contact NetApp technical support for assistance and specific instructions.
ONTAP:
NetApp Release 9.4P1
Does anybody have a solution?
Hello,
we observe the same errors on our 10 Gbit/s ports with MTU size 1500 (NetApp FAS2620). We have had a case open for months now to identify the cause of these warning messages, but so far without success, even with the network tcpdump we sent to NetApp.
The NetApp FAS systems are connected to HPE switches, and the ports on the switch side have jumbo frames enabled. We tried reducing the frame size to the minimum allowed on the HPE switch (1536), but the messages persist (at an almost fixed time every day, shifting by about 10 minutes from one day to the next).
Did you open a case with NetApp Support? Have they confirmed a KB article or known issue related to these messages?
Thanks in advance
Yann
I want to say this is a false positive. What is your case number?
In the end it is not a false positive: we identified the frames by setting the MTU to 9000 on the NetApp ports and taking another tcpdump.
The frames are broadcast by our VMware servers, which send them once a day. VMware uses them for a network check.
For the moment we have tried to avoid receiving those frames on our NetApp ports by disabling jumbo frames on the switch ports the NetApp is connected to, but without success.
We upgraded last week from 9.2P4 to 9.3P10 and are now experiencing the same issue, on the same port in fact (e0h). It affects only one node of our four-node cluster. We are also getting alerts on e0b indicating "Excessive link errors on network interface e0b. Might indicate a bad cable, switch port, or NIC, or that a cable connector is not fully inserted in a socket. On a 10/100 port, might indicate a duplex mismatch."
I believe both are false positives, as we were getting no alerts on either interface before the upgrade. I will be opening a case today.
Post some ifstat output.
Hello
In your vCenter, deactivate the dvSwitch health check for VLAN and MTU, and you should no longer see CRC complaints on the NetApp side.
Hello,
Thanks for your update. I will try it in October during our next disaster test.
Hello,
we're experiencing the same with ONTAP 9.5P3 when accessing volumes from two different hypervisors:
VMware ESXi, 6.7.0, 10302608
Proxmox VE 5.3-11
I think we will engage support
Hi,
I just upgraded to ONTAP 9.5P4 and got the same issue:
: vifmgr.cluscheck.crcerrors: Port e0b on node is reporting a high number of observed hardware errors, possibly CRC errors.
I opened a support case and will post an update.
Hello @miller,
thanks for your message. We already have a support case open. In our environment, the CRC errors seem to be the cause of link flapping on multimode_lacp interface groups:
net.ifgrp.lacp.link.inactive: ifgrp a0a, port e0c has transitioned to an inactive state. The interface group is in a degraded state.
This also happens when there is almost no traffic (our 9.5P3 systems are not yet in production), and it does not happen on high-traffic systems running 9.1, where the number of CRC errors is higher...
Hello @miller,
since I am working on the support case for @LORENZO_CONTI, would you be able to share your support case with me?
A PM will be fine as well!
Thanks in advance.
Hello,
while the support case is open, may I ask those who are experiencing the same issue which switch model and OS version you are using?
Ours are:
Brocade VDX6740
Network Operating System Version: 7.3.0
Thank you
Hi All,
We had the same issue at a customer site. They changed jumbo frames on the switch and the NetApp, and after that the issue was fixed.
(FAS9000 , Ontap9.5P5)
Hi @buraksenturk, could you please explain in more detail? What exactly do you mean by "changed jumbo frames"?
Thank you