AFF
We are using a stack of two EX4550-32F switches, with Juniper QSFP+ modules in PIC 1 on both switches, to connect a NetApp A700s to the data network. For the sake of the question, I will address only one of the connections we currently have to the switches.
We are using the following setup, as advised by NetApp:
Node B is currently connected only through e0e to the QSFP+ module in the second EX4550-32F, using a QSFP+-40G-CU3M cable that the Juniper switch recognizes but that the NetApp reports as "CISCO-JPC". On the switch, the port is et-1/1/0, configured in access mode. A LIF with an IP address is configured directly on e0e.
Currently, the network port show command reports e0e on Node B as healthy, with the correct duplex, MTU, etc.
The same is true for the et-1/1/0 interface on the Juniper.
But if I run the ifstat e0e command on Node B, I see CRC errors accumulating on e0e (I don't see any errors on the et-1/1/0 interface on the Juniper).
My question is: what could be causing the CRC errors (which are probably the reason we have no connectivity on the LIF that uses this port), and what else should we check?
Our vendor states that all the hardware is compatible.
Thank you.
Please show the rest of the ifstat output!
I’ve never had good luck using twinax between a NetApp and a Juniper switch. I’ve pretty much always used optical transceivers on both ends.
You could try:
turn off auto-negotiation on both ends
force flow control to none on both ends
force speed to 40000 on both ends
force duplex to full on both ends
then physically remove the cable, zero the stats (ifstat -z -a) and plug the cable back in
if that doesn’t work, get APPROVED/SUPPORTED optics for both ends.
Twinax is great, but it doesn’t always work. It requires some level of support for the cable in the code on both ends.
I’ve seen customers get special twinax cables with each end programmed differently (e.g. Cisco at one end and Intel at the other).
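On the NetApp side, the forced settings and counter reset above might look roughly like the following. This is a sketch, assuming ONTAP 9 clustershell syntax, with <node-b> as a placeholder for the real node name; check the parameter names against the network port modify man page for your release. Pull the cable before zeroing the counters, reseat it, then re-check ifstat. The matching settings on et-1/1/0 have to be made on the Juniper side and are not shown here.

```
network port modify -node <node-b> -port e0e -autonegotiate-admin false -speed-admin 40000 -duplex-admin full -flowcontrol-admin none
system node run -node <node-b> ifstat -z -a
system node run -node <node-b> ifstat e0e
```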
Thank you for your comment! 🙏
We have actually already ordered optical transceivers and cables from our vendor to try that, because my theory was exactly that the cables we are using are no good.
We will try your suggestion as soon as they arrive this week, and hopefully it will solve our issue.
Full 'ifstat e0e' output for the port in question:
It’s not that the cables are no good; it’s that one or both ends (NetApp and/or switch) don’t have sufficient programming to support that cable.
Just curious: are you able to send a picture of the cable ends (with markings)? If not, no worries. The optics are the best bet in this case anyway.
Here are the cables used (both ends are the same):
The A700s onboard 40GbE ports are generally known to work well with Cisco 40GbE twinax. It is most likely an issue on the switch side. You can consider using the switch vendor's supported CR4 cables or, as others have suggested, going with optical (SR4).
Yeah, my guy says this is on the switch end. I use those all the time from NetApp to Cisco switches without incident. To other switch manufacturers, not so much. Even with Arista, it can be hit or miss depending on the switch model and OS revision. Using optics levels the playing field: use vendor-supported optics at either end and a neutral fibre to connect them.
Thank you for your help!
I will post our results when we have them.
A few things to try. Are they currently connected as individual ports or a port-channel?
Are the CRCs on the NetApp side (system node run -node * ifstat e0e, then repeat with e0j), on the switch side, or both? Are you seeing just CRCs, or is anything populated in the Xon/Xoff/Pause fields? (If those are populated, you have flow control enabled somewhere... turn it off!)
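If the pause counters are populated, one quick way to see what the NetApp ports are configured for is to compare the admin and operational flow-control values, then set the port to none. A sketch, assuming the ONTAP 9 field names flowcontrol-admin and flowcontrol-oper and using <node> as a placeholder; the switch side still needs to be checked separately.

```
network port show -node * -port e0e|e0j -fields flowcontrol-admin,flowcontrol-oper
network port modify -node <node> -port e0e -flowcontrol-admin none
```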
First, pull the cables and wipe the ends with a clean, lint-free cloth. I know some places take the caps off the ends and they get dirty. Not saying that is the case, but clean the ends. Try hitting all the optics with a puff of air. NOT COMPRESSED AIR, just those little "puffers".
You can clear ONTAP stats by rebooting (which clears all stats) or with: system node run -node * ifstat -z -a (zeroes all interfaces, at least for ifstat)
Try setting BOTH sides to: 40G, full duplex, no flow control
Some switches do not auto-negotiate nicely. Even with Cisco, I occasionally need to force speed/duplex.
Strangely, I'm getting this when trying to set the speed of the port to 40GbE:
How is that possible? What am I doing wrong?
Just curious,
Based on that output, it looks like the ports are currently in breakout mode!
What’s the output of:
net port show -node * -port e0*
And by the way, if it is in breakout mode, which I believe it is, there’s a good chance the twinax will work after the port is converted back to 40G.
We tried the twinax cables after converting the ports to 40GbE and saw CRC errors and no ping.
Here is the output; all ports are 40GbE:
When we got the device, I ran the nicadmin convert command on both e0a and e0f to set them to 40GbE (this changes the mode for the whole pair of NICs: e0a + e0e and e0f + e0j). The command completed successfully.
I followed a similar procedure:
https://kb.netapp.com/on-prem/ontap/OHW/OHW-KBs/Convert_Ethernet_port_between_40GbE_and_4x10GbE_personality
OK. Try it as a two-step process.
First, run the port modify command but only set -autonegotiate-admin to false. If that works, then try setting the rest.
I’ve never seen a case where the port speed couldn’t be set.
This looks like it might be your issue. I just found it.
There are a few suggestions there.
Basically, try direct optics on the switch side and try changing the debounce timer on the switch.
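A sketch of that two-step sequence, assuming ONTAP 9 parameter names and <node-b> as a placeholder node name. The final show command only confirms what was actually applied.

```
network port modify -node <node-b> -port e0e -autonegotiate-admin false
network port modify -node <node-b> -port e0e -speed-admin 40000 -duplex-admin full -flowcontrol-admin none
network port show -node <node-b> -port e0e -fields autonegotiate-admin,speed-admin,duplex-admin,flowcontrol-admin
```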
X65402 (332-00389) is also a 40GBASE-eSR4 QSFP transceiver and is supported on AFF A700s onboard ports.
Maybe it's time to start suspecting the EX4550-EM-2QSFP module?
Thank you, we will try replacing the modules, and you are correct about the X65402. The funny thing is that some vendors' optics, such as the Finisar ones we use, are marked as QSFP and SR rather than QSFP+ and SR4. It's just an observation and probably a big red herring.