Dropped packets and buffer size of NICs

You might be able to reduce the number of dropped packets, but you will find different opinions about what number of dropped packets is acceptable.

How can you tell if you are experiencing dropped packets? Take a packet trace on the client side (limit the amount of data you capture to make it easier to read), then open the trace in Wireshark. On the controller side, NetApp does not record packet loss except on NICs with the Chelsio chipset. In Wireshark, open the expert view (Analyze > Expert Info). Some other things to put in the filter box:

     tcp.analysis.ack_lost_segment

     tcp.analysis.retransmission

     rpc.time > x (x is in seconds, so 0.5 would be half a second, which is a long time)
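If you would rather do this from the command line, the same three checks can be run against a saved client-side trace with tshark, Wireshark's command-line tool. This is a minimal Python sketch, assuming tshark is installed and on your PATH; the capture file name, the 0.5 second threshold and the function names are placeholders of my own. It ORs the three filters together so a single pass lists every suspicious packet.

```python
import shlex
import shutil
import subprocess


def build_filter(rpc_threshold_s: float = 0.5) -> str:
    """Combine the three checks into one Wireshark display filter."""
    # rpc.time is the RPC reply time in seconds.
    return (
        "tcp.analysis.ack_lost_segment"
        " || tcp.analysis.retransmission"
        f" || rpc.time > {rpc_threshold_s}"
    )


def tshark_args(capture_file: str, rpc_threshold_s: float = 0.5) -> list[str]:
    """Build a tshark command line that prints only the matching packets."""
    # -r reads a saved capture file; -Y applies a display filter.
    return ["tshark", "-r", capture_file, "-Y", build_filter(rpc_threshold_s)]


def count_matches(capture_file: str, rpc_threshold_s: float = 0.5) -> int:
    """Run tshark and count the packets the combined filter matched."""
    if shutil.which("tshark") is None:
        raise RuntimeError("tshark is not on PATH")
    result = subprocess.run(
        tshark_args(capture_file, rpc_threshold_s),
        capture_output=True, text=True, check=True,
    )
    # By default tshark prints one summary line per matching packet.
    return sum(1 for line in result.stdout.splitlines() if line.strip())


if __name__ == "__main__":
    # "client_trace.pcap" is a hypothetical file name.
    print(shlex.join(tshark_args("client_trace.pcap")))
```

Printing the command first lets you paste it into a shell and eyeball the packets before counting them with count_matches.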

If your controller has a NIC with the Chelsio chipset, you can check the "bad headers" counters on each interface with:

ifinfo -a | egrep "(bad headers|interface|Driver)" | grep -B2 "bad headers"

We ran this on about 120 NetApp controllers that have the Chelsio chipset, and the numbers varied a lot. In the worst case we looked at, out of a total of 31B (billion) packets, one out of every 1500 packets was lost. The best case was one out of every 3B packets.
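To put those ratios in perspective, here is the arithmetic as a small Python sketch (the helper names are made up for illustration). At one loss in every 1500 packets, 31 billion packets works out to roughly 20.7 million lost packets, or about 0.067% loss. If you have raw totals instead of a ratio, one_in_n turns them into the same "one out of every N" form.

```python
def one_in_n(total_packets: float, lost: float) -> float:
    """Turn raw counts into 'one out of every N packets was lost'."""
    if lost <= 0:
        raise ValueError("no lost packets to compute a ratio from")
    return total_packets / lost


def lost_packets(total_packets: float, n: float) -> float:
    """Estimate lost packets from a total and a 'one out of every N' ratio."""
    return total_packets / n


def loss_percent(n: float) -> float:
    """Convert 'one out of every N' into a loss percentage."""
    return 100.0 / n


if __name__ == "__main__":
    # Worst case from the post: 31 billion packets, one lost in every 1500.
    print(f"worst case: ~{lost_packets(31e9, 1500):,.0f} packets lost")
    # Compare the loss rates of the worst and best cases.
    for label, n in (("worst", 1500), ("best", 3e9)):
        print(f"{label}: {loss_percent(n):.2e}% of packets lost")
```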

Another way to tell whether you are getting a fair number of dropped packets is to look at the number of Oracle log writer errors. When we replaced the NIC on the NetApp controller with one that had a much bigger buffer, the number of Oracle log writer errors dropped drastically.

One option that helps with dropped packets is to set net.ipv4.tcp_sack=1 (selective acknowledgement) on Linux machines. With SACK, the receiving side tells the sender exactly which segments arrived, so the sender retransmits only the missing segment instead of every segment after it. It is on by default in most Linux kernels, so it is worth checking that it has not been turned off. This helped cut the number of Oracle log writer errors.
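Before changing anything, you can check whether SACK is already on: Linux exposes the setting as the file /proc/sys/net/ipv4/tcp_sack. This Python sketch only reads that file (the function name is my own). Actually turning SACK on still means running sysctl -w net.ipv4.tcp_sack=1 as root and adding the setting to /etc/sysctl.conf so it survives a reboot.

```python
from pathlib import Path

# Linux exposes the net.ipv4.tcp_sack sysctl as this file.
SACK_PATH = Path("/proc/sys/net/ipv4/tcp_sack")


def sack_enabled(path: Path = SACK_PATH) -> bool:
    """Return True if the sysctl file reports SACK as enabled (value 1)."""
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    try:
        enabled = sack_enabled()
    except FileNotFoundError:
        print("No /proc/sys/net/ipv4/tcp_sack here (not a Linux host?)")
    else:
        # Reading works as any user; changing it needs root:
        #   sysctl -w net.ipv4.tcp_sack=1   (and add it to /etc/sysctl.conf)
        print("net.ipv4.tcp_sack =", "1 (enabled)" if enabled else "0 (disabled)")
```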

The X1107 (Chelsio) NIC has a 160K RX buffer, the X1117 (Intel) has a 512K buffer, and the X1139/40 has a 64K buffer. I believe the X1117 is the latest card. Before you consider replacing a card, make sure you are running the version of Data ONTAP it requires.
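A rough way to see why the bigger buffer helped: if a burst arrives at full 10GbE line rate and nothing drains the buffer, the buffer size sets how long the card can absorb the burst before it starts dropping. This is a back-of-the-envelope Python sketch, assuming "K" means 1024 bytes and ignoring Ethernet framing overhead and the host draining the buffer at the same time. It works out to roughly 52, 131 and 419 microseconds for the 64K, 160K and 512K buffers.

```python
LINE_RATE_BPS = 10e9  # 10GbE line rate, bits per second


def fill_time_us(buffer_bytes: int, rate_bps: float = LINE_RATE_BPS) -> float:
    """Microseconds of line-rate traffic a buffer holds if nothing drains it."""
    return buffer_bytes * 8 / rate_bps * 1e6


# Buffer sizes quoted above; "K" is assumed to mean 1024 bytes.
BUFFERS = {"X1139/40": 64 * 1024, "X1107": 160 * 1024, "X1117": 512 * 1024}

if __name__ == "__main__":
    for card, size in BUFFERS.items():
        print(f"{card:>8}: {size // 1024:>3}K buffer ~ {fill_time_us(size):.0f} us")
```

Real traffic is rarely a single line-rate burst into an idle host, so treat these numbers as a relative comparison between the cards, not as a prediction of when drops start.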

Part # | Description | FAS Platform | Data ONTAP | FCoE | Bus | Supplier | Transmit Buffer | Notes

X1005A-R5 | NIC 1-Port Optical 10GbE PCI-X | FAS3050, FAS60xx | 7.2.3, 7.3.x, 8.x | No | PCI-X | Chelsio | - | -

X1008A-R5 | NIC 2-Port Optical 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.2.3, 7.3.x, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | - | -

X1106A-R6 | NIC 1-Port Optical 10GbE PCIe | FAS2050 only | 7.3.2 | No | PCIe Gen1, 8 lanes | Chelsio | - | -

X1107A-R6 | NIC 2-Port Bare Cage SFP+ 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | 160K | Same throughput as X1139 (Gen1 8 lanes = Gen2 4 lanes)

X1117-R6 | NIC 2-Port Bare Cage SFP+ 10GbE PCIe | FAS32xx, FAS62xx | 8.0.4+ | No | PCIe Gen2, 8 lanes | Intel | 256K/512K | Cannot use on AP1/AP2

X1139A/40-R6 | ADPT 2-Port Unified Target 10GbE SFP+ PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | Yes* | PCIe Gen2, 4 lanes | QLogic | 16K/64K | Gen 1 CNA card

N/A | On-board 10GbE ports | FAS2240, FAS32xx, FAS62xx | 8.x | No | PCIe Gen2, 8 lanes | Intel | - | -
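The X1107 note that Gen1 with 8 lanes equals Gen2 with 4 lanes comes down to per-lane rates: PCIe Gen1 signals at 2.5 GT/s per lane and Gen2 at 5 GT/s, and both use 8b/10b encoding (8 data bits for every 10 sent). Here is a small Python sketch of that arithmetic; the rate table is my own summary of the PCIe generations, not something from the NetApp table. It counts only encoding overhead, so real throughput after packet and flow-control overhead is lower. Both configurations come out at 16 Gb/s per direction.

```python
# Per-lane signalling rate in GT/s for the PCIe generations in the table.
# Gen1 and Gen2 both use 8b/10b line encoding (8 data bits per 10 sent).
GT_PER_LANE = {1: 2.5, 2: 5.0}


def pcie_gbps(gen: int, lanes: int) -> float:
    """Usable Gb/s per direction after 8b/10b encoding only.

    Transaction-layer and flow-control overhead are ignored, so delivered
    throughput is lower than this figure.
    """
    return GT_PER_LANE[gen] * lanes * 8 / 10


if __name__ == "__main__":
    print("Gen1 x8:", pcie_gbps(1, 8), "Gb/s")  # X1107 bus
    print("Gen2 x4:", pcie_gbps(2, 4), "Gb/s")  # X1139/40 bus
```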

We struggled with this for a while; hopefully this will help someone who is seeing this problem.