You might be able to cut the number of dropped packets; you will find differing opinions about what an acceptable number is.
How can you tell if you are experiencing dropped packets? Take a packet trace on the client side (limit the amount of data you capture to make the trace easier to read), then use Wireshark to read it. NetApp does not record packet loss except on NICs with the Chelsio chipset. While you are in Wireshark, go into expert mode: click Analyze, then Expert Info. Some other things to look for; put these in the filter box:
tcp.analysis.ack_lost_segment
tcp.analysis.retransmission
rpc.time > x (x is in seconds, so .5 would be half a second, which is a long time)
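
As a concrete example, a client-side capture could look like the following. This is a minimal sketch, assuming tcpdump and tshark are installed, that eth0 is the client interface facing the filer, and that 10.0.0.50 stands in for your filer's address (both names are hypothetical; adjust for your environment):

# Capture up to 50,000 packets to/from the filer so the trace stays manageable
tcpdump -i eth0 -c 50000 -w /tmp/filer.pcap host 10.0.0.50
# Scan the trace from the command line with the same display filters as above
tshark -r /tmp/filer.pcap -Y "tcp.analysis.retransmission"
tshark -r /tmp/filer.pcap -Y "rpc.time > 0.5"

Opening /tmp/filer.pcap in the Wireshark GUI and using Analyze > Expert Info works just as well; the tshark lines are only there for quick command-line triage.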
If you have a controller with a NIC that has the Chelsio chipset:
ifinfo -a | egrep "(bad headers|interface|Driver)" | grep -B2 "bad headers"
We ran this on about 120 NetApp controllers that have the Chelsio chipset, and the numbers varied a lot. The worst case we looked at had a total of 31 billion packets and was dropping one out of every 1,500; the best case dropped one out of every 3 billion.
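
If you want to sweep a fleet the way we did, something like the following works. This is a rough sketch, assuming you have ssh access to each controller and a controllers.txt file listing their hostnames (both are assumptions about your environment):

# Print the interface, driver, and bad-header counters for each controller
for filer in $(cat controllers.txt); do
  echo "== $filer =="
  ssh "$filer" ifinfo -a | egrep "(bad headers|interface|Driver)" | grep -B2 "bad headers"
done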
Another way to tell if you are getting a fair number of dropped packets is to look at the number of Oracle log writer errors. When we replaced the NIC on the NetApp controller with one that had a much bigger buffer, the number of Oracle log writer errors dropped drastically.
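
One place to quantify that on the database side is the log writer trace file. This is a hedged sketch, assuming an 11g-style diag destination and that your LGWR trace files carry the usual slow-write warnings; the path below is illustrative, not your actual path:

# Count slow-log-write warnings in the LGWR trace
# (Oracle typically logs one when a redo write takes longer than ~500 ms)
grep -c "Warning: log write elapsed time" /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_*.trc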
One option to help cut down on the impact of dropped packets is to set net.ipv4.tcp_sack=1 (selective acknowledgement) on your Linux machines. This tells whatever system your machine is exchanging packets with to retransmit only the lost packet in a chain instead of all of them. This helped cut the number of Oracle log writer errors.
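
On most modern Linux distributions SACK is already on by default, but it is worth verifying and persisting. A small sketch, assuming root access and a sysctl.d-style configuration:

# Check the current setting (1 = SACK enabled)
sysctl net.ipv4.tcp_sack
# Enable it now, and persist it across reboots
sysctl -w net.ipv4.tcp_sack=1
echo "net.ipv4.tcp_sack = 1" >> /etc/sysctl.d/99-tcp-sack.conf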
The X1107 (Chelsio) NIC has a 160K RX buffer, the X1117 (Intel) has a 512K buffer, and the X1139/40 (QLogic) has a 64K buffer. I believe the X1117 is the latest card. Before you consider replacing a card, make sure you are running the version of Data ONTAP it requires (see the table below).
Part # | Description | FAS Platform | Data ONTAP | FCoE | Bus | Supplier | Transmit Buffer | Notes
X1005A-R5 | NIC 1-Port Optical 10GbE PCI-X | FAS3050, FAS60xx | 7.2.3, 7.3.x, 8.x | No | PCI-X | Chelsio | |
X1008A-R5 | NIC 2-Port Optical 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.2.3, 7.3.x, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | |
X1106A-R6 | NIC 1-Port Optical 10GbE PCIe | FAS2050 only | 7.3.2 | No | PCIe Gen1, 8 lanes | Chelsio | |
X1107A-R6 | NIC 2-Port Bare Cage SFP+ 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | 160K | Same throughput as X1139 (Gen 1 8 lanes = Gen 2 4 lanes)
X1117-R6 | NICII 2-Port Bare Cage SFP+ 10GbE PCIe | FAS32xx, FAS62xx | 8.0.4+ | No | PCIe Gen2, 8 lanes | Intel | 256K/512K | Cannot use on AP1/AP2
X1139A/40-R6 | ADPT 2-Port Unified Target 10GbE SFP+ PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | Yes* | PCIe Gen2, 4 lanes | QLogic | 16K/64K | Gen 1 CNA card
N/A | On-board 10GbE ports | FAS2240, FAS32xx, FAS62xx | 8.x | No | PCIe Gen2, 8 lanes | Intel | |
We struggled with this for a while; hopefully this will help someone who is seeing the same problem.