State of RDMA: There can only be one

Simply put, RDMA allows a device to use the full capacity of the network pipe.

This is done in a few ways:

  • The first trick RDMA uses is to let the HBA access memory directly: the processor pins an address range and hands that range to the remote side, letting the card retrieve the memory directly. This greatly lowers CPU utilization for large transfers. Once the transfer is complete, the memory can be unpinned.
  • The second trick is to let the card use custom silicon to handle the entire stack, optimized for RDMA transfers. There are three layers of offload: TCP/IP over Ethernet, custom headers over Ethernet, and custom headers over InfiniBand.
  • The third trick is to use low-latency, high-bandwidth switches, which improve any traffic flow in their own right.
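To make the first trick concrete, here is a toy Python model of a one-sided RDMA read. Every name in it (PinnedRegion, ToyNic, rdma_read) is my own illustrative invention, not a real RDMA API; real implementations use the verbs interface (e.g. libibverbs). The point it sketches: the target pins a buffer and advertises its location plus a key, and the initiator's card then pulls the bytes without the target's CPU copying anything.

```python
# Toy model of a one-sided RDMA read. All names here (PinnedRegion,
# ToyNic, rdma_read) are illustrative inventions, not a real RDMA API.

class PinnedRegion:
    """A registered ("pinned") memory region the card may access directly."""
    def __init__(self, data: bytearray, rkey: int):
        self.data = data      # stands in for a pinned physical address range
        self.rkey = rkey      # remote key the peer must present

class ToyNic:
    """Stands in for the HBA/NIC that moves bytes without CPU copies."""
    def __init__(self):
        self.regions = {}     # rkey -> PinnedRegion

    def register(self, data: bytearray) -> tuple[int, int]:
        """Pin a buffer; return (rkey, length) to advertise to the peer."""
        rkey = len(self.regions) + 1
        self.regions[rkey] = PinnedRegion(data, rkey)
        return rkey, len(data)

    def deregister(self, rkey: int):
        """Unpin the region once the transfer is complete."""
        del self.regions[rkey]

    def rdma_read(self, rkey: int, offset: int, length: int) -> bytes:
        """The remote card retrieves the memory directly -- no target-CPU copy."""
        region = self.regions[rkey]
        return bytes(region.data[offset:offset + length])

# Target side: pin a buffer and advertise (rkey, length) out of band.
target_nic = ToyNic()
rkey, length = target_nic.register(bytearray(b"bulk payload"))

# Initiator side: pull the bytes directly through the target's card.
payload = target_nic.rdma_read(rkey, 0, length)
print(payload)

# Transfer complete, so the target may unpin the region.
target_nic.deregister(rkey)
```

The model deliberately leaves out everything the hardware actually sweats over (queue pairs, completion queues, DMA mapping); it only shows why the target CPU drops out of the data path.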


These concepts and methods are not new; consider the following quote:

"RDMA implements a Transport Protocol in the NIC hardware and supports Zero-Copy Networking, which makes it possible to read data directly
from the main memory of one computer and write that data directly to the main memory of another computer. RDMA has proven useful in apps..."

-This quote was written in 2005, and has been repeated at every tech session since as if it were a revelation.


There are three main variants of RDMA:
  • iWARP is the IETF RDMA standard. The switch can be any high-speed Ethernet switch. It essentially offloads the TCP stack onto the card and runs across an Ethernet network.
  • RoCE requires Converged Enhanced (CE) switches, also called Data Center Bridging (DCB) switches, so that lossless operation, pause, and Priority Flow Control (PFC) are supported. The protocol runs a simpler stack that is NOT TCP based, but it still runs on an Ethernet network, much like FCoE does.
  • InfiniBand requires custom host adapters as well as custom switches.

The problem is that misinformation prevails because the RDMA vendors selling hardware have a vested interest in their own bet, and take shots at each other publicly.

One of the few voices out there we might consider independent is Microsoft, since they support all three, but consider that Microsoft has a few different positions here as well. The developer side of Microsoft wants to support all three with the least common denominator of features, in which case they write to iWARP. The production and operations side of the house, however, wants some of the advanced features like lossless operation and PFC, while the group that builds impressive demos and marketing material loves to tout maximum performance without regard for actual datacenter needs.

Let's look at some realistic numbers. These are averages from CDW and NewEgg:

Type   Protocol     Speeds    Switch Requirements            Routable   Cost (NIC + Switch) per port
RDMA   iWARP        10g       None, but will work on DCB     Yes        $900  (+$275)
RDMA   RoCE         10g-40g   DCB                            No         $1000 (+$500)
RDMA   InfiniBand   12g-54g   InfiniBand                     No         $1500 (+$300)
       10g E        10g       None, but will work on DCB     Yes        $400  (+$275)
       FCoE         10g       DCB                            Yes        $500  (+$500)
       FC           8g-16g    Fibre Channel                  Yes        $800-$1600 (+$500)
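Summing the NIC and switch figures gives a rough total per port. This is a back-of-envelope sketch using the averaged prices in the table (for FC I take the top of the $800-$1600 range); your actual quotes will vary.

```python
# Rough per-port totals (NIC + switch port) from the table above.
# Prices are the article's CDW/NewEgg averages, not vendor quotes.
options = {
    "iWARP":      (900, 275),
    "RoCE":       (1000, 500),
    "InfiniBand": (1500, 300),
    "10g E":      (400, 275),
    "FCoE":       (500, 500),
    "FC (16g)":   (1600, 500),   # top of the $800-$1600 range
}

totals = {name: nic + switch for name, (nic, switch) in options.items()}
for name, total in totals.items():
    print(f"{name:>10}: ${total} per port")
```

Plain 10g Ethernet lands at roughly a third of the InfiniBand per-port cost, which is worth keeping in mind when reading the designs below.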

You probably also have a few requirements when it comes to your servers, the first of which is reliability: you likely require multiple connections from the server to your production network, multiple connections to your storage, and multiple connections to your peer servers.

  • Sample demo rig (high performance, no cost constraints, no consideration for simplicity/reliability/scalability)
    • 2 x 10g for the production network, 2 x 54g IB for peers, and 2 x 16g FC for storage, but this is a lot of wires and networks to manage.
  • Highly scalable/deployable setup (good performance, flexibility if needs change, less to manage, software-defined QoS to support needs)
    • 2 x 10g FCoE with QoS for the production network and storage access, plus 1 x 10g RoCE for peer servers. Since these can all live on DCB-type
      switches, there is only one set of networks to manage. I would only need a single RoCE adapter, since SMB would shed traffic to the non-RDMA adapter if a failure happened on the RDMA network.
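Using the per-port figures from the table, a back-of-envelope comparison of the two designs looks like this. It is only a sketch: it counts NIC-plus-switch-port cost per server and ignores cabling, optics, and the extra switch chassis the demo rig would need for its three separate networks.

```python
# Per-port (NIC + switch port) costs taken from the table above.
cost = {
    "10g E": 400 + 275,    # plain 10g Ethernet
    "IB":    1500 + 300,   # InfiniBand
    "FC16":  1600 + 500,   # 16g Fibre Channel (top of range)
    "FCoE":  500 + 500,
    "RoCE":  1000 + 500,
}

# Demo rig: 2 x 10g Ethernet + 2 x 54g IB + 2 x 16g FC -- three networks.
demo = 2 * cost["10g E"] + 2 * cost["IB"] + 2 * cost["FC16"]

# Converged setup: 2 x 10g FCoE + 1 x 10g RoCE -- one set of DCB switches.
converged = 2 * cost["FCoE"] + 1 * cost["RoCE"]

print(f"demo rig:  ${demo} per server, 6 ports, 3 networks")
print(f"converged: ${converged} per server, 3 ports, 1 network")
```

Under these assumptions the demo rig costs well over twice as much per server, which is the headroom that pays for the faster processor or extra memory mentioned below.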

You may find that the performance gap between the two sample designs above can be closed in the simpler design's favor by simply stepping up to the next faster processor, or adding slightly more system memory. People generally underestimate what well-designed 10g Ethernet on DCB switching can do.

Do you want to deploy Converged Ethernet switching, knowing that it will support both iWARP and RoCE, or purchase non-CE Ethernet switches as well as InfiniBand switches to support? If you are currently deploying iSCSI using software-based initiators, you may find a significant CPU reduction by moving to an FCoE-type connection, since all of that work is moved off to the FCoE card.


I really want to hear about your experiences with RDMA. Have you had good, neutral, or bad experiences with it in a production environment?


I guess I should point out that I expect the RoCE standard will eventually win, as DCB-style switches are becoming much more common. I expect InfiniBand to fall off in popularity just as Fibre Channel switches have in favor of FCoE switches, though InfiniBand switches were never that popular to begin with. InfiniBand is currently at bargain-basement pricing, but it sells to a niche crowd, while 10g Ethernet sells to everyone and their grandmother.