Oracle reckons that Linux remote direct memory access (RDMA) implementations need features such as high availability and load balancing, and hopes to contribute code to the kernel to do just that.
The problem, as Oracle Linux kernel developer Sudhakar Dindukurti explained in this post, is that performance and security considerations mean RDMA adapters tie the hardware to a "specific port and path".
A standard network interface card, on the other hand, can choose which netdev (network device) to use to send a packet. Failover and load balancing are native.
Dindukurti's work aims to bring that capability to both InfiniBand and RoCE (RDMA over Converged Ethernet) NICs, and to move it from Oracle's Unbreakable Enterprise Kernel (UEK) into the Linux source code.
The company's Resilient RDMA over IP (RDMAIP) module creates a high availability connection by using active-active bonding to form a bonding group among an adapter's ports. If a port is lost, traffic is moved to the other ports in the group. This is done using Oracle's Reliable Datagram Sockets (RDS), which has been in the Linux kernel since 2009.
Expanding this into Resilient RDMAIP involves a new process that allows a system to send packets to remote nodes, as detailed in Oracle's post:
"1) The client application registers the memory with the RDMA adapter, and the RDMA adapter returns an R_Key for the registered memory region to the client. Note that the registration information is saved on the RDMA adapter;
"2) The client sends this R_Key to the remote server;
"3) The server includes this R_Key when requesting an RDMA_READ/RDMA_WRITE from the client; and
"4) The RDMA adapter on the client side uses the R_Key to find the memory region and proceed with the transaction. Since the R_Key is bound to a particular RDMA adapter, the same R_Key cannot be used to send data over another RDMA adapter. Also, since RDMA applications can talk directly to the hardware, bypassing the kernel, traditional bonding (which lies in the kernel) cannot provide HA."
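The four steps above can be illustrated with a toy model. This is only a sketch in plain Python, not real verbs code: the class RdmaAdapter, its methods and the buffer contents are hypothetical names invented for illustration. It shows why an R_Key issued by one adapter is meaningless to another.

```python
import secrets


class RdmaAdapter:
    """Toy stand-in for an RDMA adapter (hypothetical, not a real API)."""

    def __init__(self, name):
        self.name = name
        # R_Key -> registered buffer; this state lives on this adapter only.
        self._regions = {}

    def register_memory(self, buffer):
        # Step 1: registration returns an R_Key and the adapter saves the mapping.
        r_key = secrets.randbits(32)
        while r_key in self._regions:
            r_key = secrets.randbits(32)
        self._regions[r_key] = buffer
        return r_key

    def rdma_read(self, r_key, offset, length):
        # Step 4: the adapter resolves the R_Key to a region it registered itself.
        if r_key not in self._regions:
            raise KeyError(f"{self.name}: unknown R_Key {r_key:#x}")
        return bytes(self._regions[r_key][offset:offset + length])


client_buffer = bytearray(b"hello from the client")
adapter_a = RdmaAdapter("adapter_a")
adapter_b = RdmaAdapter("adapter_b")

r_key = adapter_a.register_memory(client_buffer)  # step 1
# Steps 2 and 3: the client sends r_key to the server, which quotes it back
# in its read request. Here that exchange is just a variable being reused.
data = adapter_a.rdma_read(r_key, 0, 5)  # step 4 works on the registering adapter

try:
    adapter_b.rdma_read(r_key, 0, 5)  # same key, different adapter
    other_adapter_ok = True
except KeyError:
    other_adapter_ok = False
```

Because the second adapter never saw the registration, the lookup fails, which is the gap a failover mechanism has to work around.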
In a load balancing scenario, all interfaces in the bonding group have their own IP addresses, and the "consumer", that is, an application or operating system process, decides how best to choose which interfaces to use.
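As a rough illustration of a consumer-side choice, the sketch below rotates through the interfaces in turn. Round-robin is an assumption made here for simplicity; Oracle's post does not prescribe a policy, and the interface names and documentation-range IP addresses are made up.

```python
from itertools import cycle

# Hypothetical bonding group: each interface keeps its own IP address.
bonding_group = {"ib0": "192.0.2.10", "ib1": "192.0.2.11"}


def round_robin(group):
    """Return an endless iterator over (interface, ip) pairs, in name order."""
    return cycle(sorted(group.items()))


chooser = round_robin(bonding_group)
# The consumer asks for an interface each time it opens a new connection.
picks = [next(chooser) for _ in range(4)]
```

A real consumer could just as well weigh link speed or current load; the point is only that the decision sits with the consumer, not with the bonding module.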
Failover is easier, since RDMAIP detects when an interface goes down. The module moves the IP address of the failed interface to another in the group, and an RDMA Communication Manager (RDMA CM) event notifies the relevant kernel processes to change the addresses they use.
Failback is handled in the same way: the RDMAIP module moves the traffic back to the recovered interface and sends another RDMA CM event.
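The failover and failback behaviour can be modelled in a few lines. This is a simplified sketch, not the kernel module: BondingGroup, its methods, the "ADDR_CHANGE" event label and the addresses are all hypothetical, and the callbacks merely stand in for consumers listening for RDMA CM events.

```python
class BondingGroup:
    """Toy model of active-active bonding with IP migration (hypothetical)."""

    def __init__(self, interfaces):
        self.home = dict(interfaces)  # interface name -> its own IP address
        self.up = set(interfaces)     # interfaces currently alive
        # IP address -> interface currently carrying it
        self.location = {ip: name for name, ip in interfaces.items()}
        self.listeners = []           # callbacks standing in for CM event consumers

    def _notify(self, event, ip, interface):
        for callback in self.listeners:
            callback(event, ip, interface)

    def link_down(self, name):
        # Failover: move every IP on the failed interface to a surviving one.
        self.up.discard(name)
        if not self.up:
            raise RuntimeError("no surviving interface in the group")
        target = sorted(self.up)[0]
        for ip, holder in self.location.items():
            if holder == name:
                self.location[ip] = target
                self._notify("ADDR_CHANGE", ip, target)

    def link_up(self, name):
        # Failback: return the interface's own IP once it has recovered.
        self.up.add(name)
        ip = self.home[name]
        if self.location[ip] != name:
            self.location[ip] = name
            self._notify("ADDR_CHANGE", ip, name)


events = []
group = BondingGroup({"ib0": "192.0.2.10", "ib1": "192.0.2.11"})
group.listeners.append(lambda event, ip, interface: events.append((event, ip, interface)))

group.link_down("ib0")
after_failover = dict(group.location)
group.link_up("ib0")
after_failback = dict(group.location)
```

After the failure both addresses sit on ib1; after recovery the first address returns to ib0, and each move produces one notification for the listeners.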
To get this ready for the Linux kernel, Dindukurti wrote, the Resilient RDMAIP module needs to be more tightly coupled with the network stack implementation. That would allow RDMA kernel consumers to create active bonding groups, and provide APIs to expose the bonding groups and their interfaces.