Updated January 29th with new Priority Flow Control recommendations: add Cluster HeartBeat to Priority ID 7 for Windows Server and Dell switches.

Updated May 26th 2018 with HPE FlexFabric config

You have probably heard these acronyms somewhere, so what are they, and are they the same? In short: yes and no.

RoCE stands for RDMA over Converged Ethernet; the RDMA part is Remote Direct Memory Access.

RDMA allows network data (TCP packets) to be offloaded to the network card and placed directly into memory, bypassing the host's CPU and leaving it free for other work. With normal TCP processing all network traffic goes through the CPU, and the higher the speed, the more CPU it takes; saturating a 10 Gbit network can take around 100% CPU on a 12-core Intel Xeon v4.
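If you want a quick check of whether your NICs even support RDMA before going any further, the inbox NetAdapter cmdlets can tell you. A minimal sketch, assuming Windows Server 2012 R2 or later and that "NIC1" is just an example adapter name:

# List all adapters and whether RDMA is currently enabled on them
Get-NetAdapterRdma | Format-Table Name, Enabled

# Show the RDMA/RoCE related advanced properties the driver exposes (keywords vary by vendor)
Get-NetAdapterAdvancedProperty -Name "NIC1" | Where-Object DisplayName -Match "RDMA|RoCE|NetworkDirect"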

Mellanox has a good explanation for RDMA here.

DCB stands for Data Center Bridging

It is a set of enhancements to the Ethernet protocol. Ethernet is a best-effort network that may experience packet loss when network devices are busy, causing retransmissions. DCB allows selected traffic to have zero packet loss: it eliminates loss due to queue overflow and makes it possible to allocate bandwidth on links. DCB also allows different priorities for packets sent over the network.

 

In this post I will cover how to enable RDMA and DCB in Windows for SMB and on different switches. I will update with more switches as I read through the different vendors' configuration guides, since the setup varies a lot from vendor to vendor.

In the last year Microsoft has started to recommend iWARP as the default RDMA solution for S2D. This is because iWARP does not need DCB, PFC and ETS to work. In general RoCE does not strictly need them either, but since RoCE communicates over UDP, flow control is needed if there are packet drops.

RoCE is getting a DCB-free solution in the future, but for any high-IOPS RDMA configuration today, DCB and PFC are needed, even for iWARP. Configuring DCB/PFC for iWARP is identical to RoCE, so the same configuration applies to both.

Switches and vendors covered in this post

Lenovo

NE2572 (CNOS)

Dell

N4000 series
Force 10 S4810p, S6000, S6000-on(FTOS)

Cisco

Nexus NX-OS

Mellanox

SN2100

HPE

FlexFabric 5700/5900

Quanta

LB8

 

How to configure Windows Server 2012, 2012R2, 2016 and 2019 with RDMA and DCB

For SMB you will need to install the Windows feature Data-Center-Bridging:

Install-WindowsFeature -Name Data-Center-Bridging
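If you want to confirm the feature went in before rebooting, a trivial check:

# Should show InstallState as Installed
Get-WindowsFeature -Name Data-Center-Bridging | Format-Table Name, InstallState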

Reboot the server and let's configure the DCB settings. SMB always uses Priority 3; you can use another priority, but best practice is 3. Cluster HeartBeat uses Priority 7.

# Create QoS policies for SMB and Cluster HeartBeat
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy "Cluster" -Cluster -PriorityValue8021Action 7

# Turn on Flow Control for SMB and Cluster
Enable-NetQosFlowControl -Priority 3,7

# Make sure flow control is off for other traffic
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6

#Disable DCBx
Set-NetQosDcbxSetting -Willing $false -Confirm:$false

# Apply a Quality of Service (QoS) policy to the target adapters
Enable-NetAdapterQos -InterfaceAlias "NIC1","NIC2"

# Give SMB Direct a minimum bandwidth of 50%
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

#Give Cluster a minimum bandwidth of 1%
New-NetQosTrafficClass "Cluster" -Priority 7 -BandwidthPercentage 1 -Algorithm ETS

#Disable Flow Control on the physical NICs
Set-NetAdapterAdvancedProperty -Name "NIC1" -RegistryKeyword "*FlowControl" -RegistryValue 0
Set-NetAdapterAdvancedProperty -Name "NIC2" -RegistryKeyword "*FlowControl" -RegistryValue 0

#Enable QoS and RDMA on the NICs
Get-NetAdapterQos -Name "NIC1","NIC2" | Enable-NetAdapterQos
Get-NetAdapterRDMA -Name "NIC1","NIC2" | Enable-NetAdapterRDMA
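Before moving on, it is worth verifying that the policies, flow control settings and traffic classes look the way you expect. A quick read-only check using the inbox NetQos cmdlets:

# Should list the SMB and Cluster policies created above
Get-NetQosPolicy

# Priority 3 and 7 should show as enabled, the rest as disabled
Get-NetQosFlowControl

# Should show the SMB (50%) and Cluster (1%) traffic classes plus the default class
Get-NetQosTrafficClass

# Confirms that DCBX willing mode is turned off
Get-NetQosDcbxSetting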

After the QoS part is done, let's configure a network team or a switch. For S2D you use a SET switch (Switch Embedded Teaming).

New-VMSwitch -Name S2DSwitch -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false
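To confirm that the SET team was created with both physical NICs, you can query the switch team afterwards (this cmdlet assumes Windows Server 2016 or later):

# Shows team members, teaming mode and load balancing algorithm for the SET switch
Get-VMSwitchTeam -Name "S2DSwitch" | Format-List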

Let's create some virtual network adapters and enable RDMA on them. Once RDMA is enabled, DCB will also be enabled for SMB.

Add-VMNetworkAdapter -SwitchName S2DSwitch -Name Management -ManagementOS
Add-VMNetworkAdapter -SwitchName S2DSwitch -Name SMB1 -ManagementOS
Add-VMNetworkAdapter -SwitchName S2DSwitch -Name SMB2 -ManagementOS

# Enable RDMA on the virtual network adapters just created
$smbNICs = Get-NetAdapter -Name *SMB* | Sort-Object Name
$smbNICs | Enable-NetAdapterRDMA

# Let's find the physical NICs in the team
$physicaladapters = (Get-VMSwitch | Where-Object { $_.SwitchType -Eq "External" }).NetAdapterInterfaceDescriptions | ForEach-Object { Get-NetAdapter -InterfaceDescription $_ | Where-Object { $_.Status -ne "Disconnected" } }

# Map the SMB interfaces to the physical NICs
Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName $smbNICs[0].Name -ManagementOS -PhysicalNetAdapterName (Get-NetAdapter -InterfaceDescription $physicaladapters[0].InterfaceDescription).Name
Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName $smbNICs[1].Name -ManagementOS -PhysicalNetAdapterName (Get-NetAdapter -InterfaceDescription $physicaladapters[1].InterfaceDescription).Name
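To verify that each SMB vNIC ended up pinned to the intended physical NIC, the mapping can be listed afterwards (again assuming Windows Server 2016 or later):

# Lists the ManagementOS vNIC to physical NIC mappings
Get-VMNetworkAdapterTeamMapping -ManagementOS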

To check if RDMA is enabled, you can run this command:

Get-SmbClientNetworkInterface | where RdmaCapable -EQ $true | ft FriendlyName
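Once SMB traffic is actually flowing between nodes, you can also check that the connections are using RDMA and not falling back to TCP. Two hedged checks; the performance counter set is only present when the NIC driver exposes it:

# Active SMB Multichannel connections; the client/server RDMA capable columns should be True
Get-SmbMultichannelConnection | Format-Table -AutoSize

# Live RDMA traffic counters (requires the "RDMA Activity" counter set from the NIC driver)
Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec","\RDMA Activity(*)\RDMA Outbound Bytes/sec"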

Now that DCB and RDMA are configured in Windows, let's move on to the switch setup.

 

This is where the hard part is: figuring out the correct setup for your switch. Most switch vendors support this.

Lenovo NE2572

Use the default port settings, and enable DCB on the switch in global configuration mode.

cee enable

cee ets priority-group pgid 3 priority 3
cee ets priority-group pgid 3 description "RoCEv2"
cee pfc priority 3 enable
cee pfc priority 3 description "RoCEv2"

cee ets priority-group pgid 7 priority 7
cee ets priority-group pgid 7 description "Cluster"
cee pfc priority 7 enable
cee pfc priority 7 description "Cluster"

cee ets priority-group pgid 0 description "Default"
cee ets priority-group pgid 0 priority 4 5 6

cee ets bandwidth-percentage 0 49 3 50 7 1

Dell N4000 Series

Turn off flowcontrol on all interfaces.

Conf t

interface range tengigabitethernet 1/0/13,ten1/0/14,ten1/0/15,ten1/0/16,ten2/0/13,ten2/0/14,ten2/0/15,ten2/0/16

classofservice traffic-class-group 0 1
classofservice traffic-class-group 1 1
classofservice traffic-class-group 2 1
classofservice traffic-class-group 3 0
classofservice traffic-class-group 4 1
classofservice traffic-class-group 5 1
classofservice traffic-class-group 6 1
classofservice traffic-class-group 7 2
traffic-class-group max-bandwidth 49 50 1
traffic-class-group min-bandwidth 49 50 1
traffic-class-group weight 49 50 1

datacenter-bridging
priority-flow-control mode on
priority-flow-control priority 3 no-drop
priority-flow-control priority 7 no-drop
exit
exit

What we have done here is map traffic class 3 (SMB) into group 0 and traffic class 7 (Cluster) into group 2, with all other traffic in group 1, and then set max and min bandwidth on the groups. The bandwidth values are listed for groups 0, 1 and 2 in order, so group 0 (SMB) gets 49%, group 1 (default traffic) gets 50% and group 2 (Cluster) gets 1%. Then we enable the DCB config on the interfaces with mode on, and with priority 3 and 7 no-drop we enable zero packet drop for those traffic classes.

Dell Force 10 S4810p

Turn off flowcontrol on all interfaces.

dcb enable

dcb-map SMBDIRECT
 priority-group 0 bandwidth 50 pfc on
 priority-group 1 bandwidth 49 pfc off
 priority-group 2 bandwidth 1 pfc on
 priority-pgid 1 1 1 0 1 1 1 2
exit

interface TenGigabitEthernet 1/46
 description
 no ip address
 mtu 12000
 switchport
 spanning-tree pvst edge-port
 dcb-map SMBDIRECT
 no shutdown
exit

Dell Force 10 S6000, S6000-On(FTOS)

Turn off flowcontrol on all interfaces.

conf t

protocol lldp
advertise management-tlv system-capabilities system-description system-name
advertise interface-port-desc

dcb enable

dcb-map RDMA-dcb-map-profile
 priority-group 0 bandwidth 50 pfc on
 priority-group 1 bandwidth 49 pfc off
 priority-group 2 bandwidth 1 pfc on
 priority-pgid 1 1 1 0 1 1 1 2
exit

interface fortyGigE 1/5
description 
no ip address
mtu 9216
portmode hybrid
switchport
dcb-map RDMA-dcb-map-profile
no shutdown
exit

Cisco Nexus NX-OS

By default PFC (Priority Flow Control) is enabled on Cisco Nexus switches. To explicitly enable it, do the following.

No Priority 7 for cluster

configure terminal 
interface ethernet 5/5 
priority-flow-control mode on 

switch(config)# class-map type qos c1
switch(config-cmap-qos)# match cos 3
switch(config-cmap-qos)# exit

switch(config)# policy-map type qos p1
switch(config-pmap-qos)# class type qos c1
switch(config-pmap-c-qos)# set qos-group 3
switch(config-pmap-c-qos)# exit
switch(config-pmap-qos)# exit

switch(config)# class-map type network-qos match-any c1
switch(config-cmap-nqos)# match qos-group 3
switch(config-cmap-nqos)# exit

switch(config)# policy-map type network-qos p1
switch(config-pmap-nqos)# class type network-qos c1
switch(config-pmap-nqos-c)# pause buffer-size 20000 pause-threshold 100 resume-threshold 1000 pfc-cos 3
switch(config-pmap-nqos-c)# exit
switch(config-pmap-nqos)# exit
switch(config)# system qos
switch(config-sys-qos)# service-policy type network-qos p1
exit

Cisco Nexus 3132 NX-OS 6.0(2)U6(1)

By default PFC (Priority Flow Control) is enabled on Cisco Nexus switches. To explicitly enable it, do the following.

No Priority 7 for cluster

#Global settings

class-map type qos match-all RDMA
match cos 3
class-map type queuing RDMA
match qos-group 3
policy-map type qos QOS_MARKING
class RDMA
set qos-group 3
class class-default
policy-map type queuing QOS_QUEUEING
class type queuing RDMA
bandwidth percent 50
class type queuing class-default
bandwidth percent 50
class-map type network-qos RDMA
match qos-group 3
policy-map type network-qos QOS_NETWORK
class type network-qos RDMA
mtu 2240
pause no-drop
class type network-qos class-default
mtu 9216
system qos
service-policy type qos input QOS_MARKING
service-policy type queuing output QOS_QUEUEING
service-policy type network-qos QOS_NETWORK

#Port Specific settings
switchport mode trunk
#Set your vlans on next lines
switchport trunk native vlan 99
switchport trunk allowed vlan 99,2000,2050
spanning-tree port type edge
flowcontrol receive off
flowcontrol send off
no shutdown
priority-flow-control mode on

 

Mellanox SN2100

No Priority 7 for cluster

configure terminal
dcb priority-flow-control enable force
dcb priority-flow-control priority 3 enable

interface ethernet 1/1
dcb priority-flow-control mode on

dcb ets tc bandwidth 10 50 40 0

 

HPE FlexFabric 5700/5900 series

No Priority 7 for cluster

#Setting the ETS priority 3 to group 1
qos map-table dot1p-lp
 import 0 export 0  
 import 1 export 0  
 import 2 export 0  
 import 3 export 1 
 import 4 export 0  
 import 5 export 0  
 import 6 export 0  
 import 7 export 0 
 exit

#ETS configuration for 50% dropless on group 1 priority 3, which is the default for SMB RDMA
interface ten-gigabitethernet 1/0/1 
 qos trust dot1p
 qos wrr be group 1 byte-count 15  
 qos wrr af1 group 1 byte-count 15  
 qos wrr af2 group sp  
 qos wrr af3 group sp  
 qos wrr af4 group sp  
 qos wrr ef group sp  
 qos wrr cs6 group sp  
 qos wrr cs7 group sp

#Turning on PFC on the interfaces
interface ten-gigabitethernet 1/0/1  
 priority-flow-control auto
 priority-flow-control no-drop dot1p 3
 qos trust dot1p

#The next lines are not really needed unless you are really pushing your config and maxing out speeds.

#RoCEv1 QCN congestion config
qcn enable 
qcn priority 3 auto  
exit
interface Ten-GigabitEthernet1/0/10  
 lldp tlv-enable dot1-tlv congestion-notification

#RoCEv2 ECN congestion config
qos wred queue table ROCEv2  
 queue 0 drop-level 0 low-limit 1000 high-limit 18000 discard-probability 25  
 queue 0 drop-level 1 low-limit 1000 high-limit 18000 discard-probability 50  
 queue 0 drop-level 2 low-limit 1000 high-limit 18000 discard-probability 75  
 queue 1 drop-level 0 low-limit 18000 high-limit 37000 discard-probability 1  
 queue 1 drop-level 1 low-limit 18000 high-limit 37000 discard-probability 5  
 queue 1 drop-level 2 low-limit 18000 high-limit 37000 discard-probability 10  
 queue 1 ecn 
exit
interface Ten-GigabitEthernet1/0/10  
 qos wred apply ROCEv2

Quanta

This is the basic how-to for enabling it; I have not had the chance to test this out myself yet, so this will be updated, as the manual is not straightforward.

No Priority 7 for cluster

#To make sure DCB is enabled we can run this command
priority-flow-control mode ON/Auto (Default is Auto and it is enabled)

#Now we need to set no-drop for priority 3. The default is no-drop on 3,4,5,6
#First we clear all priority
no priority-flow-control priority

#Then we set no-drop only on priority 3
priority-flow-control priority 3 no-drop


#Now let's set the ETS queue bandwidth
#to enable
queue ets

#To set the bandwidth between san/lan to 50/50, run
no queue ets weight

#Let's map priority 3 to the san group
queue ets pg-mapping lan 0 1 2 4 5 6 7
queue ets pg-mapping san 3

#let's configure pfc for interface
interface 1/1
storm-control flowcontrol pfc