Working with IPoIB
- 1 IPoIB kernel module control
- 2 IPoIB network interface control
- 2.1 Show available IPoIB network interfaces
- 2.2 Configuring IP address of IPoIB interface
- 2.3 Controlling IPoIB mode
- 2.4 Configuring the MTU of an IPoIB network interface
- 2.5 Partitioning in IPoIB (VLAN equivalent)
- 2.6 Verifying that IPoIB is working
In this post I will explain how to work with IPoIB. This includes loading and unloading the IPoIB module, configuring it, and verifying that it is working.
IPoIB kernel module control
Verifying that the IPoIB module is loaded
The IPoIB kernel module name in Linux is 'ib_ipoib'. The following command line verifies that it is loaded:
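A minimal sketch of this check as a shell function. It reads /proc/modules, which is the file that 'lsmod' formats; the function name 'module_loaded' and its optional second argument are illustrative assumptions, not part of any RDMA tool.

```shell
# Succeeds if the named kernel module is listed as loaded.
# $1: module name
# $2: module list to read (defaults to /proc/modules; overridable here
#     only so that the sketch can be tried on a saved copy)
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Interactive equivalent on a live machine:
#   lsmod | grep ib_ipoib
```

For example, 'module_loaded ib_ipoib && echo "IPoIB is loaded"' prints the message only when the module is present.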
On a machine in which the IPoIB module is loaded, the output starts with 'ib_ipoib' at the beginning of the first line. If this module isn't loaded, one needs to load it.
Note: In different operating systems or in different configurations this output may look different (i.e. different kernel modules may be loaded).
Loading the IPoIB module
One can load the IPoIB module as part of the RDMA service of the OS. To do so, IPoIB should be set to 'load' in the configuration file of the RDMA stack.
The following command line will load the IPoIB module manually:
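A minimal sketch of the manual load, wrapped in a function so that it can be reused in scripts. 'modprobe' needs root privileges and also loads the modules that 'ib_ipoib' depends on; the function name 'load_ipoib' is an illustrative choice.

```shell
# Load the IPoIB kernel module together with its dependencies.
# Must be run as root.
load_ipoib() {
    modprobe ib_ipoib
}
```

Calling 'load_ipoib' as root is equivalent to typing 'modprobe ib_ipoib' at the prompt.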
Unloading the IPoIB module
One can unload the IPoIB module as part of the RDMA service of the OS.
The following command line will unload the IPoIB module manually:
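A minimal sketch of the manual unload, assuming 'modprobe -r' is used for removal ('rmmod ib_ipoib' is a common alternative). The function name 'unload_ipoib' is an illustrative choice.

```shell
# Unload the IPoIB kernel module.
# Must be run as root; fails if the module is still in use.
unload_ipoib() {
    modprobe -r ib_ipoib
}
```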
IPoIB network interface control
Show available IPoIB network interfaces
When the IPoIB kernel module is loaded, for every port of every local InfiniBand device a network interface will be created. The following command shows the network interfaces that are available in the machine:
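Besides 'ifconfig -a', the interface names can also be read from sysfs. The sketch below prints the entries under /sys/class/net whose names start with 'ib'; the function name and the 'SYSFS' variable are hypothetical, the latter being an override that only makes the sketch easy to try against a copy of the tree.

```shell
# Print the names of all network interfaces that start with 'ib'.
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
list_ipoib_interfaces() {
    for path in "${SYSFS:-/sys}"/class/net/ib*; do
        # Skip the unexpanded pattern when no interface matches.
        if [ -e "$path" ]; then
            basename "$path"
        fi
    done
}

# Interactive equivalent that also shows addresses and counters:
#   ifconfig -a
```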
The IPoIB network interfaces are the ones that have the prefix 'ib'.
Note that 'ifconfig' prints a warning that it has trouble printing the MAC address, since the address is long (20 bytes). Fortunately, the 'ip' command can show the MAC address of such a network interface. The following command shows the MAC address of the network interface 'ib0':
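A minimal sketch using 'ip link show', which prints the full 20-byte hardware address on the 'link/infiniband' line of its output. The function name 'show_link_address' is an illustrative choice.

```shell
# Show link-layer details of an interface, including its full
# hardware address.
# $1: interface name, e.g. ib0
show_link_address() {
    ip link show "$1"
}

# Example: show_link_address ib0
```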
Configuring IP address of IPoIB interface
Configuring IPoIB interface manually
By default, unless someone configures the IPoIB network interfaces manually or automatically, those interfaces don't have any configured IP address and they are down. One can configure those network interfaces, like any other network interface, using 'ifconfig'. The following command line will assign an IP address and netmask to the IPoIB network interface 'ib0':
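A minimal sketch of such a command. The address and netmask in the example call are placeholder values, not taken from the original post, and the function name is hypothetical.

```shell
# Assign an IPv4 address and netmask to an interface (requires root).
# $1: interface name, $2: IP address, $3: netmask
configure_ipoib_address() {
    ifconfig "$1" "$2" netmask "$3"
}

# Example call with placeholder values:
#   configure_ipoib_address ib0 192.168.0.1 255.255.255.0
```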
This configuration takes effect immediately, without the need to restart the machine or any service. However, it isn't persistent and will disappear on machine reboot.
Now that the IP address of that network interface is configured, the interface is ready to work.
Configuring IPoIB interface using configuration file
An IPoIB network interface can be configured using Linux network configuration files, like any other network interface. One should create a configuration file for every network interface and fill it with the needed information. Here is an example of such a configuration file - the file '/etc/sysconfig/network-scripts/ifcfg-ib0', which configures the network interface 'ib0':
When this configuration file exists, every time the networking service brings up this network interface, it applies those attributes to it automatically.
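A sketch of what such a file might contain on a Red Hat style system, where ifcfg files use shell variable syntax. The address and netmask are placeholder values, and the exact set of keys that a given distribution honors is an assumption.

```shell
# /etc/sysconfig/network-scripts/ifcfg-ib0
# Address and netmask below are placeholder values.
DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=static
IPADDR=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
```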
Loading this configuration file, after creating/changing it, can be done by rebooting the machine or restarting the networking service, according to the distribution. In many Linux distributions the following command line will restart the networking service:
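A minimal sketch of the restart, assuming a SysV style 'service' command. The default service name 'network' matches Red Hat style systems and is an assumption; other distributions use different names, so it can be passed as an argument.

```shell
# Restart the networking service so that changed ifcfg files are re-read.
# Must be run as root.
# $1: service name (defaults to 'network', which is an assumption)
restart_networking() {
    service "${1:-network}" restart
}
```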
Controlling IPoIB mode
As explained in the previous post, IPoIB has two working modes:
- Datagram mode
- Connected mode
The working mode of every IPoIB network interface is independent and different interfaces, even on the same physical InfiniBand device, can work with different modes.
Changing the IPoIB working mode using a configuration file
Some RDMA distributions, such as MLNX-OFED, support configuring the IPoIB working mode using the service configuration file. In most such distributions, setting the parameter 'SET_IPOIB_CM' to 'yes' will configure all available IPoIB network interfaces to Connected mode. Otherwise, they will be loaded in Datagram mode.
Changing an IPoIB interface to work in Datagram mode manually
Changing the working mode of an IPoIB network interface to work in datagram mode can be done by writing the word 'datagram' to a control file in the sysfs of that interface. For example:
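A minimal sketch for the interface 'ib0', whose control file is /sys/class/net/ib0/mode. The function name and the 'SYSFS' variable are hypothetical; 'SYSFS' only lets the sketch be tried against a scratch directory instead of the real sysfs.

```shell
# Switch an IPoIB interface to datagram mode (requires root).
# $1: interface name, e.g. ib0
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
set_datagram_mode() {
    echo datagram > "${SYSFS:-/sys}/class/net/$1/mode"
}

# Example: set_datagram_mode ib0
```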
Changing an IPoIB interface to work in Connected mode manually
Changing the working mode of an IPoIB network interface to work in connected mode can be done by writing the word 'connected' to a control file in the sysfs of that interface. For example:
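A minimal sketch for the interface 'ib0', writing to the same /sys/class/net/ib0/mode control file. The function name and the 'SYSFS' variable are hypothetical; 'SYSFS' only lets the sketch be tried against a scratch directory.

```shell
# Switch an IPoIB interface to connected mode (requires root).
# $1: interface name, e.g. ib0
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
set_connected_mode() {
    echo connected > "${SYSFS:-/sys}/class/net/$1/mode"
}

# Example: set_connected_mode ib0
```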
Checking working mode of an IPoIB interface
Checking the working mode of an IPoIB network interface can be done by printing the content of the file that controls the working mode. For example:
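A minimal sketch that prints the content of /sys/class/net/[interface name]/mode. The function name and the 'SYSFS' override are hypothetical.

```shell
# Print the working mode ('datagram' or 'connected') of an IPoIB interface.
# $1: interface name, e.g. ib0
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
get_ipoib_mode() {
    cat "${SYSFS:-/sys}/class/net/$1/mode"
}

# Example: get_ipoib_mode ib0
```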
Configuring the MTU of an IPoIB network interface
Like any other network interface, the MTU of an IPoIB network interface can be changed.
The maximum supported MTU of an IPoIB network interface depends on the working mode.
- For datagram mode, the maximum MTU is the MTU of the IPoIB multicast group minus the IPoIB encapsulation header (4 bytes).
  - For 2KB IB MTU: the maximum MTU can be 2044 bytes.
  - For 4KB IB MTU: the maximum MTU can be 4092 bytes.
- For connected mode, the maximum MTU can be 65520 bytes.
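The limits above can be reproduced with a few lines of shell arithmetic. This is only a sketch of the rule as stated in this post; the function name 'ipoib_max_mtu' is hypothetical.

```shell
# Print the maximum IPoIB MTU, in bytes, for a working mode.
# $1: 'datagram' or 'connected'
# $2: IB MTU in bytes (used for datagram mode only)
ipoib_max_mtu() {
    case "$1" in
        datagram)  echo $(( $2 - 4 )) ;;   # minus the 4-byte IPoIB header
        connected) echo 65520 ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

ipoib_max_mtu datagram 2048    # prints 2044
ipoib_max_mtu datagram 4096    # prints 4092
ipoib_max_mtu connected        # prints 65520
```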
Changing IPoIB interface MTU manually
An IPoIB network interface can be configured like any other network interface using 'ifconfig'. The following command line changes the MTU of the network interface 'ib0' to 2000 bytes:
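A minimal sketch of this command, wrapped in a function whose name is an illustrative choice. The requested value has to stay within the limit of the interface's current working mode.

```shell
# Set the MTU of a network interface (requires root).
# $1: interface name, $2: MTU in bytes
set_ipoib_mtu() {
    ifconfig "$1" mtu "$2"
}

# Example: set_ipoib_mtu ib0 2000
```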
Changing IPoIB interface MTU with a configuration file
Some RDMA distributions, such as MLNX-OFED, support configuring the IPoIB MTU using the service configuration file. In most such distributions, the parameter 'IPOIB_MTU' sets the size of the MTU that is used when working in Connected mode. Otherwise, the interface's MTU isn't changed and the default value is used.
Another option is to change the MTU of an IPoIB network interface in its system configuration file, which makes this configuration persistent across machine reboots. Edit the interface configuration file /etc/sysconfig/network-scripts/ifcfg-[interface name] (different Linux distributions may keep this file in a different place) and set the 'MTU' line to the MTU size in bytes. For example, setting the following line in the configuration file will configure the MTU to be 2000 bytes:
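A sketch of the line in question, assuming the Red Hat style ifcfg format in which 'MTU' is the key that holds the size in bytes:

```shell
# In /etc/sysconfig/network-scripts/ifcfg-ib0 (location varies by distribution)
MTU=2000
```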
Partitioning in IPoIB (VLAN equivalent)
When the IPoIB driver is loaded, it creates, by default, one interface for each port of the available InfiniBand devices using the P_Key value at index 0 of the P_Key table in that port.
Configuring OpenSM to support partitions
When 'OpenSM' is running on the host, one can change its configuration file to support more partitions. The configuration file /etc/rdma/partitions.conf controls the partitions that will be configured in the subnet by OpenSM. Different versions of OpenSM may have a different default place for the partition file (the man page of the installed OpenSM will show that place). One can use the '-P' parameter to point explicitly to a specific place for this file.
Here is an example of such a configuration file which configures the default P_Key (0xffff) and another P_Key (0x8001) in the fabric:
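A sketch of such a file, written out by a small shell function so that it can be generated at any path. The partition name 'Partition1' and the function name are hypothetical, and the exact flags accepted may differ between OpenSM versions; the values 0x7fff and 0x0001 with full membership correspond to the P_Keys 0xffff and 0x8001.

```shell
# Write a sample OpenSM partition file.
# $1: destination path, e.g. /etc/rdma/partitions.conf
# 'Partition1' is a hypothetical partition name.
write_partitions_conf() {
    cat > "$1" <<'EOF'
Default=0x7fff, ipoib : ALL=full;
Partition1=0x0001, ipoib : ALL=full;
EOF
}

# Example: write_partitions_conf /etc/rdma/partitions.conf
```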
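The valid P_Key values in this configuration file are 0x0001-0x7fff, since the most significant bit (0x8000) of a P_Key marks full membership and is set according to the membership type.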
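The relation between the 15-bit values in the file and the P_Keys quoted above (0xffff and 0x8001) can be checked with shell arithmetic: a full member has the most significant bit set on top of the configured value. The function name 'full_member_pkey' is hypothetical.

```shell
# Print the P_Key that a full member gets for a value from the partition file.
# $1: 15-bit P_Key value from the partition file, e.g. 0x7fff
full_member_pkey() {
    printf '0x%04x\n' $(( $1 | 0x8000 ))
}

full_member_pkey 0x7fff    # prints 0xffff
full_member_pkey 0x0001    # prints 0x8001
```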
Now that the configuration file is updated, one should restart OpenSM. This can be done using the following command line:
Note: OpenSM supports the ability to configure specific ports with full membership whereas other ports will be configured with partial membership. Explaining how to do so is out of the scope of this post.
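A minimal sketch of the restart, assuming a SysV style 'service' command. The default service name 'opensm' is an assumption; some packages install the service as 'opensmd', so the name can be passed as an argument.

```shell
# Restart OpenSM so that it re-reads the partition file.
# Must be run as root.
# $1: service name (defaults to 'opensm', which is an assumption)
restart_opensm() {
    service "${1:-opensm}" restart
}
```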
Verifying the configured partitions on local port
The configured partitions for every InfiniBand device's port can be found in the sysfs. For example, the following command line prints the non-zero configured partitions in port 1 of the InfiniBand device 'mlx4_0':
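A minimal sketch that reads the per-port 'pkeys' directory in sysfs, where every file holds one entry of the P_Key table, and filters out the empty (0x0000) entries. The function name and the 'SYSFS' override are hypothetical.

```shell
# Print the non-zero P_Keys in the P_Key table of one port.
# $1: InfiniBand device name, e.g. mlx4_0
# $2: port number, e.g. 1
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
list_port_pkeys() {
    cat "${SYSFS:-/sys}/class/infiniband/$1/ports/$2/pkeys"/* | grep -v 0x0000
}

# Example: list_port_pkeys mlx4_0 1
```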
Creating a network interface with a P_Key
To create an interface with a different P_Key, write the desired P_Key value into the main interface's
/sys/class/net/[interface name]/create_child file. For example, the following command will create a child interface using P_Key 0x8001 for the IPoIB network interface 'ib0':
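A minimal sketch of this write. The function name and the 'SYSFS' variable are hypothetical; 'SYSFS' only lets the sketch be tried against a scratch directory instead of the real sysfs.

```shell
# Create a child IPoIB interface for a P_Key (requires root).
# $1: main interface name, e.g. ib0
# $2: P_Key value, e.g. 0x8001
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
create_pkey_interface() {
    echo "$2" > "${SYSFS:-/sys}/class/net/$1/create_child"
}

# Example: create_pkey_interface ib0 0x8001
```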
A new network interface named 'ib0.8001', with P_Key value 0x8001, will be created.
Note: An IPoIB network interface can be created using a P_Key value even if that P_Key value isn't configured in that port's P_Key table.
Removing a network interface with a P_Key
To remove a subinterface that was created with a specific P_Key, write this P_Key value into the main interface's
/sys/class/net/[interface name]/delete_child file. For example:
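A minimal sketch of this write, mirroring the creation step. The function name and the 'SYSFS' override are hypothetical.

```shell
# Remove a child IPoIB interface that was created for a P_Key (requires root).
# $1: main interface name, e.g. ib0
# $2: P_Key value, e.g. 0x8001
# SYSFS (hypothetical, for this sketch) overrides the sysfs mount point.
delete_pkey_interface() {
    echo "$2" > "${SYSFS:-/sys}/class/net/$1/delete_child"
}

# Example: delete_pkey_interface ib0 0x8001
```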
Verifying that IPoIB is working
Since IPoIB provides a fully functional and working network interface, to verify that it is configured properly one can just use 'ping' to a remote IP address of another IPoIB network interface in the subnet and verify that there aren't any dropped packets:
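A minimal sketch of such a check. The remote address in the example is a placeholder, and the function name and the default of 4 packets are arbitrary choices; 'ping' reports the packet loss in its summary and returns a non-zero status when no reply was received.

```shell
# Send a fixed number of echo requests to a remote IPoIB address.
# $1: IP address of a remote IPoIB interface
# $2: number of packets (defaults to 4, an arbitrary choice)
check_ipoib_ping() {
    ping -c "${2:-4}" "$1"
}

# Example with a placeholder address:
#   check_ipoib_ping 192.168.0.2
```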