
Working with RDMA using OFED


OFED (OpenFabrics Enterprise Distribution) is a package that is developed and released by the OpenFabrics Alliance (OFA) as a joint effort of many companies that are part of the RDMA scene. It contains the latest upstream software packages (both kernel modules and userspace code) to work with RDMA. It supports InfiniBand, Ethernet and RoCE transports.

This package supports most major Linux distributions and CPU architectures.

Installing RDMA packages

Prerequisites

Make sure that there is at least 1 GB of free space on your disk.
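The free-space requirement can be checked before starting. The sketch below is only an illustration: it assumes the package is extracted under /tmp (as in the example later in this guide) and that rpmbuild works under /var/tmp (as the installer output shown later suggests), and it applies the 1 GB figure to each location separately, which is a conservative guess rather than a documented rule.

```shell
#!/bin/sh
# Print the free space, in MB, of the filesystem that holds the given directory.
free_mb() {
    # -P: portable one-line-per-filesystem format; -k: sizes in 1 KB blocks.
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024) }'
}

required_mb=1024
# /tmp and /var/tmp are assumptions taken from the examples in this guide.
for dir in /tmp /var/tmp; do
    avail=$(free_mb "$dir")
    if [ "${avail:-0}" -ge "$required_mb" ]; then
        echo "OK: ${avail} MB free in ${dir}"
    else
        echo "WARNING: only ${avail:-0} MB free in ${dir}"
    fi
done
```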

Download the package

Open the following URL in a web browser:
https://www.openfabrics.org/downloads/

Click the "OFED" directory and then click the directory of the OFED version that you wish to download. Download the .tgz file of the release from that directory (it is highly advised to download the final released package and not one of the release candidates).

Installation

Pre-install

After downloading the .tgz file, extract it and enter the directory that holds the extracted files. The example below uses OFED-3.12 and assumes that the .tgz file is placed in /tmp/OFED:

[root@localhost]# mkdir /tmp/OFED
[root@localhost]# cd /tmp/OFED
[root@localhost]# tar xzf OFED-3.12.tgz
[root@localhost]# cd OFED-3.12

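Now, everything is ready and installation of OFED can be started.

Post-install

Once the installation described below has finished, the directory that holds the extracted files can be deleted. Keep it if you plan to reuse the built binary RPMs and the configuration file on other machines (see "Deployment of OFED in a cluster"):

[root@localhost]# cd
[root@localhost]# rm -fr /tmp/OFED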

Installation

The installation script will:

  • Verify that the installed kernel is supported.
  • Uninstall all installed RDMA packages, whether they are part of the Linux distribution or of any other RDMA package.
  • Build the OFED binary RPMs from the SRPMs.
  • Install the binary RPMs.
  • Fix local configuration files, if needed.

The same installation procedure is used in all Linux distributions.

The following command line will start the installation process.

[root@localhost]# ./install.pl

Now, a textual menu will be displayed:

OFED Distribution Software Installation Menu

1) View OFED Installation Guide
2) Install OFED Software
3) Show Installed Software
4) Configure IPoIB
5) Uninstall OFED Software
Q) Exit

Select Option [1-5]:

Choose "2" to go to the installation section.

OFED Distribution Software Installation Menu

1) Basic (OFED modules and basic user level libraries)
2) HPC (OFED modules and libraries, MPI and diagnostic tools)
3) All packages (all of Basic, HPC)
4) Customize
Q) Exit

Select Option [1-4]:

There are 4 options to decide which packages will be built (i.e. binary RPMs will be built from the SRPMs) and installed:

  1. Install the Basic packages, which include the kernel modules and the basic userspace libraries and tools: low-level driver libraries and tools for all RDMA devices, libibverbs and librdmacm.
  2. Install the HPC packages, which include all the Basic packages plus DAPL, subnet diagnostics tools, performance benchmarks and MPI packages.
  3. Install all the available packages.
  4. Customize the packages to be installed.

Choose the option that best fits your needs.

Building all the binary RPMs may take several minutes.

In the following example output, I chose "1" (Basic).

Below is the list of OFED packages that you have chosen
(some may have been added by the installer due to package dependencies):
ofed-scripts
libibverbs
libibverbs-utils
libibverbs-devel
libmthca
libmlx4
libmlx5
libcxgb3
libcxgb4
libnes
librdmacm
librdmacm-utils
libocrdma
ofed-docs
compat-rdma
Build ofed-scripts RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/ofed-scripts-3.12-0.1.g751417d.src.rpm
Install ofed-scripts RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/ofed-scripts-3.12-0.1.g751417d.i586.rpm
Build libibverbs RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libibverbs-1.1.7-1.src.rpm
Install libibverbs RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libibverbs-1.1.7-1.i586.rpm
Install libibverbs-utils RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libibverbs-utils-1.1.7-1.i586.rpm
Install libibverbs-devel RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libibverbs-devel-1.1.7-1.i586.rpm
Build libmthca RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libmthca-1.0.6-1.src.rpm
Install libmthca RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libmthca-1.0.6-1.i586.rpm
Build libmlx4 RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' --nodeps /tmp/OFED/OFED-3.12/SRPMS/libmlx4-1.0.5-1.src.rpm
Install libmlx4 RPM:
Running rpm -iv  --nodeps /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libmlx4-1.0.5-1.i586.rpm
Build libmlx5 RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libmlx5-1.0.1-1.src.rpm
Install libmlx5 RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libmlx5-1.0.1-1.i586.rpm
Build libcxgb3 RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libcxgb3-1.3.1-1.src.rpm
Install libcxgb3 RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libcxgb3-1.3.1-1.i586.rpm
Build libcxgb4 RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libcxgb4-1.3.3-1.src.rpm
Install libcxgb4 RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libcxgb4-1.3.3-1.i586.rpm
Build libnes RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libnes-1.1.4-0..src.rpm
Install libnes RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libnes-1.1.4-0..i586.rpm
Build librdmacm RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/librdmacm-1.0.18.1-1.src.rpm
Install librdmacm RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/librdmacm-1.0.18.1-1.i586.rpm
Install librdmacm-utils RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/librdmacm-utils-1.0.18.1-1.i586.rpm
Build mstflint RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/mstflint-3.6.0-1.8.g7d4dede.src.rpm
Install mstflint RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/mstflint-3.6.0-1.8.g7d4dede.i586.rpm
Build libocrdma RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/libocrdma-1.0.2-1.src.rpm
Install libocrdma RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/libocrdma-1.0.2-1.i586.rpm
Build ofed-docs RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target i586 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED/OFED-3.12/SRPMS/ofed-docs-3.12-0.1.ge933a19.src.rpm
Install ofed-docs RPM:
Running rpm -iv  /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/ofed-docs-3.12-0.1.ge933a19.i586.rpm
Build compat-rdma RPM
Running rpmbuild --rebuild  --define '_topdir /var/tmp//OFED_topdir' --define '_suse_os_install_post %{nil}' --nodeps --define '_dist .sles11sp3' --define 'configure_options   --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx5-mod --with-cxgb3-mod --with-cxgb4-mod --with-nes-mod --with-ocrdma-mod --with-ipoib-mod' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'KVERSION 3.0.76-0.11-pae' --define 'K_SRC /lib/modules/3.0.76-0.11-pae/build' --define '_release 0.1.gac916cb.3.0.76_0.11_pae' --define 'network_dir /etc/sysconfig/network' --define '_prefix /usr' --define '__arch_install_post %{nil}' /tmp/OFED/OFED-3.12/SRPMS/compat-rdma-3.12-0.1.gac916cb.src.rpm
Install compat-rdma RPM:
Running rpm -iv  --nodeps /tmp/OFED/OFED-3.12/RPMS/sles-release-11.3-1.138/i686/compat-rdma-3.12-0.1.gac916cb.3.0.76_0.11_pae.i586.rpm
The default IPoIB interface configuration is based on DHCP.
Note that a special patch for DHCP is required for supporting IPoIB.
The patch is available under docs/dhcp
If you do not have DHCP, you must change this configuration in the following steps.
IPoIB interfaces configured successfully
Press any key to continue ...
Installation finished successfully.
Press any key to continue...

Different versions of OFED or different Linux distributions may have different RPM names and versions.

A reboot is usually required to complete the installation.
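After the reboot, a quick sanity check can confirm that the userspace tools are in place. The sketch below assumes that ofed_info (shipped with ofed-scripts) and ibv_devinfo (shipped with libibverbs-utils) were installed; if a different package set was chosen, one or both may be absent, and the script only reports that. The have_cmd helper is a name invented for this example.

```shell
#!/bin/sh
# Return success if the given command is available in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# ofed_info reports the installed OFED release and packages;
# ibv_devinfo lists the RDMA devices visible to libibverbs.
for tool in ofed_info ibv_devinfo; do
    if have_cmd "$tool"; then
        echo "== $tool =="
        "$tool" | head -n 20
    else
        echo "$tool not found: the package that provides it may not be installed"
    fi
done
```

If ibv_devinfo lists no devices, the kernel modules may not be loaded yet; starting the RDMA services is covered later in this guide.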

Notes

Failure to uninstall an RPM automatically

If an RPM cannot be uninstalled, the installer prints an error message and the user has to remove that RPM manually.

Missing RPMs during the binary RPMs build

Before building the binary RPMs, the installer will verify that all required RPMs are installed. If any RPM (according to the Linux distribution and the list of packages to install) is missing, an error message which specifies the name of the missing RPM and the package that needs it will be displayed. For example:

zlib-devel rpm is required to build mstflint
libstdc++43-devel rpm is required to build mstflint
gcc-c++ rpm is required to build mstflint

Install the missing RPMs and restart the installation.
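The prerequisites can also be checked up front with rpm -q. The following is a minimal sketch: the package names are the ones from the example error messages above and differ between distributions and OFED versions, and the RPM_QUERY variable is a hypothetical override that exists only so the function can be exercised without rpm.

```shell
#!/bin/sh
# Print the names of the given packages that are not installed.
# RPM_QUERY (hypothetical knob) replaces "rpm -q", e.g. for dry runs.
missing_rpms() {
    for pkg in "$@"; do
        ${RPM_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example package names only; adjust them to your distribution.
missing=$(missing_rpms zlib-devel libstdc++43-devel gcc-c++)
if [ -n "$missing" ]; then
    echo "Missing build prerequisites:"
    echo "$missing"
else
    echo "All listed prerequisites are installed"
fi
```

On a machine without rpm, every package is reported as missing, so the result is only meaningful on RPM-based distributions.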

Deployment of OFED in a cluster

A configuration file named ofed.conf will be saved locally. It describes the chosen options. To install OFED automatically with the same options, execute the installer with -c ofed.conf.

The built binary RPMs are saved in the installation directory, under the RPMS directory.

When installing the OFED release in a cluster, it is good practice to install it manually on a single machine and, once that has completed, to copy the whole installation directory (with the configuration file and the binary RPMs) to the rest of the machines and install it there automatically.
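The cluster procedure can be sketched as a small script. This is an illustration under several assumptions: the host names are placeholders, root SSH access is available, ofed.conf sits in the installation directory, and the directory is copied to /tmp on each node. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print the commands that would deploy an already-built OFED directory to one host.
# Nothing is executed here; pipe the output to "sh" once it looks right.
deploy_cmds() {
    host=$1
    src_dir=$2    # installation directory holding ofed.conf and the built RPMS
    echo "scp -r $src_dir root@$host:/tmp/"
    echo "ssh root@$host 'cd /tmp/$(basename "$src_dir") && ./install.pl -c ofed.conf'"
}

# node01 and node02 are placeholder host names.
for h in node01 node02; do
    deploy_cmds "$h" /tmp/OFED/OFED-3.12
done
```

Printing instead of executing is a deliberate choice: a mistake in the path or host list is cheap to spot before anything is copied or installed.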

Uninstalling RDMA packages

This package comes with an uninstall script that removes all OFED packages. The following command line will uninstall the OFED packages:

/usr/sbin/ofed_uninstall.sh
This program will uninstall all OFED packages on your machine.
Do you want to continue?[y/N]:

Answer this "yes/no" question with "y". Pressing only the Enter key selects the default (N) and aborts the uninstallation.

Running /usr/sbin/vendor_pre_uninstall.sh

Removing OFED Software installations

Running /bin/rpm -e --allmatches compat-rdma libibverbs libibverbs-devel libibverbs-utils libmthca libmlx4 libmlx5 libcxgb3 libcxgb4 libnes libibumad libibumad-devel librdmacm librdmacm-utils librdmacm-devel libocrdma perftest dapl dapl-devel dapl-devel-static dapl-utils qperf ofed-docs ofed-scripts
warning: /etc/infiniband/openib.conf saved as /etc/infiniband/openib.conf.rpmsave
Running /tmp/24985-ofed_vendor_post_uninstall.sh

Uninstalling can also be done from the installation menu (option "5" in the first menu) or with the uninstall.sh script, which is provided in the OFED package.

Starting the RDMA services

Load the RDMA drivers using the following command line:

[root@localhost]# /etc/init.d/openibd start

If you are using the InfiniBand transport and there is no managed switch in the subnet, you have to start the Subnet Manager (SM). Running it on one of the machines in the subnet is enough. Start it with the following command line:

[root@localhost]# /etc/init.d/opensmd start

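By default, installing the RDMA packages configures the RDMA service to be loaded automatically when the operating system boots. If this doesn't happen, fix it manually; one place to check is the ONBOOT parameter in /etc/infiniband/openib.conf, described in the configuration section of this guide.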
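When the service does not come up at boot, one thing worth inspecting is the ONBOOT parameter of /etc/infiniband/openib.conf (see the configuration file section). The sketch below assumes that the file uses plain KEY=value lines; openib_param is a helper name invented for this example.

```shell
#!/bin/sh
# Print the value of one KEY=value parameter from an openib.conf-style file.
# Commented-out lines are ignored; if the key appears twice, the last one wins.
openib_param() {
    _key=$1
    _conf=${2:-/etc/infiniband/openib.conf}
    sed -n "s/^[[:space:]]*${_key}=//p" "$_conf" | tail -n 1
}

conf=/etc/infiniband/openib.conf
if [ -r "$conf" ]; then
    echo "ONBOOT=$(openib_param ONBOOT "$conf")"
else
    echo "$conf not found or not readable"
fi
```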

Stopping the RDMA services

If the SM is running, then it must be stopped before unloading the drivers. Stop the SM using the following command line:

[root@localhost]# /etc/init.d/opensmd stop

Unload the RDMA drivers using the following command line:

[root@localhost]# /etc/init.d/openibd stop
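The ordering above (SM first, drivers second) can be wrapped in one function. This is a sketch, not part of OFED: stop_rdma and the INIT_DIR override are names made up for the example, and it simply asks opensmd to stop whenever its init script exists rather than testing whether the SM is actually running.

```shell
#!/bin/sh
# Stop the RDMA stack in order: Subnet Manager first, then the drivers.
# INIT_DIR defaults to /etc/init.d; overriding it is a hook for dry runs.
stop_rdma() {
    init_dir=${INIT_DIR:-/etc/init.d}
    # Stop the SM only if its init script is present on this machine.
    if [ -x "$init_dir/opensmd" ]; then
        "$init_dir/opensmd" stop || return 1
    fi
    "$init_dir/openibd" stop
}

# Run as root on a machine with OFED installed:
#   stop_rdma
```

If stopping the SM fails, the function returns without unloading the drivers, so the failure can be investigated first.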

RDMA configuration file(s)

1. The openibd service loads the configuration file: /etc/infiniband/openib.conf. This file controls which modules will be loaded during the service startup and some attributes about the RDMA modules. The following parameters are supported:

Parameter name               | Description                                                | Supported values
ONBOOT                       | Start RDMA modules/drivers upon machine boot               | yes/no
NODE_DESC                    | Node description prefix command                            | Command string. For example: $(hostname -s)
NODE_DESC_TIME_BEFORE_UPDATE | Time, in seconds, to wait before updating node description | Non-negative integer numbers
RUN_SYSCTL                   | Run sysctl performance tuning on network attributes script | yes/no
RDMA_CM_LOAD                 | Load RDMA_CM module                                        | yes/no
RDMA_UCM_LOAD                | Load RDMA_UCM module                                       | yes/no
UCM_LOAD                     | Load UCM module                                            | yes/no
RENICE_IB_MAD                | Increase ib_mad thread priority                            | yes/no
IPOIB_LOAD                   | Load IPoIB module                                          | yes/no
SET_IPOIB_CM                 | Enable IPoIB Connected Mode                                | yes/no
MLX5_LOAD                    | Load MLX5 driver                                           | yes/no
MLX4_LOAD                    | Load MLX4 driver                                           | yes/no
MLX4_EN_LOAD                 | Load MLX4_EN driver                                        | yes/no
MTHCA_LOAD                   | Load MTHCA driver                                          | yes/no
CXGB4_LOAD                   | Load CXGB4 driver                                          | yes/no
CXGB3_LOAD                   | Load CXGB3 driver                                          | yes/no
NES_LOAD                     | Load iw_nes driver                                         | yes/no
OCRDMA_LOAD                  | Load OCRDMA driver                                         | yes/no

2. RDMA needs to work with pinned memory, i.e. memory which cannot be swapped out by the kernel. By default, every process that runs as a non-root user is allowed to pin only a small amount of memory (64 KB). In order to work properly as a non-root user, it is highly recommended to increase the amount of memory which can be locked. Edit the file /etc/security/limits.conf and add the following lines, if they weren't added by the installation script:

* soft memlock unlimited
* hard memlock unlimited

This allows a process running as any user to pin an unlimited amount of memory. The change takes effect for new login sessions.

After logging in again, the following command line prints how much memory (in KB) can be locked:

ulimit -l

(the expected output is: "unlimited").

3. For finer control over this configuration, e.g. allowing less memory to be pinned or allowing only specific user(s) to pin more memory, refer to the manual of your Linux distribution.
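The memlock check can be scripted so that it also works when the limit is a number rather than "unlimited". In the sketch below, the 65536 KB threshold is an arbitrary illustrative value and not an OFED requirement; memlock_ok is a helper name invented for this example.

```shell
#!/bin/sh
# Succeed if a "ulimit -l" value (KB or "unlimited") is at least need_kb.
memlock_ok() {
    limit=$1
    need_kb=$2
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$need_kb" ]
}

current=$(ulimit -l)
# 65536 KB (64 MB) is an illustrative threshold only.
if memlock_ok "$current" 65536; then
    echo "memlock limit is sufficient: $current"
else
    echo "memlock limit is low: $current KB; check /etc/security/limits.conf"
fi
```

Run it as the non-root user that will run the RDMA application, in a fresh login session, since the limit is applied at login.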


Comments


  1. Igor R. says: December 21, 2014

    Hi,

    What could be a reason of "device's health compromised" kernel message? I see it appears here: http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/health.c
    but can't realise when it might be triggered.

    Thanks!

    • Igor R. says: December 21, 2014

      just a followup: the error string is "ICM fetch PCI error".

      • Dotan Barak says: December 23, 2014

        I would suggest to update the driver and firmware.
        And if this didn't solve the problem - contact Mellanox support.

        Thanks
        Dotan
