
Working with RDMA in RedHat/CentOS 7.*

RedHat and CentOS 7.* have integrated RDMA support. In this post we'll discuss how to manage and work with the inbox RDMA packages in those distributions.

Installing RDMA packages

One can install all the RDMA packages manually, one by one, and resolve the dependencies by hand. However, yum provides an easy way to install all the packages needed for working with RDMA and to resolve their dependencies on other packages automatically.

yum allows installation of multiple packages according to a specific area (a package group). Despite what its name may imply, the group "Infiniband Support" has all the relevant packages for RDMA support, i.e. InfiniBand, RoCE and iWARP, and not only InfiniBand. The following command will show which packages are part of the group "Infiniband Support":

[root@localhost]# yum groupinfo "Infiniband Support"
Loaded plugins: fastestmirror, security
Setting up Group Process
Loading mirror speeds from cached hostfile
* base: centos.spd.co.il
* extras: centos.spd.co.il
* updates: centos.spd.co.il
Group: Infiniband Support
Description: Software designed for supporting clustering and grid connectivity using RDMA-based InfiniBand and iWARP fabrics.
Mandatory Packages:

libibcm
libibverbs
libibverbs-utils
librdmacm
librdmacm-utils
+rdma

Default Packages:

dapl
ibacm
ibutils
+libcxgb3
+libcxgb4
libibmad
libibumad
+libipathverbs
libmlx4
libmlx5
+libmthca
+libnes

Optional Packages:

compat-dapl
infiniband-diags
libibcommon
mstflint
opensm
perftest
qperf
srptools

Conditional Packages:

+glusterfs-rdma

As one can see, there are several classifications for packages in this group: "mandatory", "default" and "optional". In RedHat/CentOS 7.* distributions (at least, for now), by default only the "mandatory" and "default" packages will be installed. The following command line will install the packages that are needed to work with RDMA:

[root@localhost]# yum -y groupinstall "Infiniband Support"

The "optional" packages need to be installed explicitly. The following command line will install them:

[root@localhost]# yum --setopt=group_package_types=optional groupinstall "Infiniband Support"

Uninstalling RDMA packages

Just like we used yum to install the packages group, we'll use it to uninstall those packages, if they aren't needed anymore. The following command line will uninstall the RDMA packages:

[root@localhost]# yum -y groupremove "Infiniband Support"

Starting the RDMA services

Load the RDMA drivers using the following command line:

[root@localhost]# systemctl start rdma.service

If the InfiniBand transport is used and there is no managed switch in the subnet, a Subnet Manager (SM) has to be started. Running it on one of the machines in the subnet is enough. opensm is one of the "optional" packages listed above, so it must be installed first. The SM can be started with the following command line:

[root@localhost]# systemctl start opensm.service

If one wishes to start the RDMA service automatically when the operating system is loaded, the following command line will do the trick:

[root@localhost]# systemctl enable rdma
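Once the drivers are loaded (and, for InfiniBand, once an SM is running in the subnet), the port state can be inspected with ibv_devinfo from the libibverbs-utils package. The following is only a minimal sketch: the function name count_active_ports is hypothetical, and it assumes that the port state lines look like "state: PORT_ACTIVE (4)".

```shell
#!/bin/sh
# Count the ports reported as PORT_ACTIVE in ibv_devinfo-style output.
# The text is read from stdin, so the function can also be tried on a saved copy.
# Assumption: active ports are printed as "state: PORT_ACTIVE (4)".
count_active_ports() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 even when no port is active (grep then prints 0).
    grep -c 'state:[[:space:]]*PORT_ACTIVE' || true
}

# Usage on a machine with RDMA devices:
#   ibv_devinfo | count_active_ports
```

A result of 0 on an InfiniBand fabric may indicate that no SM is running yet, although a disconnected cable gives the same result.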

Stopping the RDMA services

If the SM is running, then it must be stopped before unloading the drivers. Stop the SM using the following command line:

[root@localhost]# systemctl stop opensm.service

Unload the RDMA drivers using the following command line:

[root@localhost]# systemctl stop rdma.service

RDMA configuration file(s)

1. The rdma service loads the configuration file: /etc/rdma/rdma.conf. This file controls which modules will be loaded during the service startup and some attributes about the RDMA modules. The following parameters are supported:

Parameter name     Description                        Supported values
IPOIB_LOAD         Load IPoIB module                  yes/no
SRP_LOAD           Load SRP initiator module          yes/no
SRPT_LOAD          Load SRP target module             yes/no
ISER_LOAD          Load iSER initiator module         yes/no
RDS_LOAD           Load RDS module                    yes/no
FIXUP_MTRR_REGS    Modify the system MTRR registers   yes/no
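To see quickly which of these parameters are currently switched on, something like the following sketch can be used. It assumes that the file consists of plain NAME=value lines with '#' comments; the function name rdma_conf_enabled is hypothetical.

```shell
#!/bin/sh
# Print the names of the rdma.conf parameters that are set to "yes".
# Assumption: the file uses NAME=value lines and '#' starts a comment.
rdma_conf_enabled() {
    conf="${1:-/etc/rdma/rdma.conf}"
    # Strip comments and trailing whitespace, then keep only NAME=yes entries.
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' "$conf" |
        awk -F= '$2 == "yes" { print $1 }'
}

# Usage:
#   rdma_conf_enabled              # reads /etc/rdma/rdma.conf
#   rdma_conf_enabled ./rdma.conf  # reads a copy of the file
```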

2. RDMA needs to work with pinned memory, i.e. memory which cannot be swapped out by the kernel. By default, every process that is running as a non-root user is allowed to pin only a small amount of memory (64 KB). In order to work properly as a non-root user, it is highly recommended to increase the amount of memory which can be locked. Edit the file /etc/security/limits.conf and add the following lines:


* soft memlock unlimited
* hard memlock unlimited

This will allow a process that is running as any user to pin an unlimited amount of memory. The change becomes effective for new login sessions.

After logging in again, executing the following command line will print how much memory (in KB) can be locked:

ulimit -l

(the expected output is: "unlimited").
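If a script needs to verify this limit before launching an RDMA application, a check along these lines may help. This is only a sketch: the function name memlock_sufficient and the 1 GB figure in the usage comment are hypothetical, and the value is assumed to be given in KB, as described above.

```shell
#!/bin/sh
# Report whether a memlock limit is large enough for a required amount
# of pinned memory. Prints "ok" or "too low".
memlock_sufficient() {
    limit="$1"        # value printed by "ulimit -l": a number of KB or "unlimited"
    required_kb="$2"  # hypothetical requirement of the application, in KB
    if [ "$limit" = "unlimited" ]; then
        echo "ok"
    elif [ "$limit" -ge "$required_kb" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# Usage: check the current session against a hypothetical 1 GB requirement.
#   memlock_sufficient "$(ulimit -l)" 1048576
```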

If one wishes to have finer control over this configuration (e.g. allowing less memory to be pinned, or allowing only specific user(s) to pin more memory), please refer to the Linux distribution manual.
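As an illustration of such finer control, limits.conf accepts a group as the domain when it is written as @groupname. The sketch below only prints the two lines for a given group and limit; the function name, the group "rdma" and the 1 GB value are hypothetical and should be adapted to the actual system.

```shell
#!/bin/sh
# Print soft and hard memlock entries for one group, in limits.conf syntax:
#   <domain> <type> <item> <value>
# The value is in KB, and "@name" selects a group instead of a single user.
memlock_limits_for_group() {
    group="$1"  # hypothetical group whose members may pin more memory
    kb="$2"     # limit in KB
    printf '@%s soft memlock %s\n' "$group" "$kb"
    printf '@%s hard memlock %s\n' "$group" "$kb"
}

# Usage (as root), allowing members of a hypothetical "rdma" group to pin 1 GB:
#   memlock_limits_for_group rdma 1048576 >> /etc/security/limits.conf
```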

More information

Detailed information on how to work with RDMA in RedHat/CentOS 7.* can be found in the following URL:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/ch-Configure_InfiniBand_and_RDMA_Networks.html

FAQs

I tried to restart the rdma service and got the following error message: "failed to restart rdma.service: operation refused, unit rdma.service may be requested by dependency only." What happened?

Starting with RedHat 7, the RDMA stack is no longer restartable. More information can be found in the following URL:
https://bugzilla.redhat.com/show_bug.cgi?id=965829.
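According to the discussion in the comments, the refusal comes from the line RefuseManualStop=yes in the unit file /usr/lib/systemd/system/rdma.service. A quick way to check whether a given unit file carries this setting is sketched below; the function name refuses_manual_stop is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if a systemd unit file sets RefuseManualStop
# to a true value; systemd accepts yes, true, 1 and on as booleans.
refuses_manual_stop() {
    unit_file="${1:-/usr/lib/systemd/system/rdma.service}"
    grep -Eq '^[[:space:]]*RefuseManualStop[[:space:]]*=[[:space:]]*(yes|true|1|on)[[:space:]]*$' "$unit_file"
}

# Usage:
#   if refuses_manual_stop; then
#       echo "rdma.service cannot be stopped or restarted manually"
#   fi
```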


Comments


  1. Hiroyuki Sato says: October 16, 2014

    Hello Dotan.

    Is this correct?

    In my environment (CentOS7)
    There are no rdma file in /etc/init.d directory.
    Alternatively, It is stored as /usr/lib/systemd/system/rdma.service.

    The start up script is /usr/libexec/rdma-init-kernel

    And I can't stop rdma service.

    Does /sbin/service rdma stop work correctly on your environment?

    # /sbin/service rdma stop
    Redirecting to /bin/systemctl stop rdma.service
    Failed to issue method call: Operation refused, unit rdma.service may be requested by dependency only.

    • Dotan Barak says: October 17, 2014

      Hi Hiroyuki Sato.

      First of all, I would like to apologize - I forgot to update the instructions of how to stop/start or enable the rdma service in RH/CentOS 7 (I planned to do it but forgot to actually replace the needed text).

      I've updated the post with the relevant information.

      I think that the reason that you failed to stop the service manually is that the following line exists in the service configuration file /usr/lib/systemd/system/rdma.service:
      RefuseManualStop=yes

      A ticket on this was opened in:
      https://bugzilla.redhat.com/show_bug.cgi?id=965829

      More information on a similar problem can be found in:
      https://bugzilla.redhat.com/show_bug.cgi?id=973697

      (AFAIK, this is an OS configuration issue and not RDMA issue).

      Thanks
      Dotan

  2. Hiroyuki Sato says: October 20, 2014

    Hello Dotan.

    I'm not sure, why RefuseManualStop=yes is needed.

    Anyway thank you so much.

  3. Justin Clift says: November 12, 2014

    Interesting. I'm only now getting around to trying out IB with EL7, so this was useful. ;)

    As a gotcha, be aware your blog software is changing some of the characters in your command lines.

    eg this doesn't work:

    # yum –setopt=group_package_types=optional groupinstall “Infiniband Support”

    The double dashes for the setopt, and the quotes around the "Infiniband Support" words have been changed to other characters.

    People cutting-n-pasting them into their terminals will get error messages (eg "No such group" and similar).

    Hope that helps. ;)

    • Dotan Barak says: November 12, 2014

      Hi Justin.

      First of all, I'm happy that my post helped you
      :)

      Thanks for the feedback, those changes were automatic conversion of the blogging system that I'm using.
      After searching the net, I found a solution and fixed it.

      Thanks for your feedback!
      Dotan

  4. DjvuLee says: May 8, 2015

    Hi, Dotan.
    If we changed some thing, such as config the PFC on the server, how can we update the info when we can not restart the rdma service. or just change the RefuseManualStop=yes to no? any good suggestion?

    • Dotan Barak says: May 8, 2015

      Hi.

      I have some tips on this, but I must have a disclaimer here:
      I'm not a system administrator or a RedHat expert.

      IMHO you can do one of the following:
      * Change the RefuseManualStop=yes -> no
      * Write a service file that loads the relevant low-level drivers and configures PFC before the RDMA service is loaded

      However, I wonder:
      what is the reason that you need to restart the RDMA service?
      if needed, maybe a restart to a specific module can be sufficient...
      (as part of the configuring the needed attributes)

      I hope that this answer helped you.
      Dotan

      • DjvuLee says: May 8, 2015

        Thanks very much! Why I want to restart is that recently I want to enable PFC(priority flow control) on the server, there is a need to restart the rdma service. such as the following article suggest.

        https://community.mellanox.com/docs/DOC-1414

        I have another question.
        I noticed that openibd is used in the CentOS5*, just as you post in http://www.rdmamojo.com/2014/08/30/working-rdma-redhatcentos-5/, but it also can be found in CentOS6, but can not restart. Does this mean `service rdma` replace /etc/init.d/openibd?

      • Dotan Barak says: May 8, 2015

        Hi.

        I assume that you are referring to the module parameters for mlx4_core:
        You can configure the configuration file of mlx4_core and starting the next reboot it should work automatically.

        However, if you are using the low-level drivers that comes with the RedHat distribution, you need to verify that this module supports this functionality.

        In MLNX-OFED/Community OFED: the service file is openibd.
        In native RedHat RDMA stack: the service file is rdma.

        Unless you have a good reason to keep the RedHat stack, I would suggest to install MLNX-OFED and this will prevent this problem in the first place ...

        Thanks
        Dotan

      • DjvuLee says: May 9, 2015

        Thanks for your reply!

        So, whether I used the CentOS5.* or CentOS6, or CentOS7. As long as I install the MLNX-OFED, i can used the openibd to start or stop the rdma service, does this right?

        As you still working for mellanox, I want to report a bug, it seem that the MLNX_OFED_LINUX-2.4-1.0.4-rhel7.0-x86_64.iso or the tgz file can not be installed on the CentOS7, it will give the following error:

        Error: The current MLNX_OFED_LINUX is intended for rhel7.0

        but my OS is CentOS7(3.10.0-229.1.2.el7.x86_64), so I do not know why the OFED can not be installed. Is there any methods to solve this?

      • Dotan Barak says: May 10, 2015

        Yes.

        openibd to be used with MLNX-OFED and community OFED.
        Inbox RDMA stack *may* have different service scripts (other than openibd).

        May I be rude and ask you to send this issue to the support of Mellanox?
        Since I am working at Mellanox, but to prevent unwanted conflicts I really try to separate this blog (which I maintain in my free time and using my private resources) and my day job at Mellanox Technologies, in various roles...

        However, I'll try to check it and see if something catches my eyes on this.

        I'm sorry and I hope that you'll understand it.
        Dotan

      • DjvuLee says: May 11, 2015

        You can answer my question is already so nice, I will send this issue to the support.

        Thanks very much! you help me a lot!

  5. Max says: July 25, 2018

    Hello. I have a problem with ib_write_bw and other rdma tools and with UEK kernel (4.1.12-61.1.24.el7uek.x86_64). It can't create Memory region (Couldn't allocate MR) And I don't know why, because it work with another UEK kernel (4.14.35-1828.el7uek.x86_64)

    Could you please tell me why ib_write_bw ,rping and other can't allocate memory region?

    • Dotan Barak says: August 24, 2018

      Hi.

      I suspect that you are working as a non-root user and there is a limit to the amount of memory pages that can be locked (i.e. pinned).
      Increasing this size should solve the problem.

      Thanks
      Dotan
