
ibv_reg_mr()

struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
                          size_t length, enum ibv_access_flags access);

Description

ibv_reg_mr() registers a Memory Region (MR) associated with a Protection Domain, thereby allowing the RDMA device to read and write data to this memory. Performing this registration takes some time, so memory registration isn't recommended in the data path, where a fast response is required.

Every successful registration will result in a MR which has unique (within a specific RDMA device) lkey and rkey values.

The MR's starting address is addr and its size is length. The maximum size of the block that can be registered is limited to device_attr.max_mr_size. Every memory address in the virtual space of the calling process can be registered, including, but not limited to:

  • Local memory (either variable or array)
  • Global memory (either variable or array)
  • Dynamically allocated memory (using malloc() or mmap())
  • Shared memory
  • Addresses from the text segment

The registered memory buffer doesn't have to be page-aligned.

There isn't any way to know the total size of memory that can be registered for a specific device.

The argument access describes the desired memory access attributes by the RDMA device. It is either 0 or the bitwise OR of one or more of the following flags:

IBV_ACCESS_LOCAL_WRITE Enable Local Write Access: the Memory Region can be used in Receive Requests, or as the local target of IBV_WR_ATOMIC_CMP_AND_SWP or IBV_WR_ATOMIC_FETCH_AND_ADD (the original remote content is written locally)
IBV_ACCESS_REMOTE_WRITE Enable Remote Write Access: the Memory Region can be accessed from a remote context using IBV_WR_RDMA_WRITE or IBV_WR_RDMA_WRITE_WITH_IMM
IBV_ACCESS_REMOTE_READ Enable Remote Read Access: the Memory Region can be accessed from a remote context using IBV_WR_RDMA_READ
IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access (if supported): the Memory Region can be accessed from a remote context using IBV_WR_ATOMIC_CMP_AND_SWP or IBV_WR_ATOMIC_FETCH_AND_ADD
IBV_ACCESS_MW_BIND Enable Memory Window binding

If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, then IBV_ACCESS_LOCAL_WRITE must be set too since remote write should be allowed only if local write is allowed.

Local read access is always enabled for the MR, i.e. the Memory Region can be read locally using IBV_WR_SEND, IBV_WR_SEND_WITH_IMM, IBV_WR_RDMA_WRITE or IBV_WR_RDMA_WRITE_WITH_IMM.

The requested permissions of the memory registration can be all, or a subset, of the operating system permissions of that memory block. For example: read-only memory cannot be registered with write permissions (either local or remote).

A specific process can register one or more Memory Regions.

Parameters

Name Direction Description
pd in Protection Domain that was returned from ibv_alloc_pd()
addr in The start address of the virtual contiguous memory block
length in Size of the memory block to register, in bytes. This value must be at least 1 and less than device_attr.max_mr_size
access in Requested access permissions for the memory region

Return Values

Value Description
MR Pointer to the newly allocated Memory Region.
This pointer also contains the following fields:

lkey The value that will be used to refer to this MR using a local access
rkey The value that will be used to refer to this MR using a remote access

Those values may be equal, but this isn't always guaranteed.

NULL On failure, errno indicates the failure reason:

EINVAL Invalid access value
ENOMEM Not enough resources (either in operating system or in RDMA device) to complete this operation

Examples

Register a MR to allow only local read and write access and deregister it:

struct ibv_pd *pd;
struct ibv_mr *mr;
 
mr = ibv_reg_mr(pd, buf, size, IBV_ACCESS_LOCAL_WRITE);
if (!mr) {
	fprintf(stderr, "Error, ibv_reg_mr() failed\n");
	return -1;
}
 
if (ibv_dereg_mr(mr)) {
	fprintf(stderr, "Error, ibv_dereg_mr() failed\n");
	return -1;
}

Register a MR to allow remote read and write access to it:

mr = ibv_reg_mr(pd, buf, size, IBV_ACCESS_LOCAL_WRITE |
		IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
if (!mr) {
	fprintf(stderr, "Error, ibv_reg_mr() failed\n");
	return -1;
}

FAQs

What is a MR good for, anyway?

MR registration is a process in which the RDMA device takes a memory buffer and prepares it to be used for local and/or remote access.

Can I register the same memory block more than once?

Yes. One can register the same memory block, or part of it, more than once. Those memory registrations can even be performed with different access flags.

What is the total size of memory that can be registered?

There isn't any way to know the total size of memory that can be registered. Theoretically, there isn't any limit to this value. However, if one wishes to register a huge amount of memory (hundreds of GB), the default values of the low-level drivers may not be enough; look at the "Device Specific" section to learn how to change the default parameter values in order to solve this issue.

Can I access memory using the RDMA device with more permissions than the operating system allows me?

No. During memory registration, the driver checks the permissions of the memory block and verifies that the requested permissions are allowed for that memory block.

Can I use memory block in RDMA without this registration?

Basically no. However, there are RDMA devices that have the ability to read the memory to be sent without memory registration (sending inline data).

ibv_reg_mr() failed, what is the reason for this?

ibv_reg_mr() can fail because of the following reasons:

  • Bad attributes: bad permissions or bad memory buffer
  • Not enough resources to register this memory buffer

If this is the first buffer that you register, the first reason is the likely one. If the memory registration fails after many buffers were already registered, the likely reason is that there aren't enough resources to register this memory buffer: most of the time, resources in the RDMA device for the translation of virtual addresses to physical addresses. In that case, you may want to check how to increase the amount of memory that can be registered by this RDMA device.

Can I register several MRs in my process?

Yes, you can.

Device Specific

Mellanox Technologies

ibv_reg_mr() failed, what is the reason for this?

If you are using one of the ConnectX HCAs family, it is a matter of configuration; you should increase the total size of memory that can be registered. This can be accomplished by setting the value of the parameter log_num_mtt of the module mlx4_core.

Adding the following line to the file /etc/modprobe.conf or to /etc/modprobe.d/mlx4_core.conf (depends on the Linux distribution that you are using) should solve this problem:
options mlx4_core log_num_mtt=24


Comments


  1. Jagadeesh says: July 8, 2013

    Hi Dotan sir,
    i am very new to IB and uverbs coding, i have a small doubt, will the internal objects like QP, CQ and related handles uses pinned memory(locked memory) or not ?
    sorry if i asked silly question.
    thanks
    Jagadeesh.

    • Dotan Barak says: July 8, 2013

      Hi Jagadeesh.

      Welcome to the RDMA scene
      :)

      The answer is: yes. The internal Queues (which require space: such as QP, CQ, SRQ) are using pinned memory.

      Thanks
      Dotan

  2. Boris says: November 7, 2013

    Hello Dotan
    I want to ensure my assumption:
    Can the same memory address, be registered to more the one physical device simultaneously?

    Thanks.
    Boris.

    • Dotan Barak says: November 7, 2013

      Hi Boris.

      The resources of every RDMA device are completely separated.

      After saying that, there isn't any limitation to do it but keep in mind that there isn't
      any guarantee about the ordering of the access to this buffer by the devices and you need to take care of it in your code.

      Thanks
      Dotan

  3. Jiajun says: November 12, 2013

    Hi Dotan,

    I have a question about RDMA operations. In one subnet, if multiple processes on different machines register many MRs, is there any possibility that some 2 of these MRs have the same rkey field? If so, for a RDMA operation which specifies wr.rdma.rkey in its work request, how does Infiniband know which remote MR to send the data?

    I'm asking this question because I'm doing a small test and I find the above scenario.

    Thanks,
    Jiajun

    • Dotan Barak says: November 12, 2013

      Hi Jiajun.

      rkey is an attribute of an RDMA device:
      at a specific point in time, only one MR in that RDMA device can have this rkey value.

      If you are working with multiple devices in the same server, or with multiple servers in a subnet,
      you may get the same rkey value more than once.

      Since you are using the rkey in a Send Request and send it to a specific RDMA device (using the destination LID),
this isn't a problem and the RDMA protocol knows how to handle it.

      I hope that this answer was clear
      Dotan

      • Jiajun says: November 12, 2013

        Hi Dotan,

        That makes a lot of sense. Thanks.

        As I understand, in a subnet, LID along with QP number becomes the unique identifier of a queue pair, while LID along with rkey is the unique id of a memory region.

        When a RDMA read/write operation is performed via ibv_post_send(), the hardware will use dlid in the associated qp and rkey in wr.rdma.rkey to locate which remote MR to read from or write to. Is that correct?

        Another question, in the above case, will the dest_qp_num in the qp be useful? If not, the qp can read/write data to any MRs that belong to dlid, if appropriate flags have been set, right?

        Thanks,
        Jiajun

      • Dotan Barak says: November 15, 2013

        Hi Jiajun.

        Yes, in a subnet, a LID along with a QP number becomes a unique identifier of the QP.
LID along with rkey can be seen as a unique ID of a MR, but you connect using QPs and not using
MRs...

        Yes, you are correct:
        When performing RDMA Write or Read, the DLID and remote QP number will be taken from the (local) QP context,
        and the remote RDMA device will use the rkey (that was posted in the SR) to understand which MR to use.

        Connected QPs can work with only one remote QP (hence "connected").
You cannot change the remote QP number after you set it when modifying the QP from INIT to RTR.

        Any remote MR, with the right permissions, can be used with that remote QP as long as they share the same PD.

        Thanks
        Dotan

  4. Jagadeesh says: December 11, 2013

    Hi Dotan,

    When using kernel space verbs, before and after RDMA operations it is recommended to call ib_dma_sync*(OFED API) on buffers because of CPU cache.

    But while using in user level verbs there is no such option(cache operations) provided, and no one has faced such caching problems.

    Can you please help me to understand, in user space how cache coherence is maintained?

    Thanks & Regards
    Jagadeesh

    • Dotan Barak says: January 24, 2014

      Hi Jagadeesh.

      I'm sorry that it took me some time to answer your question
      (I had some technical problems to answer earlier).

This is a great question. There was a mail thread about this issue in the linux-rdma mailing list;
the mail thread subject was "(R)DMA in userspace" and it started on 11/10/2012 17:34.

The bottom line is that those calls (i.e. the ib_dma_sync* calls) are mainly needed on non-cache-coherent architectures.

Most machines today are cache-coherent, so we don't hit any problem.
However, on a non-cache-coherent machine, one who uses user-level verbs should expect things to break
(memory may not contain the expected content).

      I hope that I answered, you can find much more information in the mail thread above.
      Dotan

      • Jagadeesh says: February 17, 2014

        Hi Dotan,
        Thanks for your reply.
        The link helped to make things clear.

        Thanks & Regards
        Jagadeesh

      • Dotan Barak says: February 17, 2014

        This is great.

        Thanks for the feedback
        :)

        Dotan

  5. Igor R. says: January 21, 2014

    Hi Dotan,

    Should the memory address passed to ibv_mr_reg be page-aligned?

    Thanks!

    • Dotan Barak says: January 21, 2014

      Hi Igor.

      The memory address that is being registered doesn't have to be page-aligned.

      Thanks
      Dotan

  6. Igor R. says: March 2, 2014

    Hi Dotan,

    The latest OFED contains "peer memory" API, which is unfortunately not covered yet in your blog. Still, I hope I may ask a question on this subject.
    I'm using this new API to enable registration of a virtual memory region, which is actually mmapped from a PCI memory region - to enable RDMA Write to this PCI memory. IUUC, to enable such a functionality, one should implement all the callbacks provided in struct peer_memory_client, (as described in PEER_MEMORY_API.txt), including get_pages(). The question is how one should implement this function, considering the fact that there are no struct page's for PCI memory region.

    • Igor R. says: March 5, 2014

      Following my previous comment - actually I've figured out that it's enough to fill sg-entries with physical addresses and lengths, no need in struct page's.

      • Dotan Barak says: March 7, 2014

        Great, thanks for the update
        :)

        Dotan

  7. Eric L. says: July 2, 2014

    Hi Dotan,

    I got a question on registering a single memory region for multiple clients.

    I have a server which provides a memory region to be shared and accessed by different remote clients. Previously, I registered the MR when the server is set up(right after ibv_cm_id and PD are created). Both remote READ and WRITE operations worked, but atomic operation did not work. So I tried putting the registration process at the stage where the first client is establishing the connection(i.e. when the server receives a RDMA_CM_EVENT_CONNECT_REQUEST event). By doing this, the atomic operation worked. This confused me because to register a MR, only a PD and an associated allocated memory block are required. Why did the atomic operations fail with the first method since the PD does not change? And what is the right way to register a memory region for multiple clients? Do I need to register one MR for every client, or a single MR for all clients?

    Thanks!

    • Dotan Barak says: July 4, 2014

      Hi Eric.

I must admit that I'm not fully familiar with the librdmacm functionality.
However, I'll try to shed some light...

      You can register a single MR for all clients, as long as all of them share the same PD.
      If there is a problem with the QPs after connecting them, I suggest that you'll perform query QP, and check the attributes:
      * qp_access_flags
      * max_rd_atomic
      * max_dest_rd_atomic

      Maybe the reason that atomic operations failed was that the QPs weren't configured to support it.
      If you can share the source code, maybe I can give you some more insights..

      I hope that this helps you
      Dotan

  8. Baturay O. says: July 15, 2014

    Hi Dotan,

    I'm using SoftiWarp. When I try to register with different sizes,ibv_reg_mr fails. Example, when I try 1kb, 4kb, 16kb to register, there is no problem. But when it is 64kb or more, I'm getting the error. What can be the reason for it?

    • Dotan Barak says: July 15, 2014

      Hi.

      Can you check using 'ulimit' how much memory your process can lock?
      (ulimit -l)

      Thanks
      Dotan

      • Baturay O. says: July 15, 2014

        Hi,

        It gives 64. What does it mean?

      • Dotan Barak says: July 15, 2014

        That your process can lock (i.e. pin) up to 64KB memory.
        I would suggest increasing this value if you want to work with RDMA...

        Thanks
        Dotan

      • Baturay O. says: July 15, 2014

        I solved the problem by configuring the /etc/security/limits.conf file.
        Thanks.

      • Dotan Barak says: July 15, 2014

        Yes.

This is another way to change the amount of memory which can be locked.

        I'm glad I could help you
        :)

        Thanks
        Dotan

  9. Rafi C says: November 13, 2014

    Hi Dotan,

    I was trying to implement an application that support rdma using
    librdmacm. When the process is trying to register the buffer using
    ibv_reg_mr, it failes after registering approximately 128 MB.

    As you suggested, I reconfigured options as

    1) cat /sys/module/mlx4_core/parameters/log_num_mtt
    24
    2) cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
    7

    Also i have set value in /etc/security/limits.conf as

    * soft memlock unlimited
    * hard memlock unlimited

So I hope that allows us to pin unlimited memory with the kernel. But we are only able to register approximately 128 MB. Further registration fails, with ibv_reg_mr returning NULL and
setting errno=11 (Resource temporarily unavailable).

    Am i missing something or am i doing wrong ?

    If you can point out any thing regarding this will be great.

    Thanks & Regards!
    Rafi KC

    • Dotan Barak says: November 13, 2014

      Hi Rafi.

      * Can you send me the output of "ulimit -l"?
      * If you can share the code, I can tell you what is wrong
      (assuming that the problem is with the code that you wrote).
      * Did you try to register lower amount of memory? in which size did it start to fail?
      * Did you check dmesg or /var/log/messages for errors?

      Thanks
      Dotan

      • Rafi C says: November 13, 2014

        1) Can you send me the output of "ulimit -l"?

        A) unlimited

        2) * If you can share the code, I can tell you what is wrong
        A) https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/rdma/src/rdma.c line number 111
        3)* Did you try to register lower amount of memory? in which size did it start to fail?
        A) it successfully works for lower amount of memory, and fails at 128 MB (+1 tolerance ).

        4) * Did you check dmesg or /var/log/messages for errors?
        A)http://ur1.ca/iritw

        Regards
        Rafi KC

      • Dotan Barak says: November 13, 2014

        Hi Rafi KC.

        Which ibv_reg_mr failed?
        (there are several memory registration in your code).
        What is the total amount of memory that you registered?

        Please try to increase log_num_mtt (when loading the mlx4_core driver) and check if this helps.

        Thanks
        Dotan

  10. Max R says: February 19, 2015

    Hi Dotan,

    I'm using NVIDIAs GPUdirect technology and my system freezes when reading from GPU memory with ibverbs. Writing to it works all right. I'm wondering how ibv_reg_mr knows, that it should pin GPU memory. As far as i understand it, the nv_peer_mem kernel module registers the GPU memory as peer memory with ibverbs.
    But how does ibv_reg_mr "talk" with the gpu memory instead of the host memory? I've been looking through the source code for hours but can't figure it out.

    Any help would be appreciated!

    Regards,
    Max

    • Dotan Barak says: February 24, 2015

      Hi Max.

      There are two modules when working with GPU memory:
      1) driver that the GPU vendor provides (in your case, NVIDIA) which allows the kernel to access and map the GPU memory
      2) RDMA core that allows registration of memory

      From the RDMA core point of view, GPU memory is "regular" system memory that it can work with,
      there isn't any special code that handles this type of memory.

      I suggest that you'll install the latest driver from NVIDIA.

      Thanks
      Dotan

      • Igor R. says: February 26, 2015

        Just out of curiosity: doesn't GPU memory need to be registered in a kernel module via GPUDirect (PeerDirect) mechanism, to allow the subsequent MR registration? I guess, the regular MR registration wouldn't work, as get_user_pages() would fail for GPU memory...

      • Dotan Barak says: May 15, 2015

        Hi Igor.

        Sorry about the late response; I'm moderating all comments and this one entered the wrong category.

Anyway, you are correct: a specific plugin (kernel module) should be written to register this GPU memory with the MLNX-OFED PeerDirect API, so that it can be used to detect and manage the memory actions related to this memory.

        Thanks
        Dotan

      • Max R says: March 2, 2015

        Hi Dotan,

        thanks for your help. It turned out that only specific mainboard chipsets support NVIDIA GPUdirect with Mellanox Infiniband, which is not documented anywhere. With the 4th hardware it finally works now.

        Regards,
        Max

      • Dotan Barak says: March 2, 2015

        Great, thanks for the update.
        Dotan

  11. Jimmy says: February 27, 2015

    Very informative article, thanks for keeping it updated!

    I've been trying to ibv_reg_mr a 4kb page of mmaped address from kernel memory, but it throws an EINVAL. A normal posix_memalign call generates an address that can be registered. I'm quite new to this and suspect I'm doing it completely wrong. What's the proper way to do zero-copy from userspace ib results to kernel memory? Thanks!

    p.s. the captcha doesn't seem to show up on mobile devices!

    • Dotan Barak says: February 27, 2015

      Hi Jimmy.

Thanks for the warm words, and for the note about the captcha.
      :)

      In general, every memory which is accessible in a userspace process can be registered.

I don't fully understand what you are trying to do.
      Can you give some more background on what you tried to do and what didn't work?

      Thanks
      Dotan

      • Jimmy says: March 2, 2015

        I was using "get_zeroed_page" in kernel to allocate memory in kernel space and mapping it to userspace, and ibv_reg_mr throws an error on that.

        Today I instead tried "get_user_page" in kernel to map to memory allocated using posix_memalign in userspace. This one works for ibv_reg_mr, so I think I'll run with this :)

        So I guess if I were to use kernel allocated memory, I should be using kernel IB verbs?

      • Dotan Barak says: April 17, 2015

        Hi.

        I must admit that I'm not a kernel expert.

        AFAIK, the get_zeroed_page() is a special page in kernel: full with zeroes and has the COW indication.
        So, when one tries to write on it, it is copied in the virtual space of the process/module.

        What is the meaning of mapping this page to the userspace?
        allowing the userspace process to modify it?

        I suspect that this is the reason that you failed to map it to the userspace.
        Another option was to get a page in other kernel service function (for example: kzalloc())
        and map it to the userspace.

        This is a kernel issue and not an RDMA-related issue...

        Thanks
        Dotan

      • Santosh says: March 5, 2015

        Hi Dotan,

        Thanks for the all above replies, makes a lot of understanding about the RDMA.
        Is it possible to invoke the RDMA operation such as RDMA Send, RDMA Read, RDMA Write from another H/W component or it will be just the S/W interface only.

        TIA
        Santosh

      • Dotan Barak says: March 5, 2015

        Hi Santosh.

        Short answer : yes.

        Long answer : to enable an RDMA device, one needs to have a working low level driver. Assuming that you have it and the other HW component can perform PCI cycles to the RDMA device - the answer is yes.

  12. Santosh says: March 6, 2015

    Thanks Dotan,
    If the low level driver prepares the RDMA Queue pair and provides the Queue information to H/W component then the H/W components will know RDMA Queue Pair Configuration such as its base address, Q length, doorbell it can prepare, post the WQE and ring the doorbell for specific queue.

    If the above information is available to H/W then it can send and receive the PCIe TLPs.

    To do this kind of RDMA operation, what other information will be needed. Basically I am trying to interface the another H/W module with RDMA ASIC engine but swant to use the S/W less in data path.

    Thanks
    Santosh

    • Dotan Barak says: March 7, 2015

      Hi Santosh.

      In general it can be done. However:
* Since you are working directly with the HW, as the low level driver does, you need to have the HW specification/technical document.
* The control part (which still runs on the CPU) holds some state values of the device; you need to figure out how to synchronize with it.

      I give you here general comments from the knowledge I have.

      If you want to get into more details, I suggest that you'll contact the support of the HW vendor that you'll work with.

      Thanks
      Dotan

  13. Santosh says: March 8, 2015

    Thanks Dotan for the clarificaton.

  14. Sonpping says: January 24, 2016

    Hi Dotan,
    I am comfused about how dynamically allocated memory (using malloc() or mmap()) to be DMA-able. Does RNIC driver allocated a bounce buffer for user virtual memory(this virtual memory has been mapped to physical memory) registered? If so, data copy happens between user virtual memory and bounce buffer. when is this data copy process triggered?

    • Dotan Barak says: January 29, 2016

      Hi.

I will describe what is going on in InfiniBand and RoCE (I don't know about iWARP).

      The answer is no; the RDMA core (in the kernel level) translates the virtual buffer to the physical address of its pages,
      and Work Request access them directly.

So, the RDMA device accesses the memory buffer directly, hence zero copy.

      Thanks
      Dotan

  15. Junhyun says: July 25, 2016

    Hi Dotan, thanks for maintaining a very informative and helpful site.
    I have a simple question for you.
    Is it guaranteed that the member 'addr' of struct created by ibv_reg_mr is always equal to the addr passed in the argument?
    (It seems that way from the Mellanox examples, but just to be sure)

    • Dotan Barak says: July 29, 2016

      :)
      Thanks!

The answer is yes, "addr" always points to the buffer registered by the Memory Region
(unless something bad happened and someone changed this value; a thing that shouldn't happen)

      Thanks
      Dotan

  16. Ranjit G M says: July 29, 2016

    Hi Dotan,

    I am relatively new to IB and RDMA. My question may seem silly, but please don't mind. In your FAQs you have said that the same memory region can be registered more than once. Say, for example, ibv_reg_mr() is called with the same parameters twice ie the addr and size are the same and also the pd and access field. In this case, what fields will vary when ibv_reg_mr() returns? I don't know if the lkey and rkey values differ since ibv_reg_mr() will be called twice. Just trying to understand what happens in this case.

    Thanks,
    Ranjit G M

    • Dotan Barak says: August 3, 2016

      Hi.

      Registering the same memory buffer twice (with same or different permissions) will end with two different Memory Region handles; every one of them will have unique lkey and rkey values.

      Thanks
      Dotan

      • Ranjit G M says: August 3, 2016

        Thanks for the answer!

  17. Lukas says: August 19, 2016

    Hi,
    As this function is quite expensive, I try to minimize memory registrations as much as possible. As far as I understand, ibv_reg_mr should check permissions. But consider the case

    int* data = new int[1000];
    auto mr = ibv_reg_mr(... data...);
    delete[] data;
    int* data2 = new int[1000];

    Assume that data2 == data (which is quite reasonable). I'd assume that in this case, mr can still be used, but this seems not to be the case.

    So: Is there a way to check if a memory region is valid?

    Kind regards

    • Dotan Barak says: August 20, 2016

      Hi Lukas.

      During the memory registration, the reference count on the memory pages is increased,
      so the physical memory pages still exist, even if you free the pages.
      The mapping virtual < -> physical from the RDMA device point of view is constant,
      until you'll reregister it or do any other manipulation to it in the RDMA stack.

The fact that you freed this block and then allocated a new one at the same address is irrelevant:
the physical pages behind this buffer may have changed.

      There isn't any way to check that a Memory Region is valid;
      if you registered it and didn't deregister/invalidate it - this Memory Region is still valid.

      Thanks
      Dotan

  18. Michael says: November 4, 2016

    Hello,

    I've a question about memory registration: consider a long (pinned) buffer and many different ibv_post_send (or ibv_post_recv) related to different part of this buffer (i.e. send(buffer+someValue, ...) , send(buffer+someOtherValue, ...), etc... ).

    Is it better to register only once (at the beginning) the entire buffer and use the same registered memory key for all the send (receive) operations?
    Or is it better to register a new memory region for each part of the buffer before a send (receive)?

    Best Regards

    • Dotan Barak says: November 8, 2016

      Hi.

      I believe that you are asking performance wise what is better;
      the answer is "depends on the behavior of your RDMA device".

I would suggest using one big Memory Region and using different parts of it,
      on demand
      (the management of it is easy
      +
      you will get many cache hits)

      Thanks
      Dotan

  19. masoud says: February 12, 2017

    Hi,
    I really got stock in memory registration. I am going to register 2G of memory but with different memory region (each one 16 bytes). More precisely, the following code:

    for (int i=0; i<xxxxxxx;i++)
    mr=ibv_reg_mr();

    but after 1000 times I get an error which can not allocate memory. I checked the configuration file it should allow me to allocate memory at least some GBs. I really appreciate it if you let me know if there is any solutions???

    Thank you so much!!!!

    • Dotan Barak says: February 13, 2017

      Hi.

      I suspect that the value of 'ulimit -l' (i.e. the amount of memory that can be pinned)
      is limited.

      Please check this and increase this limit.

      Thanks
      Dotan

      • masoud says: February 13, 2017

        Hi Dotan,

        Thank you for your response. I actually checked but it is set to unlimit. I should submit my job in a cluster then wait for a free computation node to execute my code. Do you suppose it can be the restriction of RDMA itself or OS?

      • Dotan Barak says: February 13, 2017

        Hi.

        It can be both:
        * lack of resources from the RDMA device
        * limitation of the OS itself

        Check what is the maximum buffer that you can lock;
        if it is ~ 32K or 64K, most likely it is environment (i.e. OS problem).

        Thanks
        Dotan

      • masoud says: February 13, 2017

        Can you please let me know how can I check the buffer that I can lock?

        When I increase the third parameter of ibv_reg_mr(), Size of the memory block to register, to 2GB it is OK and works but 2GB in different chunks(MR) does not work!

        Actually, my goal is to create a hash table and have direct access from another node. I dont know how can I manage this. Can I do it with one memory registration?

        Thank you,

      • Dotan Barak says: February 14, 2017

        Hi.

        You can use one big Memory Region and access if locally/remotely.
To understand what the problem is in your setup, more information is required ...

        Thanks
        Dotan

      • masoud says: February 14, 2017

        I really appreciate your responses. I try to describe my problem in a small size, I hope it is clear:
        I allocate a memory region as follows:
        struct ibv_mr * tmp_mr;
        struct shared_data{
        int32_t Data;
        struct ibv_mr * next;
        };
        struct shared_data * data_region = ( struct shared_data *) malloc( sizeof(struct shared_data) );
        data_region->Data=1;
        data_region->next.addr=NULL;
        tmp_mr = ibv_reg_mr(s_ctx->pd, data_region, sizeof(struct shared_data) * 2 , IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);

        In the ibv_reg_mr API, I allocate a double size of shared_data (sizeof(struct shared_data) * 2 ). Is it possible to access each one separately by memory region? Actually, if I allocate them separately in a large scale RDMA give me an error.

      • Dotan Barak says: February 15, 2017

        Hi.

        Maybe I'm missing something here,
but data_region is sizeof(struct shared_data) bytes, yet you try to register twice that size;
i.e. you try to register memory which isn't allocated
(not in the program's virtual address space).

        You can register two memory blocks, and make each of them have a different handle
        (and local/remote key).

        If you have a problem, maybe you can send me the code for review.

        Thanks
        Dotan
