Thread (10 messages), 6 authors, 2021-10-15

Re: [Bug 214523] New: RDMA Mellanox RoCE drivers are unresponsive to ARP updates during a reconnect

From: Haakon Bugge <hidden>
Date: 2021-09-27 13:32:45

On 27 Sep 2021, at 15:10, Jason Gunthorpe [off-list ref] wrote:

On Mon, Sep 27, 2021 at 08:55:19PM +0800, Mark Zhang wrote:
On 9/27/2021 8:24 PM, Jason Gunthorpe wrote:
On Mon, Sep 27, 2021 at 03:09:44PM +0300, Leon Romanovsky wrote:
On Sun, Sep 26, 2021 at 05:36:01PM +0000, Chuck Lever III wrote:
Hi Leon-

Thanks for the suggestion! More below.
On Sep 26, 2021, at 4:02 AM, Leon Romanovsky [off-list ref] wrote:

On Fri, Sep 24, 2021 at 03:34:32PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
https://bugzilla.kernel.org/show_bug.cgi?id=214523

           Bug ID: 214523
          Summary: RDMA Mellanox RoCE drivers are unresponsive to ARP
                   updates during a reconnect
          Product: Drivers
          Version: 2.5
   Kernel Version: 5.14
         Hardware: All
               OS: Linux
             Tree: Mainline
           Status: NEW
         Severity: normal
         Priority: P1
        Component: Infiniband/RDMA
         Assignee: drivers_infiniband-rdma@kernel-bugs.osdl.org
         Reporter: kolga@netapp.com
       Regression: No

RoCE RDMA connections are established using the CMA protocol. During setup the
code uses hard-coded timeout/retry values. These values govern how a Connect
Request is retried when it goes unanswered. During the retry attempts, ARP
updates for the destination server are ignored. The current timeout values lead
to a 4+ minute attempt to connect to a server that stopped owning the IP
address when the ARP update happened.

The ask is to make the timeout/retry values configurable via procfs or sysfs.
This will allow environments that use RoCE to reduce the timeouts to more
reasonable values and react to ARP updates faster. Other CMA users (e.g. IB or
others) can continue to use the existing values.
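
To make the retry arithmetic in the report concrete, here is a minimal sketch of how an IB-style timeout exponent and a retry count combine into a worst-case connect time. It assumes the IB convention that a timeout value n encodes 4.096 microseconds * 2**n. The exponent/retry pairs are placeholders, since the report does not list the actual hard-coded values, and the model ignores other contributions such as path packet lifetime, so it is not expected to reproduce the 4+ minute figure.

```python
# Simplified model (an assumption, not the kernel's exact calculation):
# an IB-style timeout value n encodes 4.096 microseconds * 2**n.
IB_TIMEOUT_BASE_SECONDS = 4.096e-6


def ib_timeout_seconds(exponent: int) -> float:
    """Decode an IB-style timeout exponent into seconds."""
    return IB_TIMEOUT_BASE_SECONDS * (2 ** exponent)


def worst_case_connect_seconds(exponent: int, max_retries: int) -> float:
    """One initial attempt plus max_retries retries, each waiting a full timeout."""
    return (max_retries + 1) * ib_timeout_seconds(exponent)


if __name__ == "__main__":
    # Placeholder (exponent, retries) pairs chosen only to show the trend.
    for exponent, retries in [(20, 15), (18, 7), (16, 3)]:
        total = worst_case_connect_seconds(exponent, retries)
        print(f"exponent={exponent} retries={retries}: about {total:.1f} s")
```

Because the timeout doubles with every step of the exponent, lowering the exponent shortens the worst case much faster than trimming the retry count does.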
I would rather not add a user-facing tunable. The fabric should
be better at detecting addressing changes within a reasonable
time. It would be helpful to provide a history of why the ARP
timeout is so lax -- do certain ULPs rely on it being long?

I don't know about ULPs and ARPs, but how to calculate TimeWait is
described in the spec.

Regarding the tunable, I agree. Because it needs to be per-connection, most
likely not many people in the world will succeed in configuring it properly.

Maybe we should be disconnecting the cm_id if a gratuitous ARP changes
the MAC address? The cm_id is surely broken after that event, right?

Is there an event on gratuitous ARP? And we also need to notify the user-space
application, right?

I think there is a net notifier for this?

NETEVENT_NEIGH_UPDATE, maybe?


Thxs, Håkon
Userspace will see it via the CM event we'll need to trigger.

Jason
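
The idea floated in this exchange (tear down a connection when a neighbour update changes the peer's MAC address, then surface that to the application as an event) can be modelled in a few lines of userspace Python. This is only an illustrative sketch, not kernel code: `Connection`, `NeighWatcher`, `on_neigh_update`, and the `PEER_ADDR_CHANGED` event string are all hypothetical names, and the thread does not say which CM event would actually be raised.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a cm_id: remembers the MAC resolved at connect time."""
    dst_ip: str
    dst_mac: str
    alive: bool = True
    events: list = field(default_factory=list)


class NeighWatcher:
    """Hypothetical model of a handler fed by neighbour-update notifications."""

    def __init__(self):
        self.connections = []

    def track(self, conn: Connection) -> None:
        self.connections.append(conn)

    def on_neigh_update(self, ip: str, new_mac: str) -> list:
        """Mark connections to `ip` as broken if their cached MAC is now stale.

        Returns the connections that were torn down by this update.
        """
        broken = []
        for conn in self.connections:
            stale = conn.dst_mac.lower() != new_mac.lower()
            if conn.alive and conn.dst_ip == ip and stale:
                conn.alive = False
                # Hypothetical event name; the application would see this.
                conn.events.append("PEER_ADDR_CHANGED")
                broken.append(conn)
        return broken


if __name__ == "__main__":
    watcher = NeighWatcher()
    conn = Connection("192.0.2.10", "aa:bb:cc:00:00:01")
    watcher.track(conn)
    watcher.on_neigh_update("192.0.2.10", "aa:bb:cc:00:00:02")
    print(conn.alive, conn.events)
```

An update that repeats the already-cached MAC leaves the connection untouched, so only a genuine change of ownership triggers the teardown and the event.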
  