Thread: 16 messages, 5 authors, 2013-05-31

Re: 3.10.0-rc2 mlx4 not receiving packets for some multicast groups

From: Shawn Bohrer <hidden>
Date: 2013-05-30 20:31:24

On Wed, May 29, 2013 at 04:55:32PM +0300, Or Gerlitz wrote:
> On Tue, May 28, 2013 at 11:15 PM, Shawn Bohrer [off-list ref] wrote:
> > Naturally I was wrong and we set more than the above non-default
> > values.  We additionally set high_rate_steer=1 on mlx4_core. As
> > you may know this parameter isn't currently available in the upstream
> > driver, so I've been carrying the following patch in my 3.4 and 3.10 trees:
> [...]
> > I've confirmed that with the above high_rate_steer patch and
> > high_rate_steer=1 I receive data on 3.10.0-rc3 and with
> > high_rate_steer=0 I only receive data on a small number of multicast
> > addresses.  With 3.4 and the same patch I receive data in both cases.
> [...]
>
> Shawn, so end-in mind you want the NIC steering mode to be DMFS
> (Device Managed Flow Steering) e.g for the processes bypassing the
> kernel, correct? since the NIC steering mode is global, you will not
> be able to use that non-upstream patch moving forward.

Yes, end goal is to use DMFS.  However, we have some ConnectX-2 cards
which I guess do not support DMFS and naturally I'd like plain old UDP
multicast to continue to work at the same level as 3.4.  So I may
still want that high_rate_steer option upstreamed, but we'll see once
I get 3.10 into better shape.

> So we need to
> debug/bisect why without the patch (what you call high_rate_steer=0)
> you don't get data on all groups. Can you bisect that on a single
> node, e.g set the rest of the environment with 3.4 that works, and on
> a given node see what is the commit that breaks that?

Done. It appears that the patch that breaks receiving packets on many
different multicast groups/sockets is:

commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3
Author: Vlad Yasevich [off-list ref]
Date:   Mon Apr 15 09:54:25 2013 +0000

    net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
    
    The current implementation of dev_uc_sync/unsync() assumes that there is
    a strict 1-to-1 relationship between the source and destination of the sync.
    In other words, once an address has been synced to a destination device, it
    will not be synced to any other device through the sync API.
    However, there are some virtual devices that aggreate a number of lower
    devices and need to sync addresses to all of them.  The current
    API falls short there.
    
    This patch introduces a new dev_uc_sync_multiple() api that can be called
    in the above circumstances and allows sync to work for every invocation.
    
    CC: Jiri Pirko [off-list ref]
    Signed-off-by: Vlad Yasevich [off-list ref]
    Signed-off-by: David S. Miller [off-list ref]

I've confirmed that reverting this patch on top of 3.10-rc3 allows me
to receive packets on all of my multicast groups without the Mellanox
high_rate_steer option set.
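
As a rough illustration of the distinction the quoted commit message draws, here is a toy Python model of a 1-to-1 sync versus a sync that works for every destination. It is only a sketch of the behavior as the commit message words it: AddrList, sync() and sync_multiple() are hypothetical names, this is not the kernel's address-list code, and it makes no claim about why mlx4 stops receiving on some groups.

```python
# Toy model only: this is NOT the kernel's address-list code.
# AddrList, sync and sync_multiple are hypothetical names for illustration.

class AddrList:
    """A device's address list plus a per-address count of completed syncs."""

    def __init__(self, addrs=()):
        self.addrs = set(addrs)
        self.sync_cnt = {}  # addr -> number of destinations it was pushed to


def sync(src, dst):
    """1-to-1 model: an address already synced to any destination is skipped."""
    for addr in sorted(src.addrs):
        if src.sync_cnt.get(addr, 0) == 0:
            dst.addrs.add(addr)
            src.sync_cnt[addr] = 1


def sync_multiple(src, dst):
    """Multi-destination model: each call pushes whatever dst is still missing."""
    for addr in sorted(src.addrs):
        if addr not in dst.addrs:
            dst.addrs.add(addr)
            src.sync_cnt[addr] = src.sync_cnt.get(addr, 0) + 1


if __name__ == "__main__":
    groups = ["01:00:5e:00:00:01", "01:00:5e:00:00:02"]

    # 1-to-1 sync: the second lower device never sees the addresses.
    upper, lower_a, lower_b = AddrList(groups), AddrList(), AddrList()
    sync(upper, lower_a)
    sync(upper, lower_b)
    print(sorted(lower_a.addrs), sorted(lower_b.addrs))

    # Multi-destination sync: both lower devices end up with both addresses.
    upper, lower_a, lower_b = AddrList(groups), AddrList(), AddrList()
    sync_multiple(upper, lower_a)
    sync_multiple(upper, lower_b)
    print(sorted(lower_a.addrs), sorted(lower_b.addrs))
```

In this model the only difference is whether "already synced somewhere" stops an address from reaching a second destination. The real functions also handle unsync and reference counting, which the sketch leaves out.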

--
Shawn