Thread: 37 messages, 7 authors, 2024-03-02

Re: [net-next V3 15/15] Documentation: networking: Add description for multi-pf netdev

From: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>
Date: 2024-02-23 01:23:41


On 2/22/2024 5:00 PM, Jakub Kicinski wrote:
> On Thu, 22 Feb 2024 08:51:36 +0100 Greg Kroah-Hartman wrote:
>> On Tue, Feb 20, 2024 at 05:33:09PM -0800, Jakub Kicinski wrote:
>>> Greg, we have a feature here where a single device of class net has
>>> multiple "bus parents". We used to have one attr under class net
>>> (device) which is a link to the bus parent. Now we either need to add
>>> more or not bother with the linking of the whole device. Is there any
>>> precedent / preference for solving this from the device model
>>> perspective?
>>
>> How, logically, can a netdevice be controlled properly from 2 parent
>> devices on two different busses?  How is that even possible from a
>> physical point-of-view?  What exact bus types are involved here?
>
> Two PCIe buses, two endpoints, two networking ports. It's one piece

Isn't it only 1 networking port with multiple PFs?

> of silicon, tho, so the "slices" can talk to each other internally.
> The NVRAM configuration tells both endpoints that the user wants
> them "bonded", when the PCI drivers probe they "find each other"
> using some cookie or DSN or whatnot. And once they did, they spawn
> a single netdev.
>
>> This "shouldn't" be possible as in the end, it's usually a PCI device
>> handling this all, right?
>
> It's really a special type of bonding of two netdevs. Like you'd bond
> two ports to get twice the bandwidth. With the twist that the balancing
> is done on NUMA proximity, rather than traffic hash.
>
> Well, plus, the major twist that it's all done magically "for you"
> in the vendor driver, and the two "lower" devices are not visible.
> You only see the resulting bond.
>
> I personally think that the magic hides as many problems as it
> introduces and we'd be better off creating two separate netdevs.
> And then a new type of "device bond" on top. Small win that
> the "new device bond on top" can be shared code across vendors.

Yes. We have been exploring a small extension to the bonding driver to
enable a single NUMA-aware multi-threaded application to efficiently
utilize multiple NICs across NUMA nodes.

Here is an early version of a patch we have been trying; it seems to be
working well.

=========================================================================
bonding: select tx device based on rx device of a flow

If napi_id is cached in the sk associated with skb, use the
device associated with napi_id as the transmit device.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7a7d584f378a..77e3bf6c4502 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5146,6 +5146,30 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
         unsigned int count;
         u32 hash;

+       if (skb->sk) {
+               int napi_id = skb->sk->sk_napi_id;
+               struct net_device *dev;
+               int idx;
+
+               rcu_read_lock();
+               dev = dev_get_by_napi_id(napi_id);
+               rcu_read_unlock();
+
+               if (!dev)
+                       goto hash;
+
+               count = slaves ? READ_ONCE(slaves->count) : 0;
+               if (unlikely(!count))
+                       return NULL;
+
+               for (idx = 0; idx < count; idx++) {
+                       slave = slaves->arr[idx];
+                       if (slave->dev->ifindex == dev->ifindex)
+                               return slave;
+               }
+       }
+
+hash:
         hash = bond_xmit_hash(bond, skb);
         count = slaves ? READ_ONCE(slaves->count) : 0;
         if (unlikely(!count))
=========================================================================

If we make this a configurable bonding option, would this be an
acceptable solution to accelerate NUMA-aware apps?
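
For readers who want the selection rule in isolation, here is a minimal
userspace sketch of what the patch above appears to do: prefer the slave
whose ifindex matches the device the flow was received on, and otherwise
fall back to the hash. This is not kernel code. struct sketch_slave and
sketch_pick_tx_slave() are hypothetical names made up for illustration,
rx_ifindex stands in for the result of the napi_id lookup, and RCU and
locking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bond slave; only the ifindex matters here. */
struct sketch_slave {
	int ifindex;
};

/*
 * Pick a transmit slave.
 *
 * rx_ifindex: ifindex of the device the flow was received on, or 0 if
 *             unknown (assumption: models a failed napi_id lookup).
 * hash:       precomputed flow hash, used only as the fallback.
 *
 * Returns NULL when there are no slaves.
 */
static const struct sketch_slave *
sketch_pick_tx_slave(const struct sketch_slave *slaves, size_t count,
		     int rx_ifindex, uint32_t hash)
{
	size_t i;

	assert(slaves != NULL || count == 0);

	if (count == 0)
		return NULL;

	/* Prefer the slave that received the flow, if it is in the bond. */
	if (rx_ifindex > 0) {
		for (i = 0; i < count; i++) {
			if (slaves[i].ifindex == rx_ifindex)
				return &slaves[i];
		}
	}

	/* No rx device known, or it is not a slave: fall back to the hash. */
	return &slaves[hash % count];
}
```

With two slaves of ifindex 3 and 7, a flow received on ifindex 7 is sent
on the second slave regardless of its hash, while a flow with no known
rx device is spread by hash % count as before.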

> But there's only so many hours in the day to argue with vendors.