Re: [RFC] net: store port/representative id in metadata_dst
From: Jakub Kicinski <hidden>
Date: 2016-09-29 11:10:16
On Fri, 23 Sep 2016 14:20:40 -0700, John Fastabend wrote:
On 16-09-23 01:45 PM, Jakub Kicinski wrote:quoted
On Fri, 23 Sep 2016 13:25:10 -0700, John Fastabend wrote:quoted
On 16-09-23 01:17 PM, Jakub Kicinski wrote:quoted
On Fri, 23 Sep 2016 10:22:59 -0700, Samudrala, Sridhar wrote:quoted
On 9/23/2016 8:29 AM, Jakub Kicinski wrote:[...] [...]quoted
quoted
The 'accel' parameter in dev_queue_xmit_accel() is currently only passed to ndo_select_queue() via netdev_pick_tx() and is used to select the tx queue. Also, it is not passed all the way to the driver specific xmit routine. Doesn't it require changing all the driver xmit routines if we want to pass this parameter?[...]quoted
quoted
Yes. The VFPR netdevs don't have any HW queues associated with them and we would like to use the PF queues for the xmit. I was also looking into some way of passing the port id via skb parameter to the dev_queue_xmit() call so that the PF xmit routine can do a directed transmit to a specifc VF. Is skb->cb an option to pass this info? dst_metadata approach would work too if it is acceptable.I don't think we can trust skb->cb to be set to anything meaningful when the skb is received by the lower device.Agreed. I wouldn't recommend using skb->cb. How about passing it through dev_queue_xmit_accel() through to the driver? If you pass the metadata through the dev_queue_xmit_accel() handle tx queue selection would work using normal mechanisms (xps, select_queue, cls hook, etc.). If you wanted to pick some specific queue based on policy the policy could be loaded into one of those hooks.Do you mean without extending how accel is handled by dev_queue_xmit_accel() today? If my goal is to not have extra HW queues then I don't see how I could mux in the lower dev without extra locking (as I tried to explain two emails ago). Sorry for being slow here :(Not slow here I think I was overly optimistic... Yeh let me try this, roughly the current flow is, dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv); __dev_queue_xmit(skb, accel_priv); netdev_pick_tx(dev, skb, accel_priv); ndo_select_queue(dev, skb, accel_priv, ...); [...] q->enqueue(); [...] dev_hard_start_xmit(); [...] <driver code here> So in this flow the VFR netdev driver handles its xmit routine by calling dev_queue_xmit_accel after setting skb->dev to the physical device and passing a cookie via accel that the select_queue() routine can use to pick a tx queue. The rest of the stack q->enqueue() and friends will ensure that locking and qdisc is handled correctly. But accel_priv was lost at queue selection and so its not being passed down to the driver so no way to set your descriptor bits or whatever needed to push to the VF. I was sort of thinking we could map it from the select_queue routine but I can't figure out how to do that either. The metadata idea doesn't seem that bad now that I've spent some more time going through it. Either that or hijack some field in the skb but I think that might be worse than the proposal here. I'm trying to think up some other alternative now and will let you know if I think of anything clever but got nothing at the moment.
Cool, I'm happy to discuss this further at netdev but it seems like there is no strong opposition so far? FWIW in the example I gave I didn't do refcounting on the dst but I think that's incorrect since we don't have control over lifetime of redirected/stolen skbs.