Thread (12 messages) 12 messages, 6 authors, 2016-09-29

Re: [RFC] net: store port/representative id in metadata_dst

From: Jakub Kicinski <hidden>
Date: 2016-09-29 11:10:16

On Fri, 23 Sep 2016 14:20:40 -0700, John Fastabend wrote:
On 16-09-23 01:45 PM, Jakub Kicinski wrote:
quoted
On Fri, 23 Sep 2016 13:25:10 -0700, John Fastabend wrote:  
quoted
On 16-09-23 01:17 PM, Jakub Kicinski wrote:  
quoted
On Fri, 23 Sep 2016 10:22:59 -0700, Samudrala, Sridhar wrote:    
quoted
On 9/23/2016 8:29 AM, Jakub Kicinski wrote:    
 [...]  
 [...]    
quoted
quoted
The 'accel' parameter in dev_queue_xmit_accel() is currently only passed
to ndo_select_queue() via netdev_pick_tx() and is used to select the tx 
queue.
Also, it is not passed all the way to the driver specific xmit routine.  
Doesn't it require
changing all the driver xmit routines if we want to pass this parameter?
   
 [...]    
quoted
quoted
Yes.  The VFPR netdevs don't have any HW queues associated with them and 
we would like
to use the PF queues for the xmit.
I was also looking into some way of passing the port id via skb 
parameter to the
dev_queue_xmit() call so that the PF xmit routine can do a directed 
transmit to a specifc VF.
Is skb->cb an option to pass this info?
dst_metadata approach would work  too if it is acceptable.    
I don't think we can trust skb->cb to be set to anything meaningful
when the skb is received by the lower device.   
Agreed. I wouldn't recommend using skb->cb. How about passing it through
dev_queue_xmit_accel() through to the driver?

If you pass the metadata through the dev_queue_xmit_accel() handle tx
queue  selection would work using normal mechanisms (xps, select_queue,
cls  hook, etc.). If you wanted to pick some specific queue based on
policy the policy could be loaded into one of those hooks.  
Do you mean without extending how accel is handled by
dev_queue_xmit_accel() today?  If my goal is to not have extra HW
queues then I don't see how I could mux in the lower dev without extra
locking (as I tried to explain two emails ago).  Sorry for being slow
here :(
  
Not slow here I think I was overly optimistic...

Yeh let me try this, roughly the current flow is,

   dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv);
   __dev_queue_xmit(skb, accel_priv);
   netdev_pick_tx(dev, skb, accel_priv);
	ndo_select_queue(dev, skb, accel_priv, ...);
   [...]
   q->enqueue();
   [...]
   dev_hard_start_xmit();
   [...]
    <driver code here>

So in this flow the VFR netdev driver handles its xmit routine by
calling dev_queue_xmit_accel after setting skb->dev to the physical
device and passing a cookie via accel that the select_queue() routine
can use to pick a tx queue. The rest of the stack q->enqueue() and
friends will ensure that locking and qdisc is handled correctly.

But accel_priv was lost at queue selection and so its not being passed
down to the driver so no way to set your descriptor bits or whatever
needed to push to the VF. I was sort of thinking we could map it from
the select_queue routine but I can't figure out how to do that either.

The metadata idea doesn't seem that bad now that I've spent some more
time going through it. Either that or hijack some field in the skb but
I think that might be worse than the proposal here.

I'm trying to think up some other alternative now and will let you know
if I think of anything clever but got nothing at the moment.
	
Cool, I'm happy to discuss this further at netdev but it seems like
there is no strong opposition so far?

FWIW in the example I gave I didn't do refcounting on the dst but I
think that's incorrect since we don't have control over lifetime of
redirected/stolen skbs.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help