Thread (31 messages) 31 messages, 10 authors, 2009-03-20

Re: High contention on the sk_buff_head.lock

From: David Miller <davem@davemloft.net>
Date: 2009-03-19 05:58:49
Also in: linux-rt-users, lkml

Possibly related (same subject, not in this thread)

From: Eric Dumazet <redacted>
Date: Thu, 19 Mar 2009 06:49:19 +0100
Still, there is room for improvements, since :

1) default qdisc is pfifo_fast. This beast uses three sk_buff_head (96 bytes)
  where it could use 3 smaller list_head (3 * 16 = 48 bytes on x86_64)

 (assuming sizeof(spinlock_t) is only 4 bytes, but it's more than that
 on various situations (LOCKDEP, ...)
I already plan on doing this, skb->{next,prev} will be replaced with a
list_head and nearly all of the sk_buff_head usage will simply
disappear.  It's a lot of work because every piece of SKB queue
handling code has to be sanitized to only use the interfaces in
linux/skbuff.h and lots of extremely ugly code like the PPP
defragmenter make many non-trivial direct skb->{next,prev}
manipulations.
2) struct Qdisc layout could be better, letting read mostly fields
   at beginning of structure. (ie move 'dev_queue', 'next_sched', reshape_fail,
   u32_node, __parent, ...)
I have no problem with your struct layout changes, submit it formally.
3) In stress situation a CPU A queues a skb to a sk_buff_head, but a CPU B
   dequeues it to feed device, involving an expensive cache line miss
   on the skb.{next|prev} (to set them to NULL)

   We could:
      Use a special dequeue op that doesnt touch skb.{next|prev}
   Eventually set next/prev to NULL after q.lock is released
You absolutely can't do this, as it would break GSO/GRO.  The whole
transmit path is littered with checks of skb->next being NULL for the
purposes of segmentation handling.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help