Thread: 31 messages, 10 authors, 2009-03-20

Re: High contention on the sk_buff_head.lock

From: Eric Dumazet <hidden>
Date: 2009-03-19 05:49:50
Also in: linux-rt-users, lkml

David Miller wrote:
> From: Sven-Thorsten Dietrich <redacted>
> Date: Wed, 18 Mar 2009 18:43:27 -0700
>
>> Do we have to rule-out per-CPU queues, that aggregate into a master
>> queue in a batch-wise manner?
>
> That would violate the properties and characteristics expected by
> the packet scheduler, wrt. to fair based fairness, rate limiting,
> etc.
>
> The only legal situation where we can parallelize to single device
> is where only the most trivial packet scheduler is attached to
> the device and the device is multiqueue, and that is exactly what
> we do right now.

I agree with you, David.

Still, there is room for improvement:

1) The default qdisc is pfifo_fast. This beast uses three sk_buff_head (96 bytes)
   where it could use three smaller list_head (3 * 16 = 48 bytes on x86_64).

   (This assumes sizeof(spinlock_t) is only 4 bytes, but it is larger than that
   in various configurations (LOCKDEP, ...).)

2) The struct Qdisc layout could be better, placing read-mostly fields
   at the beginning of the structure (i.e. move 'dev_queue', 'next_sched',
   'reshape_fail', 'u32_node', '__parent', ...).

   'struct gnet_stats_basic' has a 32-bit hole.

   'gnet_stats_queue' could be split, at least in Qdisc, so that the three
   seldom-used fields (drops, requeues, overlimits) go in a different cache line.

   gnet_stats_rate_est might also be moved into a 'not very used' cache line, if
   I am not mistaken?

3) Under stress, CPU A queues an skb on a sk_buff_head, but CPU B
   dequeues it to feed the device, incurring an expensive cache line miss
   on skb.{next|prev} (to set them to NULL).

   We could:
     - Use a special dequeue op that doesn't touch skb.{next|prev}
     - Possibly set next/prev to NULL after q.lock is released
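
To make point 3 concrete, here is a rough userspace sketch in C. It is not
kernel code: struct node, struct queue, queue_dequeue_deferred() and the
atomic_flag spinlock are stand-ins made up for this illustration. Note that
this sketch still reads the node's next pointer under the lock to find the
new head; it only moves the two NULL stores out of the critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the list linkage matters here. */
struct node {
    struct node *next;
    struct node *prev;
    int payload;
};

/* Hypothetical stand-in for sk_buff_head, using a sentinel node. */
struct queue {
    struct node head;      /* head.next is the first node, head.prev the last */
    unsigned int qlen;
    atomic_flag lock;      /* toy spinlock, assumed for this sketch only */
};

static void q_lock(struct queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void q_unlock(struct queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
    atomic_flag_clear(&q->lock);
}

static void queue_tail(struct queue *q, struct node *n)
{
    /* A node must not already sit on a list; this relies on NULL linkage. */
    assert(n->next == NULL && n->prev == NULL);

    q_lock(q);
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
    q_unlock(q);
}

static struct node *queue_dequeue_deferred(struct queue *q)
{
    struct node *n;

    q_lock(q);
    n = q->head.next;
    if (n == &q->head) {           /* empty queue */
        q_unlock(q);
        return NULL;
    }
    /* Unlink by writing only to the neighbours; n itself is read, not written. */
    q->head.next = n->next;
    n->next->prev = &q->head;
    q->qlen--;
    q_unlock(q);

    /* The stores into the dequeued node happen after the lock is released. */
    n->next = NULL;
    n->prev = NULL;
    return n;
}
```

Once the node is unlinked nobody else can reach it through the queue, so
clearing its pointers after the unlock is safe in this sketch, and the lock
hold time no longer includes the stores into a cache line last written by
another CPU.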

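The hot/cold split suggested in point 2 can be sketched in a similar way.
Everything here is hypothetical: struct qdisc_sketch is not the real struct
Qdisc, the pointer fields are untyped placeholders, the 64-byte cache line is
an assumption, and qlen/backlog are simply taken to be the frequently updated
counters next to the three seldom-used ones named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Counters assumed to be updated for every packet. */
struct qstats_hot {
    uint32_t qlen;
    uint32_t backlog;
};

/* Seldom-used counters, kept away from the hot ones. */
struct qstats_cold {
    uint32_t drops;
    uint32_t requeues;
    uint32_t overlimits;
};

struct qdisc_sketch {
    /* Read-mostly fields first, sharing the leading cache line. */
    void *dev_queue;
    void *parent;

    /* Frequently written counters start on their own cache line. */
    _Alignas(CACHE_LINE) struct qstats_hot hot;

    /* Seldom-used counters are pushed to yet another cache line. */
    _Alignas(CACHE_LINE) struct qstats_cold cold;
};

/* Index of the cache line that a given structure offset falls into. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Compile-time check that hot and cold counters never share a line. */
static_assert(offsetof(struct qdisc_sketch, hot) / CACHE_LINE !=
              offsetof(struct qdisc_sketch, cold) / CACHE_LINE,
              "hot and cold counters must not share a cache line");
```

The price of the explicit alignment is padding, so whether it pays off for
the real structure would have to be checked against its actual size.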


--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html