Thread: 105 messages, 13 authors, 2008-11-24

Re: [PATCH 2/2] udp: RCU handling for Unicast packets.

From: Eric Dumazet <hidden>
Date: 2008-10-30 05:51:02

Corey Minyard wrote:
Eric Dumazet wrote:
Paul E. McKenney wrote:
On Wed, Oct 29, 2008 at 09:00:13PM +0100, Eric Dumazet wrote:
Hum... Another way of handling all those cases and avoiding memory barriers
would be to have different "NULL" pointers.

Each hash chain should have a unique "NULL" pointer (in the case of UDP, it
can be one of the 128 values [ (void *)0 .. (void *)127 ]).

Then, when performing a lookup, a reader should check that the "NULL" pointer
it gets at the end of its lookup is the "hash" value of its chain.

If not -> restart the loop, aka "goto begin;" :)

We could avoid memory barriers then.

In the two cases Corey mentioned, this trick could let us avoid memory
barriers.
(existing one in sk_add_node_rcu(sk, &hslot->head); should be enough)
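
A minimal sketch of the lookup described above, for illustration only. It is
single-threaded userspace C, so it leaves out rcu_dereference() and any
locking; the names (struct node, table, is_nulls, chain_add, lookup) are
hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTABLE_SIZE 128         /* mirrors the 128 UDP chains mentioned above */

struct node {
    struct node *next;          /* real pointer, or a small "NULL" value */
    unsigned int key;
};

/* One head per chain; an empty chain holds its own slot number. */
static struct node *table[HTABLE_SIZE];

/* A pointer counts as a "NULL" marker if its value is below HTABLE_SIZE. */
static int is_nulls(const struct node *p)
{
    return (uintptr_t)p < HTABLE_SIZE;
}

static void table_init(void)
{
    for (uintptr_t i = 0; i < HTABLE_SIZE; i++)
        table[i] = (struct node *)i;
}

/* Writer side (no concurrency here): push at the head of a chain. */
static void chain_add(unsigned int slot, struct node *n)
{
    assert(slot < HTABLE_SIZE);
    n->next = table[slot];
    table[slot] = n;
}

/* Reader side: walk the chain, restart if it ends on a foreign marker. */
static struct node *lookup(unsigned int slot, unsigned int key)
{
    struct node *p;

begin:
    for (p = table[slot]; !is_nulls(p); p = p->next)
        if (p->key == key)
            return p;
    if ((uintptr_t)p != slot)
        goto begin;             /* the walk drifted onto another chain */
    return NULL;
}
```

With a single thread the restart never fires; it only matters when a writer
moves an item to another chain while a reader is in the middle of the walk.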

What do you think ?
Kinky!!!  ;-)

Then the rcu_dereference() would be supplying the needed memory barriers.

Hmmm...  I guess that the only confusion would be if the element got
removed and then added to the same list.  But then if its pointer was
pseudo-NULL, then that would mean that all subsequent elements had been
removed, and all preceding ones added after the scan started.

Which might well be harmless, but I must defer to you on this one at
the moment.

If you need a larger hash table, another approach would be to set the
pointer's low-order bit, allowing the upper bits to be a full-sized
index -- or even a pointer to the list header.  Just make very sure
to clear the pointer when freeing, or an element on the freelist
could end up looking like a legitimate end of list...  Which again
might well be safe, but why inflict this on oneself?
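
The low-order-bit encoding suggested above could look roughly like this. It
is a sketch under assumptions: the helper names (make_marker, is_marker,
marker_index, node_clear) are made up for illustration, and it relies on
real node pointers being at least 2-byte aligned so that bit 0 is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *next;          /* real pointer, or an end-of-list marker */
};

/* Build an end-of-list marker: bit 0 set, chain index in the upper bits. */
static struct node *make_marker(uintptr_t index)
{
    assert(index <= (UINTPTR_MAX >> 1));
    return (struct node *)((index << 1) | 1u);
}

/* Real node pointers are aligned, so bit 0 set means "marker". */
static int is_marker(const struct node *p)
{
    return ((uintptr_t)p & 1u) != 0;
}

/* Recover the chain index stored in a marker. */
static uintptr_t marker_index(const struct node *p)
{
    assert(is_marker(p));
    return (uintptr_t)p >> 1;
}

/* Clear the link when freeing, so a node sitting on a freelist cannot be
 * mistaken for a legitimate end of list. */
static void node_clear(struct node *n)
{
    n->next = NULL;
}
```

The index is no longer limited by the size of the address range assumed to
be unused; any value that fits in the upper bits can be stored.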
Well, for the UDP case, the hash table will have <= 65536 slots anyway, so we
can assume no dynamic kernel memory is in the range [0 .. 65535]

Here is a patch (untested yet, it's really time for a sleep for me ;) )

[PATCH] udp: Introduce special NULL pointers for hlist termination

In order to safely detect changes in chains, we would like to have different
'NULL' pointers. Each chain in the hash table is terminated by a unique 'NULL'
value, so that lockless readers can detect that their lookup strayed from
its starting chain.

We define 'NULL' values as ((unsigned long)values < UDP_HTABLE_SIZE)

This saves memory barriers (a read barrier to fetch 'next' pointers
*before* checking key values) we added in commit
96631ed16c514cf8b28fab991a076985ce378c26 (udp: introduce
sk_for_each_rcu_safenext())
This also saves a missing memory barrier spotted by Corey Minyard (a
write one in udp_lib_get_port(), between sk_hash update and ->next update)
I think you are right, this will certainly perform a lot better without
the read barrier in the list traversal.  I haven't seen any problems
with this approach, though it's unusual enough to perhaps warrant some
extra comments in the code.

You do need to modify udp_lib_unhash(), as sk_del_node_init_rcu() will
do a NULL check on the ->next value, so you will need a special version
of that as well.
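
To illustrate the point about the NULL check: a sketch of an hlist-style
delete that tests for an end-of-chain marker instead of NULL. The names
(struct node, is_nulls, node_add_head, node_del) are hypothetical, and this
is plain single-threaded C, not the kernel's sk_del_node_init_rcu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTABLE_SIZE 128

struct node {
    struct node *next;          /* real pointer, or small end-of-chain value */
    struct node **pprev;        /* address of the pointer that points at us */
};

static int is_nulls(const struct node *p)
{
    return (uintptr_t)p < HTABLE_SIZE;
}

static void node_add_head(struct node **head, struct node *n)
{
    struct node *first = *head;

    n->next = first;
    n->pprev = head;
    if (!is_nulls(first))
        first->pprev = &n->next;
    *head = n;
}

static void node_del(struct node *n)
{
    struct node *next = n->next;
    struct node **pprev = n->pprev;

    assert(pprev != NULL);      /* the node must currently be on a list */
    *pprev = next;
    /* A plain "if (next)" would dereference a non-zero marker here. */
    if (!is_nulls(next))
        next->pprev = pprev;
}
```

After the last node is deleted, the head holds the chain's own marker again
rather than NULL.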
Yes, we need many new macros, like sk_next_nulls(), sk_head_nulls(), ...

I have a working patch now, but not yet presentable for lkml :)

This patch needs to touch files outside of netdev scope, so it will need to
be in really good shape and well documented.

(Probably a new file : include/linux/list_nulls.h ?)

Maybe in the meantime, we can commit a temporary patch doing the smp_wmb()
you suggested ?

Thanks

[PATCH] udp: add a missing smp_wmb() in udp_lib_get_port()

Corey Minyard spotted a missing memory barrier in udp_lib_get_port()

We need to make sure a reader cannot read the new 'sk->sk_next' value
and previous value of 'sk->sk_hash'. Or else, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.

This patch is temporary, since we expect an upcoming patch
to introduce another way of handling the problem.

Signed-off-by: Eric Dumazet <redacted>
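
The ordering this changelog asks for can be sketched in userspace with C11
atomics. This is only an analogy: the release fence stands in for smp_wmb()
and the acquire fence for the reader-side barrier, and the names (struct
sock_like, move_to_chain, read_pair) are invented for the example and do
not exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sock_like {
    _Atomic unsigned int hash;          /* stands in for sk->sk_hash */
    struct sock_like *_Atomic next;     /* stands in for sk->sk_next */
};

/* Writer: make the new hash visible before the new 'next' pointer. */
static void move_to_chain(struct sock_like *sk, unsigned int new_hash,
                          struct sock_like *new_next)
{
    atomic_store_explicit(&sk->hash, new_hash, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* role of smp_wmb() */
    atomic_store_explicit(&sk->next, new_next, memory_order_relaxed);
}

/* Reader: fetch 'next' first, then the hash. If the new 'next' was seen,
 * the fence pairing guarantees the new hash is seen as well. */
static void read_pair(struct sock_like *sk, unsigned int *hash,
                      struct sock_like **next)
{
    *next = atomic_load_explicit(&sk->next, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* reader-side pairing */
    *hash = atomic_load_explicit(&sk->hash, memory_order_relaxed);
}
```

Without the writer-side fence, a reader could observe the new 'next' together
with the old hash, which is the combination the changelog rules out.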
