Thread (105 messages) 105 messages, 13 authors, 2008-11-24

Re: [PATCH 2/2] udp: RCU handling for Unicast packets.

From: Paul E. McKenney <hidden>
Date: 2008-10-29 18:11:17

On Wed, Oct 29, 2008 at 06:32:29PM +0100, Eric Dumazet wrote:
Paul E. McKenney a écrit :
quoted
On Wed, Oct 29, 2008 at 05:09:53PM +0100, Eric Dumazet wrote:
quoted
Corey Minyard a écrit :
quoted
Eric Dumazet wrote:
quoted
Corey Minyard found a race added in commit 
271b72c7fa82c2c7a795bc16896149933110672d
(udp: RCU handling for Unicast packets.)

"If the socket is moved from one list to another list in-between the 
time  the hash is calculated and the next field is accessed, and the 
socket  has moved to the end of the new list, the traversal will not 
complete  properly on the list it should have, since the socket will be 
on the end  of the new list and there's not a way to tell it's on a new 
list and  restart the list traversal.  I think that this can be solved 
by  pre-fetching the "next" field (with proper barriers) before 
checking the  hash."

This patch corrects this problem, introducing a new 
sk_for_each_rcu_safenext()
macro.
You also need the appropriate smp_wmb() in udp_lib_get_port() after 
sk_hash is set, I think, so the next field is guaranteed to be changed 
after the hash value is changed.
Not sure about this one Corey.

If a reader catches previous value of item->sk_hash, two cases are to be 
taken into :

1) its udp_hashfn(net, sk->sk_hash) is != hash   -> goto begin : Reader 
will redo its scan

2) its udp_hashfn(net, sk->sk_hash) is == hash
 -> next pointer is good enough : it points to next item in same hash 
chain.
    No need to rescan the chain at this point.
    Yes we could miss the fact that a new port was bound and this UDP 
message could be lost.
3) its udp_hashfn(net, sk-sk_hash) is == hash, but only because it was
removed, freed, reallocated, and then readded with the same hash value,
possibly carrying the reader to a new position in the same list.
yes, but 'new position' is 'before any not yet examined objects', since
we insert objects only at chain head.
OK.  However, this reasoning assumes that a socket with a given
udp_hashfn() value will appear on one and only one list.  There are no
side lists for sockets in other states?  (listen, &c)
quoted
You might well cover this (will examine your code in detail on my plane
flight starting about 20 hours from now), but thought I should point it
out.  ;-)
Yes, I'll double check too, this seems tricky :)
;-)
About SLAB_DESTROY_BY_RCU effect, we now have two different kmem_cache for 
"UDP-Lite"
and "UDP".

This is expected, but we could avoid that and alias these caches, since
these objects have the same *type* . (The fields used for the RCU lookups,
deletes and inserts are the same)

Maybe a hack in net/ipv4/udplite.c before calling proto_register(), to
copy the kmem_cache from UDP.
As long as this preserves the aforementioned assumption that a socket
with a given hash can appear on one and only one list.  ;-)

							Thanx, Paul
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help