Thread (45 messages) 45 messages, 6 authors, 2012-05-03

Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()

From: Eric Dumazet <hidden>
Date: 2012-05-02 16:46:28

On Wed, 2012-05-02 at 09:27 -0700, Alexander Duyck wrote:
Are you sure about that?  I think this may blow up if a bridge is
brought into play.  In that case you will have clones that will be going
through the xmit path of network drivers and I know in the case of the
older e1000 driver it didn't stop to look at the length but would
instead just go through and start mapping all frags to the device.  I am
fairly certain you are risking a data corruption any time you modify
nr_frags and dataref is != 1.

Hmm...

A driver should not map more fragments than len/data_len permits.
But point taken.

Frankly we can add the test, but it means that any sniffer running will
disable tcp coalescing, while net/packet/af_packet.c does the right
thing.

I'll check how I can do...
I really think what we should be doing is either not merge period, or we
have to go through slow paths if either the to or the from is cloned.
quoted
quoted
quoted
@@ -4515,7 +4521,12 @@ copyfrags:
 		offset = from->data - (unsigned char *)page_address(page);
 		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
 				   page, offset, skb_headlen(from));
-		*fragstolen = true;
+
+		if (skb_cloned(from))
+			get_page(page);
+		else
+			*fragstolen = true;
+
 		delta = len; /* we dont know real truesize... */
 		goto copyfrags;
 	}
I don't see where we are now addressing the put_page call to release the
page afterwards.  By calling get_page you are incrementing the page
count by one, but where are you decrementing dataref in the shared
info?  Without that we are looking at a memory leak because __kfree_skb
will decrement the dataref but it will never reach 0 so it will never
call put_page on the head frag.
really the dataref was already incremented at skb_clone() time

It will be properly decremented since we call __kfree_skb()

Only the last decrement will perform the put_page()

Think about splice() is doing, its the same get_page() game.
I think you are missing the point.  So skb_clone will bump the dataref
to 2, calling get_page will bump the page count to 2.  After this
function you don't call __kfree_skb(skb) instead you call
kmem_cache_free(skbuff_head_cache, skb).  This will free the sk_buff,
but not decrement dataref leaving it at 2.  Eventually the raw socket
will call kfree_skb(skb) on the clone dropping the dataref to 1 and you
will call put_page dropping the page count to 1, but I don't see where
the last __kfree_skb call will come from that will drop dataref and the
page count to 0.
No, you miss that _if_ we added one to page count, then we wont call
kmem_cache_free(skbuff_head_cache, skb) but call __kfree_skb(skb)
instead because fragstolen will be false.

if (fragstolen)
	kmem_cache_free(...)
else
	__kfree_skb(...)

In future patch (addressing tcp coalescing in tcp_queue_rcv() as well),
I'll add a helper to make this more clear.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help