[PATCH] net: release skb->dst in sock_queue_rcv_skb()
From: Eric Dumazet <hidden>
Date: 2008-11-26 00:00:39
David Miller a écrit :
From: Eric Dumazet <redacted> Date: Tue, 25 Nov 2008 05:43:32 +0100quoted
Very interesting. So we could try the following path : 1) First try to release dst when queueing skb to various queues (UDP, TCP, ...) while its hot. Reader wont have to release it while its cold. 2) Check if we can handle the input path without any refcount dirtying ? To make the transition easy, we could use a bit on skb to mark dst being not refcounted (ie no dst_release() should be done on it)It is possible to make this self-auditing. For example, by using the usual trick where we encode a pointer in an unsigned long and use the low bits for states. In the first step, make each skb->dst access go through some accessor inline function. Next, audit the paths where skb->dst's can "escape" the pure packet input path. Add annotations, in the form of a inline function call, for these locations. Also, audit the other locations where we enqueue into a socket queue and no longer care about the skb->dst, and annotate those with another inline function. Finally, the initial skb->dst assignment in the input path doesn't grab a reference, but sets the low bit ("refcount pending") in the encoded skb->dst pointer. The skb->dst "escape" inline function performs the deferred refcount grab. And kfree_skb() is taught to not dst_release() on skb->dst's which have the low bit set. Anyways, something like that.
I looked this stuff and found it would be difficult to not grab a
reference (and more important not writing to dst) in input path.
ip_rcv_finish() calls ip_route_input()
and ip_route_input() calls dst_use(&rth->u.dst, jiffies);
static inline void dst_use(struct dst_entry *dst, unsigned long time)
{
dst_hold(dst);
dst->__use++;
dst->lastuse = time;
}
Even if we avoid the refcount increment, I guess we need the lastuse
assignement in order to keep dst in cache. Not sure about the role of
__use field. Hum... for a tcp connection, dst refcount should already
be pinned by a sk->sk_dst_cache. Maybe test refcount value, and if this
value is > 1, dont take a reference. (given rcu_read_lock() is done
before calling ip_rcv_finish())
In the meantime, what do you think of the following patch ?
[PATCH] net: release skb->dst in sock_queue_rcv_skb()
When queuing a skb to sk->sk_receive_queue, we can release its dst, not
anymore needed.
Since current cpu did the dst_hold(), refcount is probably still hot
int this cpu caches.
This avoids readers to access the original dst to decrement its refcount,
possibly a long time after packet reception. This should speedup UDP
and RAW receive path.
Signed-off-by: Eric Dumazet <redacted> Attachments
- sock_queue_rcv_skb.patch [text/plain] 534 bytes · preview