Re: qdisc_enqueue, NET_XMIT_SUCCESS and kfree_skb (Was: Re: [PATCH take 2] net_sched: Add qdisc __NET_XMIT_BYPASS flag)
From: Jarek Poplawski <hidden>
Date: 2008-08-06 21:52:23
On Wed, Aug 06, 2008 at 10:42:48PM +0300, Jussi Kivilinna wrote: ...
Ok, I went through all enqueue (and requeue) functions looking for any case that frees the skb and returns a full NET_XMIT_SUCCESS (without the new flags), and found only one: sch_blackhole (qdisc_drop() + return NET_XMIT_SUCCESS).
Very interesting observation. Probably mostly theoretical (I wonder how many people use this). The question is whether this code may be returned in such a case at all. noop returns NET_XMIT_CN, which looks safer, but maybe blackhole is meant to be an exception? I don't know. Anyway, if it happens, e.g. with a forwarded skb, it looks like a read after kfree_skb().
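To make the hazard concrete, here is a small user-space model. It is not kernel code: fake_skb, xmit_one, queue_enqueue and noop_style_enqueue are hypothetical names, and the NET_XMIT_* values are assumed to match the kernel headers of the time. The caller follows the convention that an skb is touched again only after a full NET_XMIT_SUCCESS, so a noop-style drop that frees the skb and returns NET_XMIT_CN is safe. An enqueue that freed the skb but still returned NET_XMIT_SUCCESS, as sch_blackhole does, would turn the skb->len read into a use-after-free.

```c
#include <assert.h>
#include <stdlib.h>

/* NET_XMIT_* values assumed to match the kernel headers of the time. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_CN      2

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	unsigned int len;
};

static struct fake_skb *queued;  /* one-slot "queue" for the model */
static unsigned long tx_bytes;   /* caller-side statistics */

/* Keeps the skb: a full NET_XMIT_SUCCESS means it is still alive. */
static int queue_enqueue(struct fake_skb *skb)
{
	queued = skb;
	return NET_XMIT_SUCCESS;
}

/* noop-style drop: frees the skb and says so with NET_XMIT_CN. */
static int noop_style_enqueue(struct fake_skb *skb)
{
	free(skb);
	return NET_XMIT_CN;
}

/* Caller that touches the skb again only after a full NET_XMIT_SUCCESS.
 * An enqueue that freed the skb but still returned NET_XMIT_SUCCESS
 * (as sch_blackhole does) would make the skb->len read below a
 * use-after-free. */
static int xmit_one(struct fake_skb *skb, int (*enqueue)(struct fake_skb *))
{
	int ret = enqueue(skb);

	if (ret == NET_XMIT_SUCCESS)
		tx_bytes += skb->len;
	return ret;
}
```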
This could be fixed by delaying kfree_skb() until qdisc_enqueue_root() exits. Here's a (completely untested) patch:

---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a7abfda..ca083c6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -175,6 +175,7 @@ struct tcf_proto
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	__u8			delayed_enqueue_free:1;
 	char			data[];
 };
 
@@ -364,10 +365,23 @@ static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return sch->enqueue(skb, sch);
 }
 
+static inline void qdisc_delayed_kfree_skb(struct sk_buff *skb)
+{
+	qdisc_skb_cb(skb)->delayed_enqueue_free = 1;
+}
+
 static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
 {
+	int ret;
+
+	qdisc_skb_cb(skb)->delayed_enqueue_free = 0;
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
+	ret = qdisc_enqueue(skb, sch);
+
+	if (ret == NET_XMIT_SUCCESS && qdisc_skb_cb(skb)->delayed_enqueue_free)
+		kfree_skb(skb);
+
+	return ret & NET_XMIT_MASK;
 }
 
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
index 507fb48..13230bd 100644
--- a/net/sched/sch_blackhole.c
+++ b/net/sched/sch_blackhole.c
@@ -19,7 +19,8 @@
 
 static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-	qdisc_drop(skb, sch);
+	qdisc_delayed_kfree_skb(skb);
+	sch->qstats.drops++;
 	return NET_XMIT_SUCCESS;
 }
 
---

If this isn't a good way to solve it, the use of qdisc_pkt_len() for stats could be fixed either by passing a packet-length pointer through the qdisc tree, or by adding a new qdisc_pkt_len_diff and adding the difference in at dequeue, as you said (though there the inner dequeue could return NULL, so the difference would never be added; but well, it is just stats).
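A user-space sketch of the delayed-free idea in the patch, assuming simplified stand-ins: fake_skb, skb_cb, parent_enqueue and model_kfree_skb are hypothetical, not kernel API, and the NET_XMIT_* values are assumed. The blackhole only marks the skb and counts the drop, a parent can still read the skb after a full NET_XMIT_SUCCESS, and the root frees it only once the whole tree has returned.

```c
#include <assert.h>
#include <stdlib.h>

/* NET_XMIT_* values assumed to match the 2008-era kernel headers. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_MASK    0xFFFF

/* Hypothetical stand-in for the qdisc_skb_cb area with the new bit. */
struct skb_cb {
	unsigned int pkt_len;
	unsigned int delayed_enqueue_free : 1;
};

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	unsigned int len;
	struct skb_cb cb;
};

static int frees;                  /* model_kfree_skb() calls */
static unsigned long blackhole_drops;
static unsigned long parent_bytes; /* parent's byte statistics */

static void model_kfree_skb(struct fake_skb *skb)
{
	frees++;
	free(skb);
}

/* Patched blackhole: count the drop, but only mark the skb for freeing. */
static int blackhole_enqueue(struct fake_skb *skb)
{
	skb->cb.delayed_enqueue_free = 1;
	blackhole_drops++;
	return NET_XMIT_SUCCESS;
}

/* Classful parent that, like HTB/HFSC, reads the skb after the child
 * returned a full NET_XMIT_SUCCESS; safe here, since nothing was freed. */
static int parent_enqueue(struct fake_skb *skb)
{
	int ret = blackhole_enqueue(skb);

	if (ret == NET_XMIT_SUCCESS)
		parent_bytes += skb->cb.pkt_len;
	return ret;
}

/* Mirrors the patched qdisc_enqueue_root(): reset the bit, enqueue,
 * then free a marked skb only after the whole tree has returned. */
static int enqueue_root(struct fake_skb *skb)
{
	int ret;

	skb->cb.delayed_enqueue_free = 0;
	skb->cb.pkt_len = skb->len;
	ret = parent_enqueue(skb);

	if (ret == NET_XMIT_SUCCESS && skb->cb.delayed_enqueue_free)
		model_kfree_skb(skb);

	return ret & NET_XMIT_MASK;
}
```

The price of this design is an extra cb write and check at the root for every packet, not just for blackholed ones.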
I doubt that such a rare case should change the way all packets are treated, but if so, one of these new __NET_XMIT flags could probably be used for it.
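If a flag were used instead, a sketch could look like this. It is a user-space model: MODEL_NET_XMIT_BYPASS stands in for the new __NET_XMIT_BYPASS (its value is an assumption), and the other names are hypothetical. The blackhole frees the skb at once but returns NET_XMIT_SUCCESS with a private flag bit set, so it is no longer a full NET_XMIT_SUCCESS. A parent that only touches the skb on full success therefore leaves it alone. Assuming the parent passes the child's code up unchanged, the root's NET_XMIT_MASK strips the flag before the caller sees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Values assumed to mirror the 2008-era kernel headers. */
#define NET_XMIT_SUCCESS      0
#define NET_XMIT_MASK         0xFFFF
#define MODEL_NET_XMIT_BYPASS 0x00020000 /* stand-in for __NET_XMIT_BYPASS */

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	unsigned int len;
};

static unsigned long drops;

/* Blackhole variant: drop (free) at once, but tag the success code so
 * that it is no longer a *full* NET_XMIT_SUCCESS. */
static int blackhole_enqueue(struct fake_skb *skb)
{
	free(skb);
	drops++;
	return NET_XMIT_SUCCESS | MODEL_NET_XMIT_BYPASS;
}

/* Parent in the HTB style: reads the skb only on full success. */
static unsigned long parent_bytes;

static int parent_enqueue(struct fake_skb *skb)
{
	int ret = blackhole_enqueue(skb);

	if (ret == NET_XMIT_SUCCESS)
		parent_bytes += skb->len; /* skipped here: the flag is set */
	return ret;
}

/* Root strips the private flag bits, so the caller still sees success. */
static int enqueue_root(struct fake_skb *skb)
{
	return parent_enqueue(skb) & NET_XMIT_MASK;
}
```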
As I went through the code, I found two cases where the skb pointer is used after an inner enqueue returns a full NET_XMIT_SUCCESS (other than qdisc_pkt_len() for stats): HTB uses skb_is_gso(), and HFSC uses the packet length for set_active(). HTB is trivial (for me) to fix, while HFSC isn't. Because of the HFSC part, it would be easier for me to declare a full NET_XMIT_SUCCESS a safe zone for the skb pointer.
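One way to make an HTB-style parent independent of what the child does with the skb is to read everything its statistics need before calling the inner enqueue. The sketch below is a hypothetical user-space model of that pattern, not the actual HTB code or fix: gso_segs == 0 plays the role of !skb_is_gso(), NET_XMIT_SUCCESS is assumed to be 0, and the child reproduces the original blackhole behavior of freeing the skb while still returning NET_XMIT_SUCCESS.

```c
#include <assert.h>
#include <stdlib.h>

#define NET_XMIT_SUCCESS 0 /* value assumed from the kernel headers */

/* Hypothetical stand-in for struct sk_buff; gso_segs == 0 plays the
 * role of !skb_is_gso() in this model. */
struct fake_skb {
	unsigned int len;
	unsigned short gso_segs;
};

struct class_stats {
	unsigned long packets;
	unsigned long bytes;
};

/* Original-style blackhole: frees the skb yet returns full success. */
static int blackhole_enqueue(struct fake_skb *skb)
{
	free(skb);
	return NET_XMIT_SUCCESS;
}

/* HTB-style parent: everything the stats need is read *before* the
 * child may free the skb; afterwards skb is never dereferenced. */
static int parent_enqueue(struct fake_skb *skb, struct class_stats *st)
{
	unsigned long pkts = skb->gso_segs ? skb->gso_segs : 1;
	unsigned long bytes = skb->len;
	int ret = blackhole_enqueue(skb);

	if (ret == NET_XMIT_SUCCESS) {
		st->packets += pkts;
		st->bytes += bytes;
	}
	return ret;
}
```

The trade-off is that the cached values describe the skb as it was before the inner enqueue; if a child could legitimately change what the parent needs, this would no longer be equivalent.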
I guess some wiser guys should decide how serious a problem it is.
- Jussi

PS. I noticed something fishy in HTB: it always returns NET_XMIT_DROP if qdisc_enqueue() doesn't return a full NET_XMIT_SUCCESS. Shouldn't it return the value from qdisc_enqueue() instead? Same in HTB requeue. That can't be right, right?
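A minimal model of the difference the PS describes, using hypothetical user-space stand-ins (child_congested, child_ok and the parent_enqueue_* functions are not kernel functions; NET_XMIT_* values are assumed). Collapsing every non-success into NET_XMIT_DROP hides, e.g., a child's NET_XMIT_CN from the caller, while propagating the child's code preserves it.

```c
#include <assert.h>
#include <stddef.h>

/* NET_XMIT_* values assumed to match the kernel headers of the time. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_DROP    1
#define NET_XMIT_CN      2

typedef int (*child_enqueue_fn)(void *skb);

struct parent_stats {
	unsigned long drops;
};

/* Hypothetical children: one signals congestion, one accepts. */
static int child_congested(void *skb) { (void)skb; return NET_XMIT_CN; }
static int child_ok(void *skb) { (void)skb; return NET_XMIT_SUCCESS; }

/* The pattern questioned in the PS: any non-success becomes DROP. */
static int parent_enqueue_collapsing(void *skb, child_enqueue_fn child,
				     struct parent_stats *st)
{
	if (child(skb) != NET_XMIT_SUCCESS) {
		st->drops++;
		return NET_XMIT_DROP;
	}
	return NET_XMIT_SUCCESS;
}

/* Alternative: count the failure, but pass the child's code up. */
static int parent_enqueue_propagating(void *skb, child_enqueue_fn child,
				      struct parent_stats *st)
{
	int ret = child(skb);

	if (ret != NET_XMIT_SUCCESS)
		st->drops++;
	return ret;
}
```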
Yes, very good point, and a bug that is quite hard to diagnose; happily already solved (but not yet fixed) by David Miller himself.

Jarek P.

PS: it seems your mailer wrapped some lines of the above patch.