qdisc_enqueue, NET_XMIT_SUCCESS and kfree_skb (Was: Re: [PATCH take 2] net_sched: Add qdisc __NET_XMIT_BYPASS flag)
From: Jussi Kivilinna <hidden>
Date: 2008-08-06 19:42:51
Quoting "Jarek Poplawski" [off-list ref]:
> > How about making skb shared before passing into qdisc tree? That would
> > make skb usage safe after qdisc enqueues.
>
> It's a bit costly (atomics), so there should be a good reason for this. It
> should be first checked if there is real danger. And if it's only for more
> exact stats, I'm not sure it's worth of it.
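For reference, "making the skb shared" amounts to taking an extra reference before the packet enters the qdisc tree and dropping it once the root is done with it. That is one atomic increment plus one atomic decrement per packet, which is the cost Jarek points at. The following userspace sketch is only a model under assumptions: struct buf, buf_get(), buf_put(), blackhole_like_enqueue() and enqueue_root_shared() are hypothetical stand-ins for struct sk_buff, skb_get(), kfree_skb(), a blackhole-style child and qdisc_enqueue_root(), and the refcount is a C11 atomic_int rather than the kernel's own skb reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: just a refcount and a length. */
struct buf {
	atomic_int users;
	unsigned int len;
};

static int nr_freed; /* counts frees so the behaviour can be observed */

static struct buf *buf_alloc(unsigned int len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b) {
		atomic_init(&b->users, 1);
		b->len = len;
	}
	return b;
}

/* Models skb_get(): take one more reference (one atomic op). */
static struct buf *buf_get(struct buf *b)
{
	atomic_fetch_add(&b->users, 1);
	return b;
}

/* Models kfree_skb(): drop a reference and free on the last one. */
static void buf_put(struct buf *b)
{
	int old = atomic_fetch_sub(&b->users, 1);

	assert(old > 0);
	if (old == 1) {
		free(b);
		nr_freed++;
	}
}

/* Child enqueue that drops the packet yet reports success. */
static int blackhole_like_enqueue(struct buf *b)
{
	buf_put(b);
	return 0; /* stands in for NET_XMIT_SUCCESS */
}

/*
 * Root enqueue with the packet made "shared": the extra reference keeps
 * b valid after the child returns, at the cost of two extra atomics.
 */
static unsigned int enqueue_root_shared(struct buf *b,
					int (*enqueue)(struct buf *))
{
	unsigned int len;

	buf_get(b);   /* extra atomic #1 */
	enqueue(b);   /* child may have dropped its reference */
	len = b->len; /* still safe: we hold our own reference */
	buf_put(b);   /* extra atomic #2; frees if the child dropped */
	return len;
}
```

With the extra reference, a child that drops the packet but reports success only releases its own reference; the final buf_put() in the root performs the actual free, after the length has been read.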
OK, I went through all enqueue (and requeue) functions looking for any case that frees the skb and returns full NET_XMIT_SUCCESS without the new flags, and found only one: sch_blackhole (qdisc_drop() followed by return NET_XMIT_SUCCESS). This could be fixed by delaying the kfree_skb() until the exit of qdisc_enqueue_root(). Here's a (completely untested) patch:

---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a7abfda..ca083c6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -175,6 +175,7 @@ struct tcf_proto
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	__u8			delayed_enqueue_free:1;
 	char			data[];
 };
 
@@ -364,10 +365,23 @@ static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return sch->enqueue(skb, sch);
 }
 
+static inline void qdisc_delayed_kfree_skb(struct sk_buff *skb)
+{
+	qdisc_skb_cb(skb)->delayed_enqueue_free = 1;
+}
+
 static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
 {
+	int ret;
+
+	qdisc_skb_cb(skb)->delayed_enqueue_free = 0;
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
+	ret = qdisc_enqueue(skb, sch);
+
+	if (ret == NET_XMIT_SUCCESS && qdisc_skb_cb(skb)->delayed_enqueue_free)
+		kfree_skb(skb);
+
+	return ret & NET_XMIT_MASK;
 }
 
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
index 507fb48..13230bd 100644
--- a/net/sched/sch_blackhole.c
+++ b/net/sched/sch_blackhole.c
@@ -19,7 +19,8 @@
 
 static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-	qdisc_drop(skb, sch);
+	qdisc_delayed_kfree_skb(skb);
+	sch->qstats.drops++;
 	return NET_XMIT_SUCCESS;
 }
 
---
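To make the control flow of the delayed-free patch easier to follow, here is a userspace model of it. It is a sketch under assumptions, not kernel code: struct pkt, struct sched, XMIT_SUCCESS, XMIT_MASK and all function names are hypothetical stand-ins for struct sk_buff, struct Qdisc, NET_XMIT_SUCCESS, NET_XMIT_MASK and the patched helpers, and the delayed_free field plays the role of qdisc_skb_cb(skb)->delayed_enqueue_free. The classful parent reads the packet length after its child returned full success, which is the access the patch is meant to keep safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for NET_XMIT_SUCCESS and NET_XMIT_MASK. */
#define XMIT_SUCCESS 0x00
#define XMIT_MASK    0xff

struct pkt {
	unsigned int len;
	bool delayed_free; /* models qdisc_skb_cb(skb)->delayed_enqueue_free */
};

struct sched {
	int (*enqueue)(struct pkt *p, struct sched *s);
	struct sched *child; /* used only by parent_enqueue() */
	unsigned int drops;
	unsigned long bytes;
};

static int nr_freed; /* lets the caller observe when p is freed */

/* Models qdisc_delayed_kfree_skb(): defer the free to the root. */
static void pkt_delayed_free(struct pkt *p)
{
	p->delayed_free = true;
}

/* Models the patched blackhole_enqueue(). */
static int blackhole_enqueue(struct pkt *p, struct sched *s)
{
	pkt_delayed_free(p);
	s->drops++;
	return XMIT_SUCCESS;
}

/* Classful parent that updates byte stats after its child succeeds. */
static int parent_enqueue(struct pkt *p, struct sched *s)
{
	int ret;

	assert(s->child != NULL);
	ret = s->child->enqueue(p, s->child);
	if (ret == XMIT_SUCCESS)
		s->bytes += p->len; /* safe: any free is deferred to the root */
	return ret;
}

/* Models the patched qdisc_enqueue_root(). */
static int enqueue_root(struct pkt *p, struct sched *s)
{
	int ret;

	p->delayed_free = false;
	ret = s->enqueue(p, s);
	if (ret == XMIT_SUCCESS && p->delayed_free) {
		free(p);
		nr_freed++;
	}
	return ret & XMIT_MASK;
}
```

The flag lives in the packet's own control block, so the deferral costs one bit write and one test per packet rather than the two atomics needed to make the skb shared.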
If this isn't a good way to solve it, the use of qdisc_pkt_len() for stats could instead be fixed either by passing a packet-length pointer through the qdisc tree, or by adding a new qdisc_pkt_len_diff and adding the difference in at dequeue, as you suggested (though there the inner dequeue could return NULL, so the difference would never be added; but it is just stats).

While going through the code I found two cases, other than qdisc_pkt_len() for stats, where the skb pointer is used after an inner enqueue returns full NET_XMIT_SUCCESS: HTB uses skb_is_gso(), and HFSC uses the packet length for set_active(). HTB is trivial (for me) to fix, while HFSC isn't. Because of the HFSC part, it would be easier for me to declare full NET_XMIT_SUCCESS a safe zone for the skb pointer.

 - Jussi

PS. I noticed something fishy in HTB: it always returns NET_XMIT_DROP if qdisc_enqueue() doesn't return full NET_XMIT_SUCCESS. Shouldn't it return the value returned by qdisc_enqueue()? The same goes for HTB's requeue. That can't be right, can it?
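An alternative to declaring full NET_XMIT_SUCCESS a safe zone is for a parent to read everything it needs from the packet before calling the inner enqueue, and never touch the pointer afterwards. The sketch also reflects the PS: it passes the child's return code up instead of forcing a drop verdict. It is a hypothetical model, not HTB's code: struct pkt with its gso/gso_segs fields (standing in for skb_is_gso() and the GSO segment count), struct class_stats, the XMIT_* values and both example children are assumptions made for illustration, and whether the same approach suits HFSC's set_active() path is not checked here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical return codes; only their distinctness matters here. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1, XMIT_CN = 2 };

/* Hypothetical packet: len, plus stand-ins for skb_is_gso() and gso_segs. */
struct pkt {
	unsigned int len;
	bool gso;
	unsigned int gso_segs;
};

struct class_stats {
	unsigned long bytes;
	unsigned long packets;
};

/* Example child: drops the packet but reports success (blackhole-like). */
static int blackhole_child(struct pkt *p)
{
	free(p);
	return XMIT_SUCCESS;
}

/* Example child: drops the packet and reports congestion. */
static int congested_child(struct pkt *p)
{
	free(p);
	return XMIT_CN;
}

/*
 * Parent enqueue: snapshot what the stats need before the child runs,
 * never dereference p again, and propagate the child's verdict.
 */
static int parent_enqueue(struct pkt *p, struct class_stats *st,
			  int (*child_enqueue)(struct pkt *))
{
	unsigned int len = p->len;
	unsigned int segs = p->gso ? p->gso_segs : 1;
	int ret;

	assert(child_enqueue != NULL);
	ret = child_enqueue(p); /* p may be freed from here on */
	if (ret != XMIT_SUCCESS)
		return ret;     /* not collapsed to XMIT_DROP */

	st->bytes += len;
	st->packets += segs;
	return XMIT_SUCCESS;
}
```

Because only the local snapshots are used after the call, this stays correct even when a child frees the packet while reporting success.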