Thread (44 messages, 6 authors, 2008-08-19)

qdisc_enqueue, NET_XMIT_SUCCESS and kfree_skb (Was: Re: [PATCH take 2] net_sched: Add qdisc __NET_XMIT_BYPASS flag)

From: Jussi Kivilinna <hidden>
Date: 2008-08-06 19:42:51

Quoting "Jarek Poplawski" [off-list ref]:
>> How about making skb shared before passing into qdisc tree?
>> That would make skb usage safe after qdisc enqueues.
>
> It's a bit costly (atomics), so there should be a good reason for this.
> It should be first checked if there is real danger. And if it's only
> for more exact stats, I'm not sure it's worth it.
Ok, I went through all the enqueue (and requeue) functions looking for any
case that frees the skb and returns full NET_XMIT_SUCCESS without the new
flags, and found only one: sch_blackhole (qdisc_drop() followed by
return NET_XMIT_SUCCESS).
This could be fixed by delaying the kfree_skb() until qdisc_enqueue_root()
is about to return. Here's a (completely untested) patch:
---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a7abfda..ca083c6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -175,6 +175,7 @@ struct tcf_proto

  struct qdisc_skb_cb {
         unsigned int            pkt_len;
+       __u8                    delayed_enqueue_free:1;
         char                    data[];
  };
@@ -364,10 +365,23 @@ static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
         return sch->enqueue(skb, sch);
  }

+static inline void qdisc_delayed_kfree_skb(struct sk_buff *skb)
+{
+       qdisc_skb_cb(skb)->delayed_enqueue_free = 1;
+}
+
  static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
  {
+       int ret;
+
+       qdisc_skb_cb(skb)->delayed_enqueue_free = 0;
         qdisc_skb_cb(skb)->pkt_len = skb->len;
-       return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
+       ret = qdisc_enqueue(skb, sch);
+
+       if (ret == NET_XMIT_SUCCESS && qdisc_skb_cb(skb)->delayed_enqueue_free)
+               kfree_skb(skb);
+
+       return ret & NET_XMIT_MASK;
  }

  static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
index 507fb48..13230bd 100644
--- a/net/sched/sch_blackhole.c
+++ b/net/sched/sch_blackhole.c
@@ -19,7 +19,8 @@

  static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  {
-       qdisc_drop(skb, sch);
+       qdisc_delayed_kfree_skb(skb);
+       sch->qstats.drops++;
         return NET_XMIT_SUCCESS;
  }
---
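The delayed-free idea in the patch can be modeled in plain user-space C. This is a hypothetical sketch, not kernel code and not the patch itself: struct pkt, pkt_free(), parent_enqueue() and root_enqueue() are made-up stand-ins for the skb, kfree_skb(), an HTB/HFSC-style parent and qdisc_enqueue_root(), and the NET_XMIT_MASK value is chosen only for the model. It shows why a parent can still read the packet after a blackhole-style child reports NET_XMIT_SUCCESS: the actual free happens only once the root regains control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return codes for this model only; NET_XMIT_MASK is a model value. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_MASK    0xffff

struct pkt {
	unsigned int len;
	bool delayed_free;	/* stands in for qdisc_skb_cb()->delayed_enqueue_free */
};

static int freed_count;	/* number of packets released by the root */

static void pkt_free(struct pkt *p)
{
	free(p);
	freed_count++;
}

/* Like the patched blackhole_enqueue(): count a drop, defer the free. */
static int blackhole_enqueue(struct pkt *p, unsigned long *drops)
{
	p->delayed_free = true;
	(*drops)++;
	return NET_XMIT_SUCCESS;
}

/* A parent that, trusting full NET_XMIT_SUCCESS, still reads the packet. */
static int parent_enqueue(struct pkt *p, unsigned long *drops,
			  unsigned long *bytes)
{
	int ret = blackhole_enqueue(p, drops);

	if (ret == NET_XMIT_SUCCESS)
		*bytes += p->len;	/* safe: nothing has been freed yet */
	return ret;
}

/* Like the patched qdisc_enqueue_root(): free only after the chain returns. */
static int root_enqueue(struct pkt *p, unsigned long *drops,
			unsigned long *bytes)
{
	int ret;

	assert(p != NULL);
	p->delayed_free = false;
	ret = parent_enqueue(p, drops, bytes);
	if (ret == NET_XMIT_SUCCESS && p->delayed_free)
		pkt_free(p);
	return ret & NET_XMIT_MASK;
}
```

In this model, a root enqueue of a 100-byte packet leaves the drop counter at 1, the parent's byte counter at 100, and exactly one free, performed by the root.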
If this isn't a good way to solve it, the use of qdisc_pkt_len() for stats
could be fixed either by passing a packet-length pointer through the qdisc
tree, or by adding a new qdisc_pkt_len_diff and adding the difference in at
dequeue, as you said (though the inner dequeue could return NULL, in which
case the difference would never be added; but then, it is just stats).
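The first alternative (passing a packet-length pointer through the qdisc tree) might look roughly like the following user-space sketch. The extra unsigned int *len argument is hypothetical; the enqueue hook discussed in this thread takes only (skb, sch). The child writes the length before it frees the packet, so the parent updates its byte counter from a local variable and never touches the packet again.

```c
#include <assert.h>
#include <stdlib.h>

#define NET_XMIT_SUCCESS 0x00	/* model value */

struct pkt {
	unsigned int len;
};

struct stats {
	unsigned long bytes;
	unsigned long packets;
	unsigned long drops;
};

/* Hypothetical blackhole-style child: reports the length, then frees. */
static int dropping_child_enqueue(struct pkt *p, struct stats *st,
				  unsigned int *len)
{
	assert(len != NULL);
	*len = p->len;	/* report before the packet goes away */
	free(p);
	st->drops++;
	return NET_XMIT_SUCCESS;
}

/* Parent updates its stats from len and never dereferences p afterwards. */
static int parent_enqueue(struct pkt *p, struct stats *child_st,
			  struct stats *st)
{
	unsigned int len = 0;
	int ret = dropping_child_enqueue(p, child_st, &len);

	if (ret == NET_XMIT_SUCCESS) {
		st->bytes += len;
		st->packets++;
	}
	return ret;
}
```

The cost is one extra argument on every enqueue call in the tree, in exchange for never reading the packet after a child has returned.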

While going through the code, I found two cases where the skb pointer is
used after an inner enqueue returns full NET_XMIT_SUCCESS (other than
qdisc_pkt_len() for stats): HTB calls skb_is_gso(), and HFSC uses the packet
length for set_active(). HTB is trivial (for me) to fix, while HFSC isn't.
Because of the HFSC part, it would be easier for me to declare a full
NET_XMIT_SUCCESS return a safe zone for the skb pointer.
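The proposed "safe zone" rule can be sketched as a user-space model: a parent dereferences the packet only when the child returns exactly NET_XMIT_SUCCESS, and any extra flag bit means the packet may already be gone. MODEL_XMIT_BYPASS is a hypothetical stand-in for the __NET_XMIT_BYPASS flag named in the thread subject, with a value picked for the model, and the gso and length reads only mimic what HTB and HFSC do; none of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NET_XMIT_SUCCESS  0x0
#define NET_XMIT_MASK     0xffff	/* model value */
#define MODEL_XMIT_BYPASS 0x10000	/* hypothetical stand-in for __NET_XMIT_BYPASS */

struct pkt {
	unsigned int len;
	bool gso;
};

struct class_state {
	unsigned long bytes;
	unsigned long gso_pkts;
	bool active;
};

typedef int (*child_enqueue_fn)(struct pkt *p, struct pkt **slot);

/* Child that keeps the packet queued: full success, p stays valid. */
static int fifo_child(struct pkt *p, struct pkt **slot)
{
	*slot = p;
	return NET_XMIT_SUCCESS;
}

/* Child that consumes the packet: the extra bit tells parents to keep off p. */
static int consuming_child(struct pkt *p, struct pkt **slot)
{
	(void)slot;
	free(p);
	return NET_XMIT_SUCCESS | MODEL_XMIT_BYPASS;
}

static int parent_enqueue(struct pkt *p, child_enqueue_fn child,
			  struct pkt **slot, struct class_state *cl)
{
	int ret;

	assert(cl != NULL);
	ret = child(p, slot);
	if (ret != NET_XMIT_SUCCESS)
		return ret;	/* p may be gone: do not dereference it */

	/* Full NET_XMIT_SUCCESS: p is still queued, so reading it is safe. */
	if (p->gso)
		cl->gso_pkts++;		/* cf. HTB's skb_is_gso() check */
	cl->bytes += p->len;		/* cf. HFSC using the length in set_active() */
	cl->active = true;
	return ret;
}
```

Callers that only care about the base code can still mask the flag off, since MODEL_XMIT_BYPASS & NET_XMIT_MASK is 0 in this model.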

  - Jussi

PS. I noticed something fishy in HTB: it always returns NET_XMIT_DROP if
qdisc_enqueue() doesn't return full NET_XMIT_SUCCESS. Shouldn't it return the
return value from qdisc_enqueue() instead? The same goes for HTB requeue.
That can't be right, right?