Re: [PATCH net-next 1/1] net/sched: Disambiguate verdict from return code
From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2023-10-02 19:55:03
Also in:
bpf
Subsystem:
networking [general], the rest · Maintainers:
"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds
Hi Daniel, On Fri, Sep 29, 2023 at 11:48 AM Daniel Borkmann [off-list ref] wrote:
On 9/26/23 1:01 AM, Jamal Hadi Salim wrote:quoted
On Fri, Sep 22, 2023 at 4:12 AM Daniel Borkmann [off-list ref] wrote:quoted
On 9/20/23 1:20 AM, Jamal Hadi Salim wrote:quoted
On Tue, Sep 19, 2023 at 6:15 PM Daniel Borkmann [off-list ref] wrote:quoted
On 9/19/23 4:59 PM, Victor Nogueira wrote:[...]quoted
quoted
In the above case we don't have 'internal' errors which you want to trace, so I would also love to avoid the cost of zeroing struct tcf_result res which should be 3x 8b for every packet.We can move the zeroing inside tc_run() but we declare it in the same spot as we do right now. You will still need to set res.verdict as above. Would that work for you?What I'm not following is that with the below you can avoid the unnecessary fast path cost (which is only for corner case which is almost never hit) and get even better visibility. Are you saying it doesn't work?
I am probably missing something: -1/UNSPEC is a legit errno. And the main motivation here for this patch is to disambiguate if it was -EPERM vs UNSPEC Maybe that is what you are calling a "corner case"? There are two options in my mind right now (since you are guaranteed in tcx_run you will never return anything below UNSPEC): 1) we just have the switch statement invocation inside an inline function and you can pass it sch_ret (for tcx case) and we'll pass it res.verdit for tc_run() case. 2) is something is we leave tcx_run alone and we have something along the lines of: --------------
diff --git a/net/core/dev.c b/net/core/dev.c
index 1450f4741d9b..93613bce647c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c@@ -3985,7 +3985,7 @@ sch_handle_ingress(struct sk_buff *skb, structpacket_type **pt_prev, int *ret,
struct net_device *orig_dev, bool *another)
{
struct bpf_mprog_entry *entry =
rcu_dereference_bh(skb->dev->tcx_ingress);
- struct tcf_result res = {0};
+ struct tcf_result res;
int sch_ret;
if (!entry)@@ -4003,14 +4003,16 @@ sch_handle_ingress(struct sk_buff *skb, structpacket_type **pt_prev, int *ret,
if (sch_ret != TC_ACT_UNSPEC)
goto ingress_verdict;
}
+
+ res.verdict = 0;
sch_ret = tc_run(tcx_entry(entry), skb, &res);
if (sch_ret < 0) {
kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS_ERROR);
*ret = NET_RX_DROP;
return NULL;
}
+ sch_ret = res.verdict;
ingress_verdict:
- switch (res.verdict) {
+ switch (sch_ret) {
case TC_ACT_REDIRECT:
/* skb_mac_header check was done by BPF, so we can
safely
* push the L2 header back before redirecting to another
-----------
on the drop reason - our thinking is to support drop_watch alongside
tracepoint given kfree_skb_reason exists already; if i am not mistaken
what you suggested would require us to create a new tracepoint?
cheers,
jamal
quoted
quoted
I was more thinking like something below could be a better choice. I presume your main goal is to trace where these errors originated in the first place, so it might even be useful to capture the actual return code as well.The main motivation is a few syzkaller bugs which resulted in not disambiguating between errors being returned and sometimes TC_ACT_SHOT.quoted
Then you can use perf script, bpf and whatnot to gather further insights into what happened while being less invasive and avoiding the need to extend struct tcf_result.We could use trace instead - the reason we have the skb reason is being used in the other spots (does this trace require ebpf to be usable?).No you can just use regular perf by attaching to the tracepoint, no need for using bpf at all here.quoted
quoted
This would be quite similar to trace_xdp_exception() as well, and I think you can guarantee that in fast path all errors are < TC_ACT_UNSPEC anyway.I am not sure i followed. 0 means success, result codes are returned in res now.What I was saying is that you don't need the struct change from the patch, but only the changes where you rework TC_ACT_SHOT into one of the -E<errors>, and then with the below you can pass this through an exception tracepoint.quoted
quoted
diff --git a/net/core/dev.c b/net/core/dev.c index 85df22f05c38..4089d195144d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c@@ -3925,6 +3925,10 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb) mini_qdisc_bstats_cpu_update(miniq, skb); ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false); + if (unlikely(ret < TC_ACT_UNSPEC)) { + trace_tc_exception(skb->dev, skb->tc_at_ingress, ret); + ret = TC_ACT_SHOT; + } /* Only tcf related quirks below. */ switch (ret) { case TC_ACT_SHOT:Thanks, Daniel