Re: [PATCH net] net/sched: fq_pie: avoid stalls in fq_pie_timer()

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2023-08-30 15:50:33

On Wed, Aug 30, 2023 at 11:00 AM Eric Dumazet [off-list ref] wrote:

On Wed, Aug 30, 2023 at 4:48 PM Jamal Hadi Salim [off-list ref] wrote:

quoted

On Wed, Aug 30, 2023 at 10:30 AM Michal Kubiak [off-list ref] wrote:

quoted

On Tue, Aug 29, 2023 at 12:35:41PM +0000, Eric Dumazet wrote:

quoted

When setting a high number of flows (limit being 65536),
fq_pie_timer() is currently using too much time as syzbot reported.

Add logic to yield the cpu every 2048 flows (less than 150 usec
on debug kernels).
It should also help by not blocking qdisc fast paths for too long.
Worst case (65536 flows) would need 31 jiffies for a complete scan.

Relevant extract from syzbot report:

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236
Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b
RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206
RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0
RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <IRQ>
 pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415
 fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
 call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700

Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/ (local)
Reported-by: syzbot+e46fbd5289363464bc13@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>

The code logic and style looks good to me.
However, I don't have experience with that code to estimate if 2048
flows per round is enough to avoid stalls for all normal circumstances,
so I guess someone else should take a look.

Eric, I had the same question: Why 2048 (why not 12 for example? ;->).
Could that number make more sense to add as an init attribute? Not
asking you to add it but i or somebody else could send a followup
patch after.

This is based on experimentation.

I started using 1024, then saw that using 2048 was okay.

I think I gave some numbers in the changelog :
"(less than 150 usec on debug kernels)."

Spending 150 usec every jiffie seems reasonable to me.

Honestly, I am not sure if anyone was/is using a high number of flows,
given that whole qdisc enqueue/dequeue operations were frozen every 15ms for
2 or 3 ms on non debug kernels :/

Unfortunately such numbers tend to depend on the CPU used etc. Once
your patch goes in we can add an extension to set a netlink attribute
so the user can change this value (by default keep it at 2048).

cheers,
jamal

quoted

Other than that:

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

quoted

Thanks,
Reviewed-by: Michal Kubiak <redacted>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help