Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.

[RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-14
[RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-14
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Jesper Dangaard Brouer <hawk@kernel.org> · 2023-08-15
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Yan Zhai <hidden> · 2023-08-15
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Jesper Dangaard Brouer <hawk@kernel.org> · 2023-08-16
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Yan Zhai <hidden> · 2023-08-16
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Jesper Dangaard Brouer <hidden> · 2023-08-16
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Yan Zhai <hidden> · 2023-08-18
Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush(). · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-16
[RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-14
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · kernel test robot <hidden> · 2023-08-21
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Paolo Abeni <pabeni@redhat.com> · 2023-08-23
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-09-20
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Ferenc Fejes <hidden> · 2023-09-21
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-09-22
Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI. · Paolo Abeni <pabeni@redhat.com> · 2023-09-22
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Jakub Kicinski <kuba@kernel.org> · 2023-08-14
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-17
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Jakub Kicinski <kuba@kernel.org> · 2023-08-17
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-18
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Yan Zhai <hidden> · 2023-08-18
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-18
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Jakub Kicinski <kuba@kernel.org> · 2023-08-18
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Eric Dumazet <edumazet@google.com> · 2023-08-18
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2023-08-23
Re: [RFC PATCH net-next 0/2] net: Use SMP threads for backlog NAPI. · Yan Zhai <hidden> · 2023-08-18

From: Ferenc Fejes <hidden>
Date: 2023-09-21 19:30:09
Also in: lkml

Hi!

On Wed, 2023-09-20 at 17:57 +0200, Sebastian Andrzej Siewior wrote:

On 2023-08-23 15:35:41 [+0200], Paolo Abeni wrote:

On Mon, 2023-08-14 at 11:35 +0200, Sebastian Andrzej Siewior wrote:

@@ -4781,7 +4733,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 		 * We can use non atomic operation since we own the queue lock
 		 */
 		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
-			napi_schedule_rps(sd);
+			__napi_schedule_irqoff(&sd->backlog);
 		goto enqueue;
 	}
 	reason = SKB_DROP_REASON_CPU_BACKLOG;

I *think* that the above could be quite dangerous when cpu ==
smp_processor_id() - that is, with plain veth usage.

Currently, each packet runs into the rx path just after
enqueue_to_backlog()/tx completes.

With this patch there will be a burst effect: the backlog thread will
run only after a few (several) packets have been enqueued, whenever the
process scheduler decides - note that the current CPU is already
hosting a running process, the tx thread.

The above can cause packet drops (due to limited buffering) or very
high latency (due to long bursts), even in non-overload situations,
which is quite hard to debug.

I think the above needs to be an opt-in, but I guess that even RT
deployments doing some packet forwarding will not be happy with this
on.
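
The burst concern above can be pictured with a small userspace model.
This is only a sketch under stated assumptions, not kernel code:
count_drops(), limit and defer are hypothetical names, the queue is a
plain counter, and the consumer is assumed to empty the whole queue each
time it runs. With defer == 1 the consumer runs right after every
enqueue; a larger defer stands in for the scheduler delaying the backlog
thread.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code.
 *
 * Enqueue n packets into a queue that holds at most `limit` packets.
 * The consumer runs once every `defer` enqueues and empties the queue.
 * Returns the number of packets dropped because the queue was full.
 */
static unsigned int count_drops(unsigned int n, unsigned int limit,
				unsigned int defer)
{
	unsigned int qlen = 0, drops = 0;

	assert(defer > 0);

	for (unsigned int i = 1; i <= n; i++) {
		if (qlen >= limit)
			drops++;	/* no room left: packet is dropped */
		else
			qlen++;		/* packet is queued */

		if (i % defer == 0)
			qlen = 0;	/* consumer ran and drained the queue */
	}
	return drops;
}
```

In this model nothing is dropped as long as defer does not exceed limit;
once the consumer is delayed beyond the queue capacity, every extra
packet in the burst is lost even though the average load is unchanged.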

I've been looking at this again and have been thinking about what you
said here. I think part of the problem is that we lack a policy/mechanism
for deciding when a DoS is happening and what to do about it.

Before commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its
job""), when a lot of network packets were processed, processing was
moved to ksoftirqd and continued based on how the scheduler scheduled
the SCHED_OTHER ksoftirqd task. This avoided lock-ups of the system,
which could do something else in between. Any interrupt would not
continue the outstanding softirq backlog but wait for ksoftirqd. So it
basically avoided the networking overload and throttled the throughput
if needed.

This isn't the case after that commit. Now, the CPU can be stuck with
processing networking packets if the packets come in fast enough. Even
if ksoftirqd is woken up, the next interrupt (say the timer) will
continue with at least one round.

By using NAPI-threads it is possible to give the control back to the
scheduler which can throttle the NAPI processing in favour of other
threads that ask for CPU. As you pointed out, waking the thread does not
guarantee that it will immediately do the NAPI work. It can be delayed
based on current load on the system.

This could be influenced by assigning the NAPI-thread a SCHED_FIFO
priority. Based on the priority it could be ensured that the thread
starts right away or "later" if something else is more important.
However, this opens the DoS window again: The scheduler will put the
NAPI thread on CPU as long as it asks for it with no throttling.

If we could somehow define a DoS condition once we are overwhelmed with
packets, then we could act on it and throttle it. This in turn would
allow a SCHED_FIFO priority without the fear of a lockup if the system
is flooded with packets.

Can this be avoided if we reuse gro_flush_timeout as the maximum time
the NAPI thread can be scheduled?
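
To make the question concrete, here is a rough userspace sketch of a
poll loop bounded by a time budget. Everything in it is an assumption
for illustration: struct backlog_model and poll_with_time_budget() are
hypothetical names that do not exist in the kernel, the per-packet cost
is a fixed number rather than a measured time, and budget_ns merely
plays the role that gro_flush_timeout would take in the idea above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a backlog queue; not a kernel structure. */
struct backlog_model {
	size_t pending;		/* packets waiting to be processed */
	uint64_t cost_ns;	/* modelled cost of processing one packet */
};

/*
 * Process packets until the queue is empty or the next packet would
 * exceed the time budget. Returns the number of packets processed;
 * whatever is left in q->pending waits for the next run.
 */
static size_t poll_with_time_budget(struct backlog_model *q,
				    uint64_t budget_ns)
{
	uint64_t spent = 0;
	size_t done = 0;

	assert(q != NULL && q->cost_ns > 0);

	while (q->pending > 0 && spent + q->cost_ns <= budget_ns) {
		q->pending--;
		spent += q->cost_ns;
		done++;
	}
	return done;
}
```

With 10 pending packets at 1000 ns each and a 3500 ns budget, the loop
handles 3 packets and leaves 7 for the next run, so even a SCHED_FIFO
thread would give up the CPU after a bounded amount of work per run.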

Cheers,

Paolo

Sebastian

Ferenc
