Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.
From: Ferenc Fejes <hidden>
Date: 2023-09-21 19:30:09
Also in:
lkml
Hi!

On Wed, 2023-09-20 at 17:57 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-08-23 15:35:41 [+0200], Paolo Abeni wrote:
> > On Mon, 2023-08-14 at 11:35 +0200, Sebastian Andrzej Siewior wrote:
> > > @@ -4781,7 +4733,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> > >  		 * We can use non atomic operation since we own the queue lock
> > >  		 */
> > >  		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
> > > -			napi_schedule_rps(sd);
> > > +			__napi_schedule_irqoff(&sd->backlog);
> > >  		goto enqueue;
> > >  	}
> > >  	reason = SKB_DROP_REASON_CPU_BACKLOG;
> >
> > I *think* that the above could be quite dangerous when cpu ==
> > smp_processor_id() - that is, with plain veth usage.
> >
> > Currently, each packet runs into the rx path just after
> > enqueue_to_backlog()/tx completes.
> >
> > With this patch there will be a burst effect, where the backlog
> > thread will run after a few (several) packets will be enqueued, when
> > the process scheduler will decide - note that the current CPU is
> > already hosting a running process, the tx thread.
> >
> > The above can cause packet drops (due to limited buffering) or very
> > high latency (due to long burst), even in non overload situation,
> > quite hard to debug.
> >
> > I think the above needs to be an opt-in, but I guess that even RT
> > deployments doing some packet forwarding will not be happy with this
> > on.
>
> I've been looking at this again and have been thinking what you said
> here. I think part of the problem is that we lack a policy/
> mechanism when a DoS is happening and what to do.
>
> Before commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its
> job"") when a lot of network packets are processed then processing is
> moved to ksoftirqd and continues based on how the scheduler schedules
> the SCHED_OTHER ksoftirqd task. This avoids lock-ups of the system and
> it can do something else in between. Any interrupt will not continue
> the outstanding softirq backlog but wait for ksoftirqd. So it
> basically avoids the networking overload. It throttles the throughput
> if needed.
>
> This isn't the case after that commit. Now, the CPU can be stuck with
> processing networking packets if the packets come in fast enough. Even
> if ksoftirqd is woken up, the next interrupt (say the timer) will
> continue with at least one round.
>
> By using NAPI-threads it is possible to give the control back to the
> scheduler which can throttle the NAPI processing in favour of other
> threads that ask for CPU.
>
> As you pointed out, waking the thread does not guarantee that it will
> immediately do the NAPI work. It can be delayed based on current load
> on the system. This could be influenced by assigning the NAPI-thread a
> SCHED_FIFO priority. Based on the priority it could be ensured that
> the thread starts right away or "later" if something else is more
> important.
>
> However, this opens the DoS window again: The scheduler will put the
> NAPI thread on CPU as long as it asks for it with no throttling.
>
> If we could somehow define a DoS condition once we are overwhelmed
> with packets, then we could act on it and throttle it. This in turn
> would allow a SCHED_FIFO priority without the fear of a lockup if the
> system is flooded with packets.

Can this be avoided if we reuse gro_flush_timeout as the maximum time the NAPI thread can be scheduled?
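
To make the question concrete, here is a minimal user-space sketch of a
time-bounded poll loop. It is only a model and not kernel code: the
struct, the function name and the simulated clock are all hypothetical,
and budget_ns merely stands in for the role gro_flush_timeout would play
if it were reused this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a backlog queue; not a kernel structure. */
struct backlog_model {
	size_t   queued;   /* packets waiting in the backlog */
	uint64_t now_ns;   /* simulated clock, in nanoseconds */
	uint64_t cost_ns;  /* simulated cost of processing one packet */
};

/*
 * Process packets until the queue is empty or the time budget is used
 * up. budget_ns is the assumed stand-in for gro_flush_timeout.
 * Returns the number of packets processed in this run; if packets
 * remain, the caller is expected to yield and get rescheduled.
 */
size_t poll_with_time_budget(struct backlog_model *b, uint64_t budget_ns)
{
	uint64_t deadline;
	size_t done = 0;

	/* A zero per-packet cost would never advance the simulated clock. */
	assert(b != NULL && b->cost_ns > 0);

	deadline = b->now_ns + budget_ns;
	while (b->queued > 0 && b->now_ns < deadline) {
		b->queued--;
		b->now_ns += b->cost_ns;
		done++;
	}
	return done;
}
```

With such a bound, a flood of packets could keep the thread on the CPU
for at most one budget per wakeup, even with a SCHED_FIFO priority,
assuming the thread really gives up the CPU once the budget is spent.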

> > Cheers,
> > Paolo
>
> Sebastian

Ferenc