Thread (8 messages) 8 messages, 2 authors, 2023-02-17

Re: [RFC v2] net/core: add optional threading for rps backlog processing

From: Felix Fietkau <nbd@nbd.name>
Date: 2023-02-17 13:40:52
Also in: lkml

On 17.02.23 13:57, Eric Dumazet wrote:
On Fri, Feb 17, 2023 at 1:35 PM Felix Fietkau [off-list ref] wrote:
quoted
On 17.02.23 13:23, Eric Dumazet wrote:
quoted
On Fri, Feb 17, 2023 at 11:06 AM Felix Fietkau [off-list ref] wrote:
quoted
When dealing with few flows or an imbalance on CPU utilization, static RPS
CPU assignment can be too inflexible. Add support for enabling threaded NAPI
for RPS backlog processing in order to allow the scheduler to better balance
processing. This helps better spread the load across idle CPUs.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---

RFC v2:
 - fix rebase error in rps locking
Why only deal with RPS ?

It seems you propose the sofnet_data backlog be processed by a thread,
instead than from softirq ?
Right. I originally wanted to mainly improve RPS, but my patch does
cover backlog in general. I will update the description in the next
version. Does the approach in general make sense to you?
I do not know, this seems to lack some (perf) numbers, and
descriptions of added max latencies and stuff like that :)
I just ran some test where I used a MT7621 device (dual-core 800 MHz
MIPS, 4 threads) as a router doing NAT without flow offloading.

Using the flent RRUL test between 2 PCs connected through the router,
I get these results:

rps_threaded=0: (combined CPU idle time around 27%)
                              avg       median       99th %          # data pts
  Ping (ms) ICMP   :        26.08        28.70        54.74 ms              199
  Ping (ms) UDP BE :         1.96        24.12        37.28 ms              200
  Ping (ms) UDP BK :         1.88        15.86        27.30 ms              200
  Ping (ms) UDP EF :         1.98        31.77        54.10 ms              200
  Ping (ms) avg    :         1.94          N/A          N/A ms              200
  TCP download BE  :        69.25        70.20       139.55 Mbits/s         200
  TCP download BK  :        95.15        92.51       163.93 Mbits/s         200
  TCP download CS5 :       133.64       129.10       292.46 Mbits/s         200
  TCP download EF  :       129.86       127.70       254.47 Mbits/s         200
  TCP download avg :       106.97          N/A          N/A Mbits/s         200
  TCP download sum :       427.90          N/A          N/A Mbits/s         200
  TCP totals       :       864.43          N/A          N/A Mbits/s         200
  TCP upload BE    :        97.54        96.67       163.99 Mbits/s         200
  TCP upload BK    :       139.76       143.88       190.37 Mbits/s         200
  TCP upload CS5   :        97.52        94.70       206.60 Mbits/s         200
  TCP upload EF    :       101.71       106.72       147.88 Mbits/s         200
  TCP upload avg   :       109.13          N/A          N/A Mbits/s         200
  TCP upload sum   :       436.53          N/A          N/A Mbits/s         200

rps_threaded=1: (combined CPU idle time around 16%)
                              avg       median       99th %          # data pts
  Ping (ms) ICMP   :        13.70        16.10        27.60 ms              199
  Ping (ms) UDP BE :         2.03        18.35        24.16 ms              200
  Ping (ms) UDP BK :         2.03        18.36        29.13 ms              200
  Ping (ms) UDP EF :         2.36        25.20        41.50 ms              200
  Ping (ms) avg    :         2.14          N/A          N/A ms              200
  TCP download BE  :       118.69       120.94       160.12 Mbits/s         200
  TCP download BK  :       134.67       137.81       177.14 Mbits/s         200
  TCP download CS5 :       126.15       127.81       174.84 Mbits/s         200
  TCP download EF  :        78.36        79.41       143.31 Mbits/s         200
  TCP download avg :       114.47          N/A          N/A Mbits/s         200
  TCP download sum :       457.87          N/A          N/A Mbits/s         200
  TCP totals       :       918.19          N/A          N/A Mbits/s         200
  TCP upload BE    :       112.20       111.55       164.38 Mbits/s         200
  TCP upload BK    :       144.99       139.24       205.12 Mbits/s         200
  TCP upload CS5   :        93.09        95.50       132.39 Mbits/s         200
  TCP upload EF    :       110.04       108.21       207.00 Mbits/s         200
  TCP upload avg   :       115.08          N/A          N/A Mbits/s         200
  TCP upload sum   :       460.32          N/A          N/A Mbits/s         200

As you can see, both throughput and latency improve because load can be
better distributed across CPU cores.

- Felix
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help