Thread (27 messages, 6 authors, 2020-10-03)

Re: [PATCH net-next 0/5] implement kthread based napi poll

From: Wei Wang <hidden>
Date: 2020-10-02 01:44:57

On Thu, Oct 1, 2020 at 4:46 PM Jakub Kicinski [off-list ref] wrote:
> On Thu, 1 Oct 2020 15:12:20 -0700 Wei Wang wrote:
> > Yes. I did a round of testing with workqueue as well. The "real
> > workload" I mentioned is a Google-internal application benchmark that
> > involves networking as well as disk ops.
> > There are two types of tests there.
> > The first is sustained tests, where ops/s is pushed very high, keeping
> > overall CPU usage above 80%, with various payload sizes.
> > In this type of test, I see better latency with the kthread model
> > than with workqueue, and similar CPU savings, with some tuning of the
> > kthreads (e.g., we limit the kthreads to a pool of CPUs to run on, to
> > avoid mixing with application threads; to be fair, I did the same for
> > workqueue).
> Can you share the relative performance delta of this benchmark?
>
> Could you explain why threads are slower than ksoftirqd if you pin the
> application away? From your cover letter it sounded like you want the
> scheduler to see the NAPI load, but then you say you pinned the
> application away from the NAPI cores for the test, so I'm confused.
No. We did not explicitly pin the application threads away; they are
free to run anywhere. What we do is restrict the NAPI kthreads to only
those CPUs handling rx interrupts (for us, 8 CPUs out of 56). So the
load on those CPUs is very high when running the test, and the
scheduler is smart enough to automatically avoid using those CPUs for
the application threads.
Here are the results of one representative test:

               cpu/op   50%tile   95%tile   99%tile
base            71.47     417us    1.01ms     2.9ms
kthread         67.84     396us     976us     2.4ms
workqueue       69.68     386us     791us     1.9ms

Actually, I remembered it wrong. It does seem workqueue is doing
better on latency, but cpu/op-wise, kthread seems to be a bit better.
> > The other is trace-based tests, where the load is based on actual
> > traces taken from real servers. This kind of test has lower load and
> > fewer ops/s overall (~25% total CPU usage on the host).
> > In this test case, I observe a similar amount of latency savings with
> > both kthread and workqueue, but workqueue seems to have better CPU
> > savings here, possibly because fewer threads are woken up to process
> > the load.
> >
> > One reason we would like to push forward with one kthread per NAPI
> > is that we are also trying to do busy polling with the kthread, and
> > having one kthread dedicated to one NAPI seems a good model to begin
> > with.
> And you'd pin those busy polling threads to a specific, single CPU, too?
> 1 cpu : 1 thread : 1 NAPI?
Yes. That is my thought.