Thread (33 messages) 33 messages, 7 authors, 2018-09-10

Re: [PATCH net-next] virtio_net: force_napi_tx module param.

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2018-07-24 22:01:48

On Tue, Jul 24, 2018 at 2:39 PM Michael S. Tsirkin [off-list ref] wrote:
On Tue, Jul 24, 2018 at 10:01:39AM -0400, Willem de Bruijn wrote:
quoted
On Tue, Jul 24, 2018 at 6:44 AM Michael S. Tsirkin [off-list ref] wrote:
quoted
On Mon, Jul 23, 2018 at 04:11:19PM -0700, Caleb Raitto wrote:
quoted
From: Caleb Raitto <redacted>

The driver disables tx napi if it's not certain that completions will
be processed affine with tx service.

Its heuristic doesn't account for some scenarios where it is, such as
when the queue pair count matches the core but not hyperthread count.

Allow userspace to override the heuristic. This is an alternative
solution to that in the linked patch. That added more logic in the
kernel for these cases, but the agreement was that this was better left
to user control.

Do not expand the existing napi_tx variable to a ternary value,
because doing so can break user applications that expect
boolean ('Y'/'N') instead of integer output. Add a new param instead.

Link: https://patchwork.ozlabs.org/patch/725249/
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jon Olson <redacted>
Signed-off-by: Caleb Raitto <redacted>
Is there a reason the same rule should apply to all devices?
If not shouldn't this be an ethtool option?
It not very likely that a guest will have multiple virtio_net devices,
some of which have affinity_hint_set and some do not?
Just to answer this question, I do hear a lot about guests with multiple
virtio net interfaces.  These might be just the noisy few, but they do
exist.
quoted
I'd really rather not add the extra option at all, but remove
the affinity_hint_set requirement for now. Without more data,
I understand the concern about cacheline bouncing if napi-tx
would becomes the default at some point and we don't have
data on this by then. But while it isn't default and a user has to
opt in to napi-tx to try it that seems enough guardrail to me.

The original reason was lack of data on whether napi-tx may suffer
from cache invalidations when tx and rx softirq are on different cpus
and we enable tx descriptor cleaning from the rx handler (i.e., on ACK).
quoted
From those initial numbers it seemed to be a win even with those
invalidations.

  https://patchwork.ozlabs.org/patch/746232/

In lieu of removing the affinity_hint_set, this boolean is the least amount
of both new interface and implementation to allow experimentation. We
can easily leave it as a noop eventually when we are confident that
napi-tx can be enabled even without affinity. By comparison, an ethtool
param would be quite a bit of new logic.
So it boils down to this: if we believe napi tx is
always a good idea, then this flag is there just in case,
and the patch is fine. If it's good for some workloads
but not others, and will be for a while, we are better off
with an ethtool flag.

What's the case here?
Based on benchmark results so far, it looks like the first. It's markedly
better for some and in the noise for most.

The crux is the "for a while". We undoubtedly will find some cases that
we need to fix before flipping the default.

The choice is between using napi_tx vs adding a separate ethtool function
to do essentially the same. This patch is the simpler option, and easier to
backport to guest kernels.
OTOH if you want to add more trick to the affinity hint, such
as the mentioned above # of queues matching core count,
that is also fine IMHO.
From the above linked patch, I understand that there are yet
other special cases in production, such as a hard cap on #tx queues to
32 regardless of number of vcpus. I don't think we want to add special
cases for all this kind of business logic in the kernel.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help