Re: [PATCH net-next] virtio_net: force_napi_tx module param.
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2018-07-24 22:01:48
On Tue, Jul 24, 2018 at 2:39 PM Michael S. Tsirkin [off-list ref] wrote:
On Tue, Jul 24, 2018 at 10:01:39AM -0400, Willem de Bruijn wrote:quoted
On Tue, Jul 24, 2018 at 6:44 AM Michael S. Tsirkin [off-list ref] wrote:quoted
On Mon, Jul 23, 2018 at 04:11:19PM -0700, Caleb Raitto wrote:quoted
From: Caleb Raitto <redacted> The driver disables tx napi if it's not certain that completions will be processed affine with tx service. Its heuristic doesn't account for some scenarios where it is, such as when the queue pair count matches the core but not hyperthread count. Allow userspace to override the heuristic. This is an alternative solution to that in the linked patch. That added more logic in the kernel for these cases, but the agreement was that this was better left to user control. Do not expand the existing napi_tx variable to a ternary value, because doing so can break user applications that expect boolean ('Y'/'N') instead of integer output. Add a new param instead. Link: https://patchwork.ozlabs.org/patch/725249/ Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Jon Olson <redacted> Signed-off-by: Caleb Raitto <redacted>Is there a reason the same rule should apply to all devices? If not shouldn't this be an ethtool option?It not very likely that a guest will have multiple virtio_net devices, some of which have affinity_hint_set and some do not?Just to answer this question, I do hear a lot about guests with multiple virtio net interfaces. These might be just the noisy few, but they do exist.quoted
I'd really rather not add the extra option at all, but remove the affinity_hint_set requirement for now. Without more data, I understand the concern about cacheline bouncing if napi-tx would becomes the default at some point and we don't have data on this by then. But while it isn't default and a user has to opt in to napi-tx to try it that seems enough guardrail to me. The original reason was lack of data on whether napi-tx may suffer from cache invalidations when tx and rx softirq are on different cpus and we enable tx descriptor cleaning from the rx handler (i.e., on ACK).quoted
From those initial numbers it seemed to be a win even with thoseinvalidations. https://patchwork.ozlabs.org/patch/746232/ In lieu of removing the affinity_hint_set, this boolean is the least amount of both new interface and implementation to allow experimentation. We can easily leave it as a noop eventually when we are confident that napi-tx can be enabled even without affinity. By comparison, an ethtool param would be quite a bit of new logic.So it boils down to this: if we believe napi tx is always a good idea, then this flag is there just in case, and the patch is fine. If it's good for some workloads but not others, and will be for a while, we are better off with an ethtool flag. What's the case here?
Based on benchmark results so far, it looks like the first. It's markedly better for some and in the noise for most. The crux is the "for a while". We undoubtedly will find some cases that we need to fix before flipping the default. The choice is between using napi_tx vs adding a separate ethtool function to do essentially the same. This patch is the simpler option, and easier to backport to guest kernels.
OTOH if you want to add more trick to the affinity hint, such as the mentioned above # of queues matching core count, that is also fine IMHO.
From the above linked patch, I understand that there are yet
other special cases in production, such as a hard cap on #tx queues to 32 regardless of number of vcpus. I don't think we want to add special cases for all this kind of business logic in the kernel.