Thread (31 messages) 31 messages, 6 authors, 2024-06-11

RE: [PATCH net-next] virtio_net: Add TX stop and wake counters

From: Dan Jurgens <hidden>
Date: 2024-02-20 18:02:50
Also in: virtualization

From: Daniel Jurgens <redacted>
Sent: Wednesday, February 7, 2024 2:59 PM
To: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <redacted>; Jakub Kicinski
[off-list ref]; Jason Xing [off-list ref];
netdev@vger.kernel.org; xuanzhuo@linux.alibaba.com;
virtualization@lists.linux.dev; davem@davemloft.net;
edumazet@google.com; abeni@redhat.com; Parav Pandit
[off-list ref]
Subject: RE: [PATCH net-next] virtio_net: Add TX stop and wake counters

quoted
From: Michael S. Tsirkin <mst@redhat.com>
Sent: Wednesday, February 7, 2024 2:19 PM
To: Daniel Jurgens <redacted>
Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake
counters

On Wed, Feb 07, 2024 at 07:38:16PM +0000, Daniel Jurgens wrote:
quoted
quoted
From: Michael S. Tsirkin <mst@redhat.com>
Sent: Sunday, February 4, 2024 6:40 AM
To: Jason Wang <redacted>
Cc: Jakub Kicinski <kuba@kernel.org>; Jason Xing
[off-list ref]; Daniel Jurgens [off-list ref];
netdev@vger.kernel.org; xuanzhuo@linux.alibaba.com;
virtualization@lists.linux.dev; davem@davemloft.net;
edumazet@google.com; abeni@redhat.com; Parav Pandit
[off-list ref]
Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake
counters

On Sun, Feb 04, 2024 at 09:20:18AM +0800, Jason Wang wrote:
quoted
On Sat, Feb 3, 2024 at 12:01 AM Jakub Kicinski [off-list ref]
wrote:
quoted
quoted
quoted
quoted
On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
quoted
quoted
Can you say more? I'm curious what's your use case.
I'm not working at Nvidia, so my point of view may differ
from
theirs.
quoted
quoted
quoted
quoted
quoted
From what I can tell is that those two counters help me
narrow down the range if I have to diagnose/debug some issues.
right, i'm asking to collect useful debugging tricks, nothing
against the patch itself :)
quoted
1) I sometimes notice that if some irq is held too long
(say, one simple case: output of printk printed to the
console), those two counters can reflect the issue.
2) Similarly in virtio net, recently I traced such counters
the current kernel does not have and it turned out that one
of the output queues in the backend behaves badly.
...

Stop/wake queue counters may not show directly the root
cause of the issue, but help us 'guess' to some extent.
I'm surprised you say you can detect stall-related issues with this.
I guess virtio doesn't have BQL support, which makes it special.
Yes, virtio-net has a legacy orphan mode, this is something that
needs to be dropped in the future. This would make BQL much more
easier to be implemented.

It's not that we can't implement BQL, it's that it does not seem
to be benefitial - has been discussed many times.
quoted
quoted
Normal HW drivers with BQL almost never stop the queue by
themselves.
quoted
quoted
quoted
quoted
I mean - if they do, and BQL is active, then the system is
probably misconfigured (queue is too short). This is what we
use at Meta to detect stalls in drivers with BQL:

https://lore.kernel.org/all/20240131102150.728960-3-leitao@deb
ia
n.or
g/

Daniel, I think this may be a good enough excuse to add
per-queue stats to the netdev genl family, if you're up for
that. LMK if you want more info, otherwise I guess ethtool -S
is fine
for now.
quoted
quoted
quoted
quoted
Thanks
Michael,
	Are you OK with this patch? Unless I missed it I didn't see a
response
from you in our conversation the day I sent it.
quoted
I thought what is proposed is adding some support for these stats to core?
Did I misunderstood?
That's a much bigger change and going that route I think still need to count
them at the driver level. I said I could potentially take that on as a background
project. But would prefer to go with ethtool -S for now.
Michael, are you a NACK on this? Jakub seemed OK with it, Jason also thinks it's useful, and it's low risk. 
quoted
--
MST
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help