RE: [PATCH net-next] virtio_net: Add TX stop and wake counters
From: Dan Jurgens <hidden>
Date: 2024-02-20 18:02:50
Also in:
virtualization
From: Daniel Jurgens <redacted> Sent: Wednesday, February 7, 2024 2:59 PM To: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <redacted>; Jakub Kicinski [off-list ref]; Jason Xing [off-list ref]; netdev@vger.kernel.org; xuanzhuo@linux.alibaba.com; virtualization@lists.linux.dev; davem@davemloft.net; edumazet@google.com; abeni@redhat.com; Parav Pandit [off-list ref] Subject: RE: [PATCH net-next] virtio_net: Add TX stop and wake countersquoted
From: Michael S. Tsirkin <mst@redhat.com> Sent: Wednesday, February 7, 2024 2:19 PM To: Daniel Jurgens <redacted> Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters On Wed, Feb 07, 2024 at 07:38:16PM +0000, Daniel Jurgens wrote:quoted
quoted
From: Michael S. Tsirkin <mst@redhat.com> Sent: Sunday, February 4, 2024 6:40 AM To: Jason Wang <redacted> Cc: Jakub Kicinski <kuba@kernel.org>; Jason Xing [off-list ref]; Daniel Jurgens [off-list ref]; netdev@vger.kernel.org; xuanzhuo@linux.alibaba.com; virtualization@lists.linux.dev; davem@davemloft.net; edumazet@google.com; abeni@redhat.com; Parav Pandit [off-list ref] Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters On Sun, Feb 04, 2024 at 09:20:18AM +0800, Jason Wang wrote:quoted
On Sat, Feb 3, 2024 at 12:01 AM Jakub Kicinski [off-list ref]wrote:quoted
quoted
quoted
quoted
On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:quoted
quoted
Can you say more? I'm curious what's your use case.I'm not working at Nvidia, so my point of view may differ fromtheirs.quoted
quoted
quoted
quoted
quoted
From what I can tell is that those two counters help me narrow down the range if I have to diagnose/debug some issues.right, i'm asking to collect useful debugging tricks, nothing against the patch itself :)quoted
1) I sometimes notice that if some irq is held too long (say, one simple case: output of printk printed to the console), those two counters can reflect the issue. 2) Similarly in virtio net, recently I traced such counters the current kernel does not have and it turned out that one of the output queues in the backend behaves badly. ... Stop/wake queue counters may not show directly the root cause of the issue, but help us 'guess' to some extent.I'm surprised you say you can detect stall-related issues with this. I guess virtio doesn't have BQL support, which makes it special.Yes, virtio-net has a legacy orphan mode, this is something that needs to be dropped in the future. This would make BQL much more easier to be implemented.It's not that we can't implement BQL, it's that it does not seem to be benefitial - has been discussed many times.quoted
quoted
Normal HW drivers with BQL almost never stop the queue bythemselves.quoted
quoted
quoted
quoted
I mean - if they do, and BQL is active, then the system is probably misconfigured (queue is too short). This is what we use at Meta to detect stalls in drivers with BQL: https://lore.kernel.org/all/20240131102150.728960-3-leitao@deb ia n.or g/ Daniel, I think this may be a good enough excuse to add per-queue stats to the netdev genl family, if you're up for that. LMK if you want more info, otherwise I guess ethtool -S is finefor now.quoted
quoted
quoted
quoted
ThanksMichael, Are you OK with this patch? Unless I missed it I didn't see a responsefrom you in our conversation the day I sent it.quoted
I thought what is proposed is adding some support for these stats to core? Did I misunderstood?That's a much bigger change and going that route I think still need to count them at the driver level. I said I could potentially take that on as a background project. But would prefer to go with ethtool -S for now.
Michael, are you a NACK on this? Jakub seemed OK with it, Jason also thinks it's useful, and it's low risk.
quoted
-- MST