Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
From: Jason Wang <hidden>
Date: 2024-02-04 01:20:33
Also in: virtualization
On Sat, Feb 3, 2024 at 12:01 AM Jakub Kicinski [off-list ref] wrote:
> On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
> > > Can you say more? I'm curious what's your use case.
> >
> > I'm not working at Nvidia, so my point of view may differ from theirs.
> > From what I can tell is that those two counters help me narrow down
> > the range if I have to diagnose/debug some issues.
>
> right, i'm asking to collect useful debugging tricks, nothing against
> the patch itself :)
>
> > 1) I sometimes notice that if some irq is held too long (say, one
> > simple case: output of printk printed to the console), those two
> > counters can reflect the issue.
> > 2) Similarly in virtio net, recently I traced such counters the
> > current kernel does not have and it turned out that one of the output
> > queues in the backend behaves badly.
> > ...
> > Stop/wake queue counters may not show directly the root cause of the
> > issue, but help us 'guess' to some extent.
>
> I'm surprised you say you can detect stall-related issues with this.
> I guess virtio doesn't have BQL support, which makes it special.
Yes, virtio-net has a legacy orphan mode, which is something that needs to be dropped in the future. Dropping it would make BQL much easier to implement.
> Normal HW drivers with BQL almost never stop the queue by themselves.
> I mean - if they do, and BQL is active, then the system is probably
> misconfigured (queue is too short). This is what we use at Meta to
> detect stalls in drivers with BQL:
>
> https://lore.kernel.org/all/20240131102150.728960-3-leitao@debian.org/
>
> Daniel, I think this may be a good enough excuse to add per-queue
> stats to the netdev genl family, if you're up for that. LMK if you
> want more info, otherwise I guess ethtool -S is fine for now.
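For illustration only, here is a rough sketch of how per-queue stop/wake counters could be compared between two `ethtool -S` snapshots. The counter names (`tx_queue_<N>_stop`, `tx_queue_<N>_wake`) and the sample numbers are assumptions made up for this example; the real names depend on the driver and kernel version, and this is not taken from the patch itself.

```python
import re

# Hypothetical counter naming: "tx_queue_<N>_stop" / "tx_queue_<N>_wake".
# Adjust the pattern to whatever the driver actually exports.
COUNTER_RE = re.compile(r"^\s*tx_queue_(\d+)_(stop|wake):\s*(\d+)\s*$")

# Made-up snapshots in the usual "name: value" layout of `ethtool -S`.
SAMPLE_BEFORE = """NIC statistics:
     tx_queue_0_packets: 100
     tx_queue_0_stop: 2
     tx_queue_0_wake: 2
     tx_queue_1_stop: 0
     tx_queue_1_wake: 0
"""

SAMPLE_AFTER = """NIC statistics:
     tx_queue_0_packets: 900
     tx_queue_0_stop: 7
     tx_queue_0_wake: 6
     tx_queue_1_stop: 0
     tx_queue_1_wake: 0
"""


def parse_stop_wake(ethtool_output):
    """Return {queue: {"stop": n, "wake": n}} parsed from `ethtool -S` text."""
    stats = {}
    for line in ethtool_output.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            queue = int(match.group(1))
            kind = match.group(2)
            value = int(match.group(3))
            stats.setdefault(queue, {"stop": 0, "wake": 0})[kind] = value
    return stats


def stop_wake_delta(before, after):
    """Per-queue increase of the stop/wake counters between two snapshots."""
    delta = {}
    for queue, counters in after.items():
        old = before.get(queue, {"stop": 0, "wake": 0})
        delta[queue] = {k: counters[k] - old[k] for k in ("stop", "wake")}
    return delta


def queues_with_stops(delta):
    """Queues whose stop counter grew during the interval, in queue order."""
    return sorted(q for q, d in delta.items() if d["stop"] > 0)


if __name__ == "__main__":
    before = parse_stop_wake(SAMPLE_BEFORE)
    after = parse_stop_wake(SAMPLE_AFTER)
    delta = stop_wake_delta(before, after)
    for queue in queues_with_stops(delta):
        print(queue, delta[queue])
```

In practice the two snapshots would come from running `ethtool -S <dev>` twice with some interval in between. A growing stop count only hints at where to look; as noted above, it may not show the root cause directly.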
Thanks