Thread: 42 messages, 5 authors, 2024-02-26

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

From: Jason Wang <jasowang@redhat.com>
Date: 2023-08-11 09:19:57
Also in: lkml, virtualization

On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin [off-list ref] wrote:
On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin [off-list ref] wrote:
On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
On Mon, Jul 31, 2023 at 2:30 PM Jason Wang [off-list ref] wrote:
On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin [off-list ref] wrote:
On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
They really shouldn't - any NIC that takes forever to
program will create issues in the networking stack.
Unfortunately, it's not rare as the device/cvq could be implemented
via firmware or software.
Currently that means one either has sane firmware with a scheduler that
can meet deadlines, or loses the ability to report errors back.
But if they do they can always set this flag too.
This may have false negatives and may confuse the management.

Maybe we can extend the networking core to allow some device specific
configurations to be done with device specific lock without rtnl. For
example, split set_channels into

pre_set_channels
set_channels
post_set_channels

The device specific part could be done in pre and post without a rtnl lock?
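
To make the proposed split concrete, here is a minimal userspace sketch. Every name in it is hypothetical and none of it is an existing kernel interface; the two locks are modeled as plain flags so the code can check which one is held in each phase. It only illustrates the ordering: the slow device command runs before rtnl is taken, so its failure can be reported without touching core state.

```c
#include <assert.h>
#include <stdbool.h>

/* Every name here is hypothetical; locks are modeled as flags so the
 * sketch can check which one is held in each phase. */
struct sketch_dev {
	bool rtnl_held;          /* stands in for the rtnl lock */
	bool dev_lock_held;      /* stands in for a device specific lock */
	bool device_accepts;     /* simulated reply to the control command */
	int pending_channels;    /* accepted by the device, not yet by the core */
	int real_num_tx_queues;  /* what the networking core sees */
};

/* Phase 1: possibly slow device command, device lock only. */
static int pre_set_channels(struct sketch_dev *d, int n)
{
	int ret = -1;

	assert(!d->rtnl_held);
	d->dev_lock_held = true;
	if (d->device_accepts) {
		d->pending_channels = n;
		ret = 0;
	}
	d->dev_lock_held = false;
	return ret;
}

/* Phase 2: short core update under rtnl. */
static void set_channels(struct sketch_dev *d)
{
	d->rtnl_held = true;
	d->real_num_tx_queues = d->pending_channels;
	d->rtnl_held = false;
}

/* Phase 3: device specific cleanup, again without rtnl. */
static void post_set_channels(struct sketch_dev *d)
{
	assert(!d->rtnl_held);
	d->dev_lock_held = true;
	d->pending_channels = 0;
	d->dev_lock_held = false;
}

static int change_channels(struct sketch_dev *d, int n)
{
	int ret = pre_set_channels(d, n);

	if (ret)
		return ret;  /* error can be reported; core state untouched */
	/* Window: the device already uses n, the core still sees the old value. */
	set_channels(d);
	post_set_channels(d);
	return 0;
}
```

Note the window between phase 1 and phase 2, where the device and the core disagree about the channel count; the sketch does nothing to close it.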

Thanks

Would the benefit be that errors can be reported to userspace then?
Then maybe.  I think you will have to show how this works for at least
one card besides virtio.
Even for virtio, this does not seem easy, as e.g. the
virtnet_send_command() and netif_set_real_num_tx_queues() calls need to
appear atomic to the networking core.

I wonder if we can reconsider the timeout approach here and choose a
sane value as a start.
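
As a rough illustration of what a bounded wait could look like, here is a userspace sketch, not driver code. The names wait_for_command(), cmd_done_fn and fake_dev are hypothetical, sched_yield() merely stands in for cond_resched(), and the timeout value is left to the caller. The sketch does not handle recovery at all: a device that missed the deadline may still touch the command buffers afterwards, so returning an error is only safe if those buffers stay valid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical completion check; stands in for polling the control queue. */
typedef bool (*cmd_done_fn)(void *ctx);

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Poll until the command completes or the deadline passes, yielding the
 * CPU between polls. Returns 0 on completion, -ETIMEDOUT otherwise. */
static int wait_for_command(cmd_done_fn done, void *ctx, int64_t timeout_ns)
{
	int64_t deadline = now_ns() + timeout_ns;

	while (!done(ctx)) {
		if (now_ns() >= deadline)
			return -ETIMEDOUT;
		sched_yield();  /* userspace stand-in for cond_resched() */
	}
	return 0;
}

/* Simulated device: completes after a number of polls, or never if the
 * counter is negative. */
struct fake_dev {
	int polls_until_done;
};

static bool fake_done(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->polls_until_done < 0)
		return false;
	if (d->polls_until_done == 0)
		return true;
	d->polls_until_done--;
	return false;
}
```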
Michael, any more input on this?

Thanks
I think this is just mission creep. We are trying to fix
vduse - let's do that for starters.

Recovering from firmware timeouts is far from trivial, and
assuming that the device will not access memory just because
it timed out is just as likely to cause memory corruption,
with worse results than an infinite spin.
Yes, this might require support not only in the driver
I propose we fix this for vduse and assume hardware/firmware
is well behaved.
One major case is re-connection; in that case it might take
much longer than the kernel virtio-net driver expects.
So we can have a timeout in VDUSE and trap CVQ, so that VDUSE can return
and fail early?
Ugh, more mission creep. Not at all my point. vduse should cache
values in the driver,
What do you mean by values here? The cvq command?

Thanks
until someone manages to change
net core to be more friendly to userspace devices.
Or maybe firmware that is not well behaved will
set the flag, losing error reporting ability.
This might be hard since it means not only the set but also the get is
unreliable.

Thanks
/me shrugs


Thanks

--
MST