Thread: 126 messages, 19 authors, 2024-08-16

Re: [MAINTAINERS SUMMIT] Device Passthrough Considered Harmful?

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2024-08-06 13:04:17
Also in: linux-cxl, linux-rdma

On Tue, Aug 06, 2024 at 09:14:20AM +0200, Daniel Vetter wrote:
On Thu, Aug 01, 2024 at 11:22:23AM -0300, Jason Gunthorpe wrote:
On Tue, Jul 30, 2024 at 09:13:00AM +0200, Daniel Vetter wrote:
I think a solid consensus on the topics above would be really useful for
gpu/accel too. We're still busy with more pressing community/ecosystem
building needs, but gpu fw has become rather complex and it's not
stopping. And there's random other devices attached too nowadays, so fwctl
makes a ton of sense.
Yeah, I'm pretty sure GPU is going to need fwctl too; GPUs are
going to have the same issues as NICs do. I see people are already
struggling with topics like how to get debug traces out of the GPU FW.
But for me the more important stuff would be some clear guidelines like
what should be in other more across-devices subsystems like edac (or other
ras features), what should be in functional subsystems like netdev, rdma,
gpu/accel, ... whatever else, and what should be exposed through some
special purpose subsystems like hwmon.
In my mind the most important part is that fwctl is not exclusive, the
FW interface and things being manipulated must be sharable or blocked
from fwctl. We should never get in a situation where a fwctl
implementation becomes a reason we cannot have a functional subsystem
interface.
Hm, still not clear to me how you want to achieve that, but I guess
it's best if I jump over to the fwctl thread and ask about those
details there.
I'm looking at it from the perspective of mlx5, which has deep
multi-user support in the FW. There is almost nothing in the interface
that is "global" and would become a problem. Everything else can
reasonably be shared, and often already is.

I think that would have to be the baseline for what you could expose
here.

Like with the memory scrubbing example. It would be fine if fwctl can
read any related counters concurrently with the EDAC driver reading
the same counters. But fwctl shouldn't clear counters or program a
single global scrubber unit.

This limitation has to be baked into the FW/driver on the fwctl side
so that it can understand and block these things.
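
To illustrate the scrubbing example above, here is a minimal sketch of a
driver-side filter that lets shareable reads through and blocks anything
touching global state. All names here (the demo_* opcodes, the rule table,
demo_fwctl_cmd_allowed()) are hypothetical and invented for this sketch;
they are not taken from mlx5, EDAC, or the fwctl patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FW opcodes, invented for this sketch only. */
enum demo_fw_opcode {
	DEMO_FW_READ_SCRUB_COUNTERS   = 1, /* read-only, safe to share */
	DEMO_FW_CLEAR_SCRUB_COUNTERS  = 2, /* destroys state EDAC also reads */
	DEMO_FW_SET_GLOBAL_SCRUB_RATE = 3, /* programs a single global unit */
};

/* One rule per opcode the driver understands. */
struct demo_fw_cmd_rule {
	uint16_t opcode;
	bool shareable; /* true if concurrent use by a subsystem is harmless */
};

static const struct demo_fw_cmd_rule demo_rules[] = {
	{ DEMO_FW_READ_SCRUB_COUNTERS,   true  },
	{ DEMO_FW_CLEAR_SCRUB_COUNTERS,  false },
	{ DEMO_FW_SET_GLOBAL_SCRUB_RATE, false },
};

/*
 * Decide whether a command arriving through the (hypothetical) fwctl
 * path may be forwarded to the FW. Default deny: an opcode the driver
 * does not know about is blocked rather than passed through.
 */
static bool demo_fwctl_cmd_allowed(uint16_t opcode)
{
	size_t i;

	for (i = 0; i < sizeof(demo_rules) / sizeof(demo_rules[0]); i++) {
		if (demo_rules[i].opcode == opcode)
			return demo_rules[i].shareable;
	}
	return false;
}
```

The default-deny fallthrough is a design choice of this sketch: a command
that nobody has classified is treated as potentially global, so it cannot
later conflict with a functional subsystem interface.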

Jason