Re: [MAINTAINERS SUMMIT] Device Passthrough Considered Harmful?
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2024-08-01 14:22:28
Also in: linux-cxl, linux-rdma
On Tue, Jul 30, 2024 at 09:13:00AM +0200, Daniel Vetter wrote:
> I think a solid consensus on the topics above would be really useful for gpu/accel too. We're still busy with more pressing community/ecosystem building needs, but gpu fw has become rather complex and it's not stopping. And there's random other devices attached too nowadays, so fwctl makes a ton of sense.
Yeah, I'm pretty sure GPU is going to need fwctl too; GPUs are going to have the same issues as NICs do. I see people are already struggling with topics like how to get debug traces out of the GPU FW.
> But for me the more important stuff would be some clear guidelines like what should be in other more across-devices subsystems like edac (or other ras features), what should be in functional subsystems like netdev, rdma, gpu/accel, ... whatever else, and what should be exposed through some special purpose subsystems like hwmon.
In my mind the most important part is that fwctl is not exclusive: the FW interface and the things being manipulated must be sharable, or else blocked from fwctl. We should never get into a situation where a fwctl implementation becomes a reason we cannot have a functional subsystem interface.
> We've got plenty of experience in enforcing such a community contract with vendors, but the hard part is creating a clear and ideally concise documentation page I can just point vendors at as the ground truth.
Well, I tried with the documentation in the fwctl patch series:
https://lore.kernel.org/linux-rdma/6-v2-940e479ceba9+3821-fwctl_jgg@nvidia.com/
Jason