Thread: 72 messages, 10 authors, 2021-06-10

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

From: Parav Pandit <hidden>
Date: 2019-03-05 19:46:37
Also in: lkml

-----Original Message-----
From: Jakub Kicinski <redacted>
Sent: Monday, March 4, 2019 7:35 PM
To: Parav Pandit <redacted>
Cc: Or Gerlitz <redacted>; netdev@vger.kernel.org; linux-
kernel@vger.kernel.org; michal.lkml@markovi.net; davem@davemloft.net;
gregkh@linuxfoundation.org; Jiri Pirko [off-list ref]
Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

Parav, please wrap your responses to at most 80 characters.
This is hard to read.
Sorry about that. I will wrap from now on.
On Mon, 4 Mar 2019 04:41:01 +0000, Parav Pandit wrote:
quoted
quoted
-----Original Message-----
From: Jakub Kicinski <redacted>
Sent: Friday, March 1, 2019 2:04 PM
To: Parav Pandit <redacted>; Or Gerlitz
[off-list ref]
Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
michal.lkml@markovi.net; davem@davemloft.net;
gregkh@linuxfoundation.org; Jiri Pirko [off-list ref]
Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
extension

On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
quoted
Requirements for above use cases:
--------------------------------
1. We need a generic user interface & core APIs to create sub
   devices from a parent pci device but should be generic enough for
   other parent devices
2. Interface should be vendor agnostic
3. User should be able to set device params at creation time
4. In future if needed, tool should be able to create passthrough device
   to map to a virtual machine
Like a mediated device?
Yes.
quoted
https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf

Other than pass-through it is entirely unclear to me why you'd need a
bus.
quoted
quoted
(Or should I say VM pass through or DPDK?)  Could you clarify why
the need for a bus?
A bus follows the standard Linux kernel device driver model to attach a
driver to a specific device. A platform device, with my limited
understanding, looks like a hack/abuse of it based on documentation [1],
but it can possibly be an alternative to a bus if it looks fine to Greg
and others.
I grok from this text that the main advantage you see is the ability to choose
a driver for the subdevice.
Yes.
quoted
quoted
My thinking is that we should allow spawning subports in devlink and
if user specifies "passthrough" the device spawned would be an mdev.
devlink device is much more comprehensive way to create sub-devices
than sub-ports for at least below reasons.

1. devlink device already defines device->port relation which enables
to create multiport device.
I presume that by devlink device you mean devlink instance?  Yes, this part
I'm following.
Yes -> 'struct devlink' 
quoted
subport breaks that.
Breaks what?  The ability to create a devlink instance with multiple ports?
Right.
quoted
2. With bus model, it enables us to load driver of same vendor or
generic one such a vfio in future.
Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
Could you go into more detail why not just use mdevs?
I am a novice at the mdev level too (mdev or vfio mdev).
Currently, by default, we bind to the same vendor driver, but when the
device is created as a passthrough device, the vendor driver won't create
a netdevice or rdma device for it. vfio/mdev, or whatever mature driver
is available, would bind at that point.
quoted
3. Devices live on the bus, mapping a subport to 'struct device' is
not intuitive.
Are you saying that the main devlink instance would not have any port
information for the subdevices?
Right, this newly created devlink device is the control point of its port(s).
Devices live on a bus.  Software constructs - depend on how one wants to
model them - don't have to.
quoted
4. sub-device allows to use existing devlink port, registers, health
infrastructure to sub devices, which otherwise need to be duplicated
for ports.
Health stuff is not tied to a port, I'm not following you.  You can create a
reporter per port, per ACL rule or per SB or per whatever your heart desires..
Instead of creating multiple reporters and inventing reporter naming
schemes, creating a devlink instance leverages all the health reporting
done for a devlink instance. So whatever is done for instance A (parent)
can be available for instance B (subdev).
quoted
5. Even though current devlink devices are networking devices, there
is nothing restricts it to be that way. So subport is a restricted
view.
6. devlink device already covers
port sub-object, hence creating devlink device is desired.
quoted
quoted
5. A device can have multiple ports
What does this mean, in practice?  You want to spawn a subdev which
can access both ports?  That'd be for RDMA use cases, more than
Ethernet, right?  (Just clarifying :))
Yep, you got it right. :-)
quoted
quoted
So how is it done?
------------------
(a) user in control
To address above requirements, a generic tool iproute2/devlink is
extended for sub device's life cycle.
However a devlink tool and its kernel counter part is not
sufficient to create protocol agnostic devices on a existing PCI
bus.
"Protocol agnostic"?...  What does that mean?
Devlink works on a bus,device model. It doesn't matter what the class of
the device is. For example, for pci the class can be anything. So newly
created sub-devices are not limited to netdev/rdma devices. It is
agnostic to protocol. More importantly, we don't want to create these
sub-devices with bus type 'pci'. Because, as described below, PCI has
its addressing scheme and the pci bus must not have mix-n-match devices.

So probably better wording should be,
'a devlink tool and its kernel counterpart is not sufficient to create
sub-devices of same class as that of PCI device'.
Let me clarify - for networking devices the partition will most likely end up as
a subport, but it's not a requirement that each partition must be a subport..
The question was about the necessity to invent a new bus, and have every
resource have a struct device..
A device object and a bus connect all the software objects correctly. This includes:
1. devlink bus/name handle based access
2. matching such device in sysfs
3. parent child hierarchy in sysfs
4. ability to bind different driver
5. multi-ports per device
6. still usable for single port use case
7. parameters setting at devlink instance level
8. parent-child relation handling power mgmt
9. follows standard linux driver model

Some are achievable through mfd too, instead of a subdev bus.
Will follow Greg's guidance on this.
quoted
quoted
quoted
(b) subdev bus
A given bus defines well defined addressing scheme. Creating sub
devices on existing PCI bus with a different naming scheme is just
weird. So, creating well named devices on appropriate bus is
desired.
What's that address scheme you're referring to, you seem to assign
IDs in sequence?
Yes. a device on subdev bus follows standard linux driver model based
id assignment scheme = u32. And devices are well named as 'subdev0'.
Prefix + id as the default scheme of core driver model.
I thought "well defined addressing scheme" means I can address subdevice X
of device Y with your scheme.  I can't, it's just an global ID.  Thanks for
clarifying.
It's a global ID on the subdev bus.
Subdevice X is listed under its parent device Y.
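
As a rough illustration of the "prefix + id" naming and the global ID
discussed here, a minimal userspace sketch follows. It is a toy model and
not the RFC's kernel code: the struct, the counter and every toy_* name
are hypothetical, and the parent is recorded only as a string handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model of a device on a "subdev" bus. */
struct toy_subdev {
    uint32_t id;          /* global ID, shared across all parents */
    char name[32];        /* default name: prefix + id, e.g. "subdev0" */
    const char *parent;   /* parent handle, e.g. "pci/0000:05:00.0" */
};

/* One counter for the whole bus, so the ID alone does not encode the parent. */
static uint32_t toy_next_id;

/* Assign the next global ID and build the "subdev<N>" name. */
static void toy_subdev_init(struct toy_subdev *sd, const char *parent)
{
    assert(sd != NULL && parent != NULL);
    sd->id = toy_next_id++;
    sd->parent = parent;
    snprintf(sd->name, sizeof(sd->name), "subdev%u", (unsigned int)sd->id);
}

/* The parent relation is kept next to the ID, not inside the name. */
static bool toy_subdev_same_parent(const struct toy_subdev *a,
                                   const struct toy_subdev *b)
{
    return strcmp(a->parent, b->parent) == 0;
}
```

In this model two subdevices created under different parents still get
consecutive names, which is the "global ID" behaviour pointed out above;
finding the parent requires following the recorded link.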

We did consider embedding the parent PCI address in the child, but it is
duplicate info that doesn't seem worth it.

devlink will show its parent device link, like
$devlink dev show
pci/0000:05:00.0
subdev/subdev0 parent pci/0000:05:00.0
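
For illustration only, the handle format in the listing above
(bus/name, optionally followed by "parent" and the parent's handle) could
be produced by a helper like the one below. This is a hypothetical
userspace sketch, not code from iproute2 or from this RFC; toy_format_dev
is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<bus>/<name>" into buf, and append
 * " parent <parent>" when a non-empty parent handle is given.
 * Returns the snprintf() result.
 */
static int toy_format_dev(char *buf, size_t len, const char *bus,
                          const char *name, const char *parent)
{
    assert(buf != NULL && bus != NULL && name != NULL);
    if (parent != NULL && strlen(parent) > 0)
        return snprintf(buf, len, "%s/%s parent %s", bus, name, parent);
    return snprintf(buf, len, "%s/%s", bus, name);
}
```

Called with ("subdev", "subdev0", "pci/0000:05:00.0") it yields the
second line of the listing; with a NULL parent it yields a plain handle
such as the first line.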
quoted
quoted
The key thing for me on the netdev side is what is the
forwarding model to this new entity.  Is this basically VMDQ?
Should we just go ahead and mandate "switchdev mode" here?
It will follow the switchdev mode, but it is not limited to it.
Switchdev mode is for the eswitch functionality. There isn't a need to
combine this. rdma Infiniband will be able to use this without
switchdev mode.
It's the devlink instance that's in "switchdev mode", regardless of type of any
of its ports.
I didn't follow your comment.
What I wanted to say is: when
$ devlink dev add pci/0000:05:00.0
is done, the devlink instance pci/0000:05:00.0 doesn't have to be in
switchdev mode.
We do not plan to support switchdev, but it is not devlink's domain to enforce it.

switchdev mode has nothing to do with sriov, even though it might have started with that vision.