Thread (40 messages, 7 authors, 2024-02-20)

Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd

From: Jakub Kicinski <kuba@kernel.org>
Date: 2024-02-09 01:26:34

On Fri, 2 Feb 2024 08:46:56 +0100 Jiri Pirko wrote:
Fri, Feb 02, 2024 at 05:00:41AM CET, kuba@kernel.org wrote:
On Thu, 1 Feb 2024 11:13:57 +0100 Jiri Pirko wrote:  
Wait a sec.  
No, you wait a sec ;) Why do you think this belongs to devlink?
Two months ago you were complaining bitterly when people were
considering using devlink rate to control per-queue shapers.
And now it's fine to add queues as a concept to devlink?  
Do you have a better suggestion for how to model a common pool object for
multiple netdevices? This is the reason devlink was introduced: to
provide a platform for common/shared things for a device that contains
multiple netdevs/ports/whatever. But I may be missing something here,
for sure.
devlink just seems like the lowest common denominator, but the moment
we start talking about multi-PF devices it also gets wobbly :(
I think it's better to focus on the object, without scoping it to some
ancestor which may not be sufficient tomorrow (meaning its own family
or a new object in netdev like page pool).
With this API, the user can configure sharing of the descriptors.
So there would be a pool (or multiple pools) of descriptors and the
descriptors could be used by many queues/representors.

So in the example above, for 1k representors you have only 1k
descriptors.

The infra allows great flexibility in terms of configuring multiple
pools of different sizes and assigning queues from representors to
different pools. So you can have multiple "classes" of representors.
For example, the ones you expect to see heavy traffic could have a
separate pool, and the rest can share another pool together, etc.
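A minimal sketch of the pool "classes" described above, as a toy Python model. `DescPool`, `build_pools` and the pool sizes are hypothetical and are not the devlink-sd API from the patch. It only shows that the descriptor cost is the sum of the pool sizes rather than scaling with the number of bound representor queues.

```python
from dataclasses import dataclass, field


@dataclass
class DescPool:
    """Hypothetical shared descriptor pool (a toy model, not a devlink object)."""
    name: str
    size: int                                        # total descriptors in the pool
    queues: list[str] = field(default_factory=list)  # queues drawing from it

    def bind(self, queue: str) -> None:
        # The queue draws descriptors from this pool instead of a private ring.
        self.queues.append(queue)


def build_pools(n_reprs: int, heavy: set[int]) -> list[DescPool]:
    """Give 'heavy' representors their own pool; the rest share one pool."""
    heavy_pool = DescPool("heavy", size=4096)
    shared_pool = DescPool("shared", size=1024)
    for i in range(n_reprs):
        pool = heavy_pool if i in heavy else shared_pool
        pool.bind(f"repr{i}/rxq0")
    return [heavy_pool, shared_pool]


pools = build_pools(1000, heavy={0, 1})
shared_cost = sum(p.size for p in pools)  # 4096 + 1024
private_cost = 1000 * 1024                # assumed 1024-entry private ring per repr
print(shared_cost, private_cost)          # 5120 1024000
```

Adding representors to the "shared" class grows its queue list but not its descriptor count.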
Well, it does not extend naturally to the design described in that blog
post. There I only care about a netdev level pool, but every queue can
bind multiple pools.
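To make the contrast concrete: in the devlink model each queue is assigned to one pool, while here a queue may bind several. A toy Python sketch, assuming (hypothetically) that a queue simply tries its pools in bind order; the blog post's actual selection policy is not described in this message, and `BufPool`/`Queue` are illustrative names only.

```python
from typing import Optional


class BufPool:
    """Hypothetical netdev-level buffer pool with a fixed count of free buffers."""

    def __init__(self, name: str, free: int) -> None:
        self.name = name
        self.free = free

    def take(self) -> bool:
        # Hand out one buffer if any are left.
        if self.free == 0:
            return False
        self.free -= 1
        return True


class Queue:
    """A queue bound to several pools, tried in bind order (assumed policy)."""

    def __init__(self, qid: int, pools: list[BufPool]) -> None:
        self.qid = qid
        self.pools = pools

    def alloc(self) -> Optional[str]:
        for pool in self.pools:
            if pool.take():
                return pool.name
        return None  # every bound pool is exhausted


hugepage = BufPool("hugepage", free=2)
system = BufPool("system", free=100)
q0 = Queue(0, [hugepage, system])
q1 = Queue(1, [system])  # the same pool can also back other queues
got = [q0.alloc() for _ in range(3)]
print(got)  # ['hugepage', 'hugepage', 'system']
```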

It also does not cater naturally to a very interesting application
of such tech to lightweight container interfaces, macvlan-offload style.
As I said at the beginning, why is the pool a devlink thing if the only
objects that connect to it are netdevs?  
Okay. Let's model it differently, no problem. I find the devlink device
a good fit for an object to contain shared things like pools.
But perhaps there could be something else. Something new?
We need something new for more advanced memory providers, anyway.
The huge page example I posted a year ago needs something to get
a huge page from CMA and slice it up for the page pools to draw from.
That's very similar, also not really bound to a netdev. I don't think
the cross-netdev aspect is the most important aspect of this problem.
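A toy Python model of the slicing step, assuming a 2 MiB huge page and 4 KiB base pages (common x86-64 sizes). `HugePageProvider` and `refill` are hypothetical names; the integer base address only stands in for a huge page obtained from CMA, and no real kernel allocator is called. The point is that the provider is not tied to any netdev: any page pool can draw from it.

```python
HUGE_PAGE = 2 * 1024 * 1024  # 2 MiB huge page (common on x86-64)
PAGE = 4096                  # 4 KiB base page


class HugePageProvider:
    """Hypothetical provider carving one huge page into base pages."""

    def __init__(self, base_addr: int) -> None:
        # base_addr stands in for the address of a huge page taken from CMA.
        self.free = [base_addr + off for off in range(0, HUGE_PAGE, PAGE)]

    def refill(self, pool: list[int], n: int) -> int:
        """Move up to n base pages into a page pool's cache; return the count moved."""
        moved = 0
        while self.free and moved < n:
            pool.append(self.free.pop())
            moved += 1
        return moved


provider = HugePageProvider(base_addr=0x4000_0000)
pool_a: list[int] = []  # e.g. a page pool of one netdev's queue
pool_b: list[int] = []  # a page pool of a different netdev
provider.refill(pool_a, 64)
provider.refill(pool_b, 64)
print(len(provider.free))  # 384 of the 512 base pages remain
```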
quoted
Another netdev thing where this will be awkward is page pool
integration. It lives in netdev genl, are we going to add devlink pool
reference to indicate which pool a pp is feeding?  
Page pool is per-netdev, isn't it? It could be extended to be bound per
devlink-pool as you suggest. It is a bit awkward, I agree.

So instead of devlink, should we add the descriptor-pool object into
netdev genl and make it possible for multiple netdevs to use it there?
I would still miss the namespace of the pool, as it naturally aligns
with the devlink device. IDK :/
Maybe the first thing to iron out is the life cycle. Right now we
throw all configuration requests at the driver, which ends really badly
for those of us who deal with heterogeneous environments. Applications
which try to do advanced stuff like pinning and XDP break because of
all the behavior differences between drivers. So I don't think we
should expose configuration of unstable objects (those which the user
doesn't create explicitly: queues, IRQs, page pools, etc.) to the driver.
The driver should get or read the config from the core when the object
is created.
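A minimal sketch of the "driver reads config from the core at creation time" idea, as a toy Python model. `Core`, `Driver`, `QueueCfg` and the field names are hypothetical and do not correspond to an existing kernel interface. The sketch only shows that user settings live in the core, so they survive a queue being destroyed and recreated, and every driver consumes them the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueCfg:
    ring_size: int = 1024      # hypothetical defaults
    page_pool_size: int = 256


class Core:
    """Hypothetical core that owns config for objects the user never creates."""

    def __init__(self) -> None:
        self._cfg: dict[tuple[str, int], QueueCfg] = {}

    def set_cfg(self, dev: str, qid: int, cfg: QueueCfg) -> None:
        # User requests land in the core, not directly in the driver.
        self._cfg[(dev, qid)] = cfg

    def get_cfg(self, dev: str, qid: int) -> QueueCfg:
        # Queues nobody configured get the defaults.
        return self._cfg.get((dev, qid), QueueCfg())


class Driver:
    def __init__(self, core: Core, dev: str) -> None:
        self.core = core
        self.dev = dev
        self.queues: dict[int, QueueCfg] = {}

    def create_queue(self, qid: int) -> None:
        # Pull the config at creation time instead of receiving pushed requests.
        self.queues[qid] = self.core.get_cfg(self.dev, qid)

    def destroy_queue(self, qid: int) -> None:
        del self.queues[qid]


core = Core()
core.set_cfg("eth0", 1, QueueCfg(ring_size=4096))  # set before queue 1 exists
drv = Driver(core, "eth0")
drv.create_queue(0)
drv.create_queue(1)
drv.destroy_queue(1)
drv.create_queue(1)  # recreated; the stored config still applies
print(drv.queues[0].ring_size, drv.queues[1].ring_size)  # 1024 4096
```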

This gets back to the proposed descriptor pool because there's a
chicken-and-egg problem between creating the representors and
creating the descriptor pool, right? Either:
 - create the reprs first with individual queues, then reconfigure
   them to bind them to a pool
 - create the pool first and bind the reprs, which don't exist yet, to
   it, assuming the driver somehow maintains the mapping; pretty weird
   to configure objects which don't exist
 - create the pool first, and add an extra knob elsewhere (*cough* "shared-descs
   enable") which produces somewhat loosely defined reasonable behavior
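A toy Python model of the first two orderings, to show where the awkwardness sits. `Device`, `bind` and `create_repr` are hypothetical names, and "private" stands for a repr that keeps its individual queues. Binding an existing repr forces a live reconfiguration (option 1). Binding a repr that does not exist yet forces the driver to keep a pending mapping for an object nobody has created (option 2).

```python
class Device:
    """Hypothetical device showing the two repr/pool creation orders."""

    def __init__(self) -> None:
        self.reprs: dict[int, str] = {}    # repr id -> pool it currently uses
        self.pending: dict[int, str] = {}  # repr id -> pool, for reprs not yet created
        self.live_reconfigs = 0

    def bind(self, repr_id: int, pool: str) -> None:
        if repr_id in self.reprs:
            # Option 1: the repr already has queues, so reconfigure it in place.
            self.reprs[repr_id] = pool
            self.live_reconfigs += 1
        else:
            # Option 2: remember a binding for an object that doesn't exist yet.
            self.pending[repr_id] = pool

    def create_repr(self, repr_id: int) -> None:
        # Use a pending binding if one was recorded, else individual queues.
        self.reprs[repr_id] = self.pending.pop(repr_id, "private")


dev = Device()
dev.create_repr(0)     # option 1: create first...
dev.bind(0, "shared")  # ...then rebind, costing a live reconfiguration
dev.bind(5, "shared")  # option 2: bind a repr that does not exist yet
dev.create_repr(5)
print(dev.reprs, dev.live_reconfigs, dev.pending)  # {0: 'shared', 5: 'shared'} 1 {}
```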

Because this is a general problem (again, any queue config needs it)
I think we'll need to create some sort of a rule engine in netdev :(
Instead of configuring a page pool, you'd add a configuration rule
which can match on netdev and queue ID and give any related page pool
some parameters. NAPI is another example of something the user can't
reasonably configure directly. And if we create such a rule engine,
it should probably be shared...
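A minimal sketch of such a rule engine, as a toy Python model. `Rule`, `RuleEngine` and the merge policy (rules applied in insertion order, later matches overriding earlier ones, `None` as a wildcard) are assumptions for illustration. `pool_size` and `order` are named after fields of the kernel's `struct page_pool_params`, but this is not a kernel interface. The page pool would query the engine when it is created, so a rule can be written before the netdev or queue it targets exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical config rule; None in a match field acts as a wildcard."""
    netdev: Optional[str]
    queue_id: Optional[int]
    params: dict[str, int]

    def matches(self, netdev: str, queue_id: int) -> bool:
        return self.netdev in (None, netdev) and self.queue_id in (None, queue_id)


class RuleEngine:
    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def params_for(self, netdev: str, queue_id: int) -> dict[str, int]:
        # Merge every matching rule in insertion order; later rules win.
        out: dict[str, int] = {}
        for rule in self.rules:
            if rule.matches(netdev, queue_id):
                out.update(rule.params)
        return out


engine = RuleEngine()
engine.add(Rule(None, None, {"pool_size": 256}))     # default for everyone
engine.add(Rule("eth0", None, {"pool_size": 1024}))  # all of eth0's queues
engine.add(Rule("eth0", 3, {"order": 1}))            # just eth0 queue 3

# Queried when a page pool is created, e.g. for eth0 queue 3:
print(engine.params_for("eth0", 3))  # {'pool_size': 1024, 'order': 1}
print(engine.params_for("eth1", 0))  # {'pool_size': 256}
```

The same engine could be keyed by object type (page pool, NAPI, queue) if it is to be shared, though that extension is not shown here.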