Re: [PATCH net-next V4 4/6] devlink: Apply eswitch mode boot defaults
From: Mark Bloch <mbloch@nvidia.com>
Date: 2026-07-03 18:27:35
Also in:
linux-doc, linux-rdma
On 02/07/2026 10:52, Jiri Pirko wrote:
Wed, Jul 01, 2026 at 07:42:57PM +0200, mbloch@nvidia.com wrote:quoted
On 01/07/2026 17:09, Jiri Pirko wrote:quoted
Wed, Jul 01, 2026 at 02:57:21PM +0200, mbloch@nvidia.com wrote:quoted
On 01/07/2026 12:48, Jiri Pirko wrote:quoted
Mon, Jun 29, 2026 at 08:20:59PM +0200, mbloch@nvidia.com wrote:quoted
Apply parsed devlink_eswitch_mode= defaults after devlink registration and after successful reload. devl_register() may still be called before the device is ready for anHow so? I would assume that driver calls devl_register only after everything is up and running and ready. If not, isn't it a bug?You would think so :) Some drivers, mlx5 included, call devl_register() while holding the devlink instance lock and then finish setting up state before releasing the lock. In v3 I tried to enforce exactly that model, move devl_register() to be the last thing the driver does. Jakub pushed back on making that a general rule. So in v4 I changed the approach. devl_register() only schedules the work, and the actual eswitch mode change can run only after the driver releases the devlink lock.Wouldn't it make sense to use a completion instead of loop-reschedule of delayed work?Just to make sure I understand the suggestion, this would mean that the work waits until the devlink lock holder drops the lock, and devl_unlock() would signal it, something like: void devl_unlock(struct devlink *devlink) { ool complete_apply = devlink->default_esw_mode_apply_pending; mutex_unlock(&devlink->lock); if (complete_apply) complete(&devlink->default_esw_mode_apply_ready); } That would avoid the retry loop, but it also means the queued work sleeps until the driver drops devl_lock. It does keep one worker blocked per pending instance and adds this default-esw-mode signalling to the generic devl_unlock() path. The delayed retry was meant to avoid a sleeping worker and keep the instances independent. If one devlink instance is still locked, we just try it again later while other instances can progress. If you prefer the completion approach I can switch to it, but I don't see it as simpler overall.Yeah, I don't have preference. I was just wondering. Feel free to leave it as is. Maybe, instead of "complete", you can schedule with "0" delay in devl_unlock? Well, it does not really need to be delayed work, right? The only single schedule may be done from devl_unlock. That would help to eliminate the rescheduling. Am I missing something?
Yeah, that can work. The only part I don't really like is adding default-esw-mode specific logic to devl_unlock(). But if you are fine with that, I can switch to this approach. There is still a small race between mutex_unlock() and queue_work(), where someone else can take devl_lock() first. So the worker may still wait on the lock, but the window should be small and we get rid of the delayed retry loop. Mark
quoted
Markquoted
quoted
Markquoted
quoted
eswitch mode change, so keep a per-devlink delayed work item and pending flag for the registration path. Registration queues the work, and the worker tries to take the devlink instance lock. If the lock is busy, the worker requeues itself with a delay. For successful reloads that performed DRIVER_REINIT, devlink_reload() already holds the devlink instance lock and the driver has completed reload_up(). Clear pending work and apply the default directly from the reload path instead of queueing work. If a user sets eswitch mode through netlink before the pending registration work runs, clear the pending flag so the queued default does not override that user request. Cancel pending default apply work when freeing the devlink instance.These AI generated code descriptive messages are generally not very useful :(