Re: [PATCH v4 1/6] pds_core: add support for quiet devcmd failures
From: Simon Horman <horms@kernel.org>
Date: 2026-06-15 15:35:02
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
pds_core: add support for quiet devcmd failures

This commit introduces support for suppressing error messages during device
commands by adding a boolean parameter to wait and lock functions. It
changes pdsc_devcmd_wait and pdsc_devcmd_locked into internal
__pdsc_devcmd_wait and __pdsc_devcmd_locked variants that handle the new
messaging flag. This functionality is intended for negotiating the
PDS_CORE_IDENTITY_VERSION_2.
diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index bded6b33289ce..dd9989cfe6b3f 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
[ ... ]
-int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
-		       union pds_core_dev_comp *comp, int max_seconds)
+static int __pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+				union pds_core_dev_comp *comp, int max_seconds,
+				const bool do_msg)
 {
 	int err;
@@ -197,7 +199,7 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 	memcpy_toio(&pdsc->cmd_regs->cmd, cmd, sizeof(*cmd));
 	pdsc_devcmd_dbell(pdsc);
-	err = pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds);
+	err = __pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds, do_msg);
 	if ((err == -ENXIO || err == -ETIMEDOUT) && pdsc->wq)
 		queue_work(pdsc->wq, &pdsc->health_work);
[Severity: Critical]
This isn't a bug introduced by this patch, but it occurs in the function
modified here.
Can this queue_work() call trigger a use-after-free during driver removal?
Looking at pdsc_remove() in drivers/net/ethernet/amd/pds_core/main.c,
destroy_workqueue() is called but the pdsc->wq pointer is not cleared:
pdsc_remove() {
	...
	if (pdsc->wq)
		destroy_workqueue(pdsc->wq);
	mutex_lock(&pdsc->config_lock);
	set_bit(PDSC_S_STOPPING_DRIVER, &pdsc->state);
	pdsc_stop(pdsc);
	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
	...
}
Subsequently, pdsc_teardown() issues a RESET command to the firmware.
If the firmware is unresponsive and the command times out, we reach this
error path in __pdsc_devcmd_locked(). Since the pdsc->wq pointer was never
set to NULL, the check here succeeds, and it attempts to queue work to a
destroyed workqueue.
Should pdsc->wq be explicitly set to NULL immediately after being destroyed
in pdsc_remove() to prevent this?
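To make the question concrete, here is a minimal userspace sketch of the
destroy-then-clear pattern. Every name in it (model_wq, model_dev,
model_remove, model_devcmd_error_path) is hypothetical and exists only for
illustration; none of it is pds_core or kernel workqueue code, and it does
not model concurrency. Whether a plain NULL assignment is sufficient in the
driver depends on locking and ordering that are not shown here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, NOT the kernel workqueue API. */
struct model_wq {
	int queued;		/* work items accepted so far */
};

struct model_dev {
	struct model_wq *wq;	/* plays the role of pdsc->wq */
};

static struct model_wq *model_wq_create(void)
{
	return calloc(1, sizeof(struct model_wq));
}

/* Plays the role of the remove path: destroy, then clear the pointer. */
static void model_remove(struct model_dev *dev)
{
	if (dev->wq) {
		free(dev->wq);		/* stands in for destroy_workqueue() */
		dev->wq = NULL;		/* the step the review asks about */
	}
}

/*
 * Plays the role of the devcmd error path. Returns true if work was
 * queued, false if the error did not qualify or no workqueue exists.
 */
static bool model_devcmd_error_path(struct model_dev *dev, int err)
{
	if ((err == -ENXIO || err == -ETIMEDOUT) && dev->wq) {
		dev->wq->queued++;	/* stands in for queue_work() */
		return true;
	}
	return false;
}
```

With the pointer cleared in model_remove(), a timeout that arrives after
removal fails the dev->wq check and nothing is queued. Without that
assignment, the same check would pass on a stale pointer and the increment
would touch freed memory, which is the analogue of the concern above.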
 	else
 		memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));