Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
From: Dipayaan Roy <hidden>
Date: 2026-05-20 18:11:40
Also in:
bpf, linux-hyperv, linux-rdma, lkml
On Tue, May 19, 2026 at 03:55:56PM -0700, Jakub Kicinski wrote:
This is an AI-generated review of your patch. The human sending this email has considered the AI review valid, or at least plausible. --- net: mana: Add NULL guards in teardown path to prevent panic on attach failure When queue allocation fails partway through, the error cleanup frees and NULLs apc->tx_qp and apc->rxqs. The patch adds NULL guards for these pointers in teardown paths such as mana_destroy_vport(), mana_fence_rqs(), and mana_dealloc_queues() to prevent NULL pointer dereferences.quoted
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 9afc786b297a8..9e7e4bf526bf6 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c[ ... ]quoted
@@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc) struct mana_rxq *rxq; u32 rxq_idx; - for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) { - rxq = apc->rxqs[rxq_idx]; - if (!rxq) - continue; + if (apc->rxqs) { - mana_destroy_rxq(apc, rxq, true); - apc->rxqs[rxq_idx] = NULL; + for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) { + rxq = apc->rxqs[rxq_idx]; + if (!rxq) + continue; + + mana_destroy_rxq(apc, rxq, true); + apc->rxqs[rxq_idx] = NULL; + } }If the teardown path frees apc->rxqs and sets it to NULL on attach failure, how does the interface recover if the user subsequently brings it back up? When mana_open() is invoked, it directly calls mana_alloc_queues(). Inside mana_alloc_queues() -> mana_add_rx_queues(), the code does: apc->rxqs[i] = rxq; If apc->rxqs was left as NULL during the teardown, will this cause a NULL pointer dereference on the next interface bring-up? [ ... ]
Hi Jakub, The only path that recovers from this state is mana_attach(), which calls mana_init_port() -> mana_init_port_context() to re-allocate apc->rxqs before calling mana_alloc_queues(). When mana_open is invoked prior to it the rxqs would be already setup by mana_probe_port -> mana_init_port -> mana_init_port_context.
quoted
@@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev) * number of queues. */ - for (i = 0; i < apc->num_queues; i++) { - txq = &apc->tx_qp[i].txq; - tsleep = 1000; - while (atomic_read(&txq->pending_sends) > 0 && - time_before(jiffies, timeout)) { - usleep_range(tsleep, tsleep + 1000); - tsleep <<= 1; - } - if (atomic_read(&txq->pending_sends)) { - err = pcie_flr(to_pci_dev(gd->gdma_context->dev)); - if (err) { - netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n", - err, atomic_read(&txq->pending_sends), - txq->gdma_txq_id); + if (apc->tx_qp) {In the error path where queue creation fails partway through, will mana_destroy_txq() be called? If mana_destroy_txq() unconditionally iterates over all apc->num_queues: for (i = 0; i < apc->num_queues; i++) { ... mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object); ... Will it pass 0 or INVALID_MANA_HANDLE to mana_destroy_wq_obj() for queues that were never fully initialized? Does mana_destroy_wq_obj() send a MANA_DESTROY_WQ_OBJ command to the hardware with these uninitialized handles, and could handle 0 corrupt the firmware state?
This is a known exisiting issue in the cleaup path for partial tx setups and was also mentioned in the a recent patch where the rx partial init clean-up path was fixed. My colleague Aditya is already working on a patch to fix this and all other issues in the tx cleanup path.
quoted
+ for (i = 0; i < apc->num_queues; i++) { + txq = &apc->tx_qp[i].txq; + tsleep = 1000; + while (atomic_read(&txq->pending_sends) > 0 && + time_before(jiffies, timeout)) { + usleep_range(tsleep, tsleep + 1000); + tsleep <<= 1; + } + if (atomic_read(&txq->pending_sends)) { + err = + pcie_flr(to_pci_dev(gd->gdma_context->dev));If pcie_flr() is triggered here, the PCI Function Level Reset wipes the internal GDMA state. Further down in this function, mana_disable_vport_rx(), mana_fence_rqs(), and mana_destroy_vport() are called. Will these functions attempt to send commands to the now-destroyed GDMA admin queue via mana_send_request() and wait for completions? Could this lead to cascading timeouts since the hardware was just reset and cannot process admin commands?
This is pre-existing behavior/issue the pcie_flr() path and the subsequent calls to mana_disable_vport_rx()/mana_fence_rqs()/mana_destroy_vport() are unchanged by this patch. My patch only adds a NULL guard for apc->tx_qp to skip this entire TX drain block when tx_qp is already freed. But this point seems to be valid issue, I will investigate this further and I will send a different patch set for this as follow up work.
quoted
+ if (err) { + netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n", + err, + atomic_read(&txq->pending_sends), + txq->gdma_txq_id); + } + break; } - break; }-- pw-bot: cr
Regards Dipayaan Roy