Re: [PATCH 0/37] PCI/MSI: Enforce explicit IRQ vector management by removing devres auto-free
From: Shawn Lin <shawn.lin@rock-chips.com>
Date: 2026-02-24 03:05:20
Also in:
dmaengine, dri-devel, linux-arm-msm, linux-crypto, linux-cxl, linux-gpio, linux-i2c, linux-i3c, linux-input, linux-iommu, linux-media, linux-mmc, linux-pci, linux-riscv, linux-serial, linux-spi, linux-usb, platform-driver-x86
On Tuesday, 2026/02/24 at 1:38, Andy Shevchenko wrote:
> On Tue, Feb 24, 2026 at 12:09:37AM +0800, Shawn Lin wrote:
>> On Monday, 2026/02/23 at 23:50, Andy Shevchenko wrote:
>>> On Mon, Feb 23, 2026 at 5:32 PM Shawn Lin [off-list ref] wrote:
>>>> This patch series addresses a long-standing design issue in the PCI/MSI
>>>> subsystem where the implicit, automatic management of IRQ vectors by the
>>>> devres framework conflicts with explicit driver cleanup, creating
>>>> ambiguity and potential resource management bugs.
>>>>
>>>> ==== The Problem: Implicit vs. Explicit Management ====
>>>>
>>>> Historically, `pcim_enable_device()` not only manages standard PCI
>>>> resources (BARs) via devres but also implicitly triggers automatic IRQ
>>>> vector management by setting a flag that registers `pcim_msi_release()`
>>>> as a cleanup action.
>>>>
>>>> This creates an ambiguous ownership model. Many drivers follow a pattern of:
>>>> 1. Calling `pci_alloc_irq_vectors()` to allocate interrupts.
>>>> 2. Also calling `pci_free_irq_vectors()` in their error paths or remove
>>>>    routines.
>>>>
>>>> When such a driver also uses `pcim_enable_device()`, the devres framework
>>>> may attempt to free the IRQ vectors a second time upon device release,
>>>> leading to a double-free. Analysis of the tree shows this hazardous
>>>> pattern exists widely, while 35 other drivers correctly rely solely on
>>>> the implicit cleanup.
>>>
>>> Is this confirmed? From what I read in the cover letter, this series was
>>> only compile-tested, so how can you prove the problem exists in the first
>>> place?
>>
>> Yes, it's confirmed. My debugging of a double-free issue in an out-of-tree
>> PCIe wifi driver, which uses pcim_enable_device + pci_alloc_irq_vectors +
>> pci_free_irq_vectors, exposed it. And we did have a TODO to clean up this
>> hybrid usage, targeted in this cycle[1] suggested by Philipp:
>
> Okay, fair enough. I think this bit was missing in the cover letter.
>>>> ==== The Solution: Making Management Explicit ====
>>>>
>>>> This series enforces a clear, predictable model:
>>>>
>>>> 1. New Managed API (Patch 1/37): Introduces pcim_alloc_irq_vectors() and
>>>>    pcim_alloc_irq_vectors_affinity(). Drivers that desire devres-managed
>>>>    IRQ vectors should use these functions, which set the is_msi_managed
>>>>    flag and ensure automatic cleanup.
>>>> 2. Patches 2 through 36 convert each driver that uses pcim_enable_device()
>>>>    alongside pci_alloc_irq_vectors() and relies on devres for IRQ vector
>>>>    cleanup to instead make an explicit call to pcim_alloc_irq_vectors().
>>>> 3. Core Change (Patch 37/37): With the preceding conversions in place,
>>>>    this patch modifies pcim_setup_msi_release() to check only the
>>>>    is_msi_managed flag. This decouples automatic IRQ cleanup from
>>>>    pcim_enable_device(). IRQ vectors allocated via
>>>>    pci_alloc_irq_vectors*() are now solely the driver's responsibility
>>>>    to free with pci_free_irq_vectors().
>>>>
>>>> With these changes, we get a clear ownership model: explicit resource
>>>> management eliminates ambiguity and follows the "principle of least
>>>> surprise." New drivers should choose one model and apply it consistently.
>>>> - Use `pci_alloc_irq_vectors()` + `pci_free_irq_vectors()` for explicit
>>>>   control.
>>>> - Use `pcim_alloc_irq_vectors()` for devres-managed, automatic cleanup.
>>>
>>> Have you checked previous attempts? Why is your series better than those?
Thanks for sharing this five-year-old discussion, I totally missed it. I read the V7 discussion, and it seems to have disappeared without much follow-up, like a stone dropped into the ocean. For five years, newly added drivers have continued to misuse these APIs, and we’ve been watching it happen.

I can’t really claim this patch series is inherently better than Dejin’s earlier work; at its core, this is just about fixing one entire category of misuse in a single pass. According to Bjorn's final search and reply, if we include the removal of deprecated APIs, it would require a massive amount of work and might span many release cycles. Unfortunately, that work never began, and the cleanup might never be completed.

I’m not sure if folks have changed their minds now. Can we at least start by completing the changes for the pci_alloc_irq_vectors category?
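For illustration only, the pci_alloc_irq_vectors category can be modelled with a small userspace toy. This is not kernel code: every `toy_*` name, the `legacy_core` switch and the `free_calls` counter are hypothetical stand-ins, and the "legacy" behaviour is only an approximation of what the cover letter describes (a plain allocation on a devres-enabled device also arms the automatic free). The sketch merely counts how many times a free is attempted for one allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct pci_dev; only the fields this sketch needs. */
struct toy_pci_dev {
	bool devres_enabled;	/* set by toy_pcim_enable_device() */
	bool is_msi_managed;	/* arms the automatic free at release */
	int vectors;		/* currently allocated vectors, 0 if none */
	int free_calls;		/* how many times a free was attempted */
};

static void toy_pcim_enable_device(struct toy_pci_dev *d)
{
	d->devres_enabled = true;
}

/*
 * Plain allocation. With legacy_core set (assumed pre-series behaviour),
 * a devres-enabled device implicitly becomes MSI-managed as well.
 */
static int toy_pci_alloc_irq_vectors(struct toy_pci_dev *d, int n,
				     bool legacy_core)
{
	assert(n > 0);
	d->vectors = n;
	if (legacy_core && d->devres_enabled)
		d->is_msi_managed = true;
	return n;
}

/* Managed allocation: the only path that arms the automatic free. */
static int toy_pcim_alloc_irq_vectors(struct toy_pci_dev *d, int n)
{
	assert(n > 0);
	d->vectors = n;
	d->is_msi_managed = true;
	return n;
}

static void toy_pci_free_irq_vectors(struct toy_pci_dev *d)
{
	d->free_calls++;
	d->vectors = 0;
}

/* Models the devres release action that runs at device teardown. */
static void toy_devres_release(struct toy_pci_dev *d)
{
	if (d->is_msi_managed)
		toy_pci_free_irq_vectors(d);
}

/* Hybrid driver: managed enable, plain alloc, explicit free in remove(). */
int hybrid_driver_free_calls(bool legacy_core)
{
	struct toy_pci_dev d = {0};

	toy_pcim_enable_device(&d);
	toy_pci_alloc_irq_vectors(&d, 4, legacy_core);
	toy_pci_free_irq_vectors(&d);	/* driver's own cleanup */
	toy_devres_release(&d);		/* device release */
	return d.free_calls;
}

/* Fully managed driver: no explicit free, devres does the cleanup. */
int managed_driver_free_calls(void)
{
	struct toy_pci_dev d = {0};

	toy_pcim_enable_device(&d);
	toy_pcim_alloc_irq_vectors(&d, 4);
	toy_devres_release(&d);
	return d.free_calls;
}
```

In this toy, the hybrid driver sees two free attempts under the assumed legacy behaviour and exactly one once only the managed allocation arms the release; the fully managed driver sees one in either case. Whether a second free is actually harmful in a given kernel depends on the real implementation, which this sketch does not model.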
>> There seem to be no previous attempts.
>
> Maybe we are looking at different projects...
> https://lore.kernel.org/all/?q=pcim_alloc_irq_vectors
>>>> ==== Testing And Review ====
>>>>
>>>> 1. This series has only been compile-tested with allmodconfig.
>>>> 2. Given the substantial size of this patch series, I have structured the
>>>>    mailing to facilitate efficient review. The cover letter, the first
>>>>    patch and the last one will be sent to all relevant mailing lists and
>>>>    key maintainers to ensure broad visibility and initial feedback on the
>>>>    overall approach. The remaining subsystem-specific patches will be
>>>>    sent only to the respective subsystem maintainers and their associated
>>>>    mailing lists, reducing noise.