Thread (8 messages) 8 messages, 4 authors, 2025-10-02

Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Date: 2025-10-02 18:10:36
Also in: linux-pci, lkml, stable

On 10/2/25 03:06, Christophe Leroy wrote:

Le 29/09/2025 à 17:10, Sathyanarayanan Kuppuswamy a écrit :
quoted
On 9/29/25 2:15 AM, Breno Leitao wrote:
quoted
Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
does not rate limit, given this is fatal.

This prevents a kernel crash triggered by dereferencing a NULL pointer
in aer_ratelimit(), ensuring safer handling of PCI devices that lack
AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
which already performs this NULL check.

Cc: stable@vger.kernel.org
Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
- This problem is still happening in upstream, and unfortunately no action
   was done in the previous discussion.
- Link to previous post:
   https://eur01.safelinks.protection.outlook.com/? url=https%3A%2F%2Flore.kernel.org%2Fr%2F20250804-aer_crash_2-v1-1- fd06562c18a4%40debian.org&data=05%7C02%7Cchristophe.leroy2%40cs- soprasteria.com%7Cfd3d2f1b4e8448a8e67608ddff6a4e70%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638947554250805439%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6yTN1%2Fq%2Fy0VKX%2BXpE%2BiKxBrn19AkY4IPj01N2ZdxEkg%3D&reserved=0
---
Although we haven't identified the path that triggers this issue, adding this check is harmless.
Is it really harmless ?

The purpose of the function is to ratelimit logs. Here by returning 1 when dev->aer_info is NULL it says: don't ratelimit. Isn't it an opened door to Denial of Service by overloading with logs ?
We only skip rate limiting when dev->aer_info is NULL, which happens for
devices without AER capability. In that case, I think the trade-off is reasonable:
generating more logs is better than triggering a NULL pointer exception.

Also, this approach is consistent with other functions (for example, the stat
collection helpers) that already perform similar checks before accessing
aer_info. So extending the same safeguard here seems acceptable to me.
Christophe
quoted
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


quoted
  drivers/pci/pcie/aer.c | 3 +++
  1 file changed, 3 insertions(+)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e286c197d7167..55abc5e17b8b1 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
  static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
  {
+    if (!dev->aer_info)
+        return 1;
+
      switch (severity) {
      case AER_NONFATAL:
          return __ratelimit(&dev->aer_info->nonfatal_ratelimit);

---
base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
change-id: 20250801-aer_crash_2-b21cc2ef0d00

Best regards,
-- 
Breno Leitao [off-list ref]
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help