Re: [PATCH 0/3] warn and suppress irqflood
From: Guilherme Piccoli <hidden>
Date: 2020-10-26 15:06:59
Also in: kexec, lkml
On Sun, Oct 25, 2020 at 8:12 AM Pingfan Liu [off-list ref] wrote:
> On Thu, Oct 22, 2020 at 4:37 PM Thomas Gleixner [off-list ref] wrote:
> > On Thu, Oct 22 2020 at 13:56, Pingfan Liu wrote:
> > > I hit an irqflood bug on a powerpc platform, and two years ago, on an x86 platform. When the bug happens, the kernel is totally occupied by IRQs. Currently, there may be nothing or just a soft lockup warning shown on the console. It is better to warn users with irq flood info. In the kdump case, the kernel can move on by suppressing the irq flood.
> >
> > You're curing the symptom not the cause and the cure is just magic and can't work reliably.
>
> Yeah, it is magic. But at least, it is better to printk something and alert users to what is happening. With the current code, it may show nothing when the system hangs.
Thanks Pingfan and Thomas for the points. I'd like to have a mechanism in the kernel to warn users when an IRQ flood is potentially happening.

Some time ago (2 years) we faced a similar issue on x86-64: a hard-to-debug problem in kdump that was eventually narrowed down to buggy NIC firmware flooding the kdump kernel with IRQs, with no messages shown (although the kernel has changed a lot since then, so today we might have better IRQ handling/warning). We tried an early-boot fix, disabling MSIs (as per the PCI spec) early in x86 boot, but it wasn't accepted: Bjorn asked pertinent questions that I couldn't answer (I had lost the reproducer) [0].

Cheers,

Guilherme

[0] lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com