Re: [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
From: Hannes Reinecke <hare@suse.de>
Date: 2019-02-19 14:24:33
On 2/19/19 3:19 AM, Ming Lei wrote:
> On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke [off-list ref] wrote:
> > Hi all,
> >
> > this came up during discussion on the mailing list (cf. thread "Question on handling managed IRQs when hotplugging CPUs"). The problem is that with managed IRQs and block-mq, I/O will be routed to individual CPUs, and the response will be sent to the IRQ assigned to that CPU. If a CPU hotplug event now occurs while I/O is still in flight, the IRQ will _still_ be assigned to that CPU, causing any pending interrupt to be lost. Hence the driver will never notice that an interrupt has happened, and an I/O timeout occurs.
>
> Many drivers' timeout handlers only return BLK_EH_RESET_TIMER, so for these devices the I/O timeout cannot recover from this situation. For example, we have seen I/O hangs on HPSA and megaraid_sas before, when the wrong MSI vector was set on an I/O command. One such issue on aacraid is not even fixed yet.
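To make the failure mode above concrete, here is a toy Python model. It is not kernel code: `ToyHost`, its methods, and the one-queue-one-vector-per-CPU layout are assumptions made purely for illustration. It only shows that a request submitted on a CPU stays in flight forever if that CPU goes offline before the completion interrupt arrives.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHost:
    """Hypothetical model: one hardware queue and one IRQ vector per CPU."""
    online: set
    # tag -> CPU whose vector will signal the completion
    inflight: dict = field(default_factory=dict)

    def submit(self, tag, cpu):
        # The request is routed to the queue of the submitting CPU.
        if cpu not in self.online:
            raise ValueError(f"CPU {cpu} is offline")
        self.inflight[tag] = cpu

    def offline(self, cpu):
        # Nothing drains or re-targets in-flight requests here;
        # that gap is exactly the problem being discussed.
        self.online.discard(cpu)

    def hardware_completes(self, tag):
        """Return True if the completion interrupt reaches an online CPU."""
        cpu = self.inflight[tag]
        if cpu in self.online:
            del self.inflight[tag]
            return True
        return False  # interrupt lost, request stays in flight


def demo():
    host = ToyHost(online={0, 1, 2, 3})
    host.submit(tag=7, cpu=2)
    host.offline(2)                      # hotplug while I/O is in flight
    delivered = host.hardware_completes(7)
    return delivered, sorted(host.inflight)
```

In this model `demo()` reports that the completion was not delivered and that tag 7 is still outstanding, which is the state a timeout handler that only returns BLK_EH_RESET_TIMER would never get out of.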
> > One proposal was to quiesce the device when a CPU hotplug event occurs, and only allow CPU hotplugging once it is fully quiesced.
>
> That is the original solution, but the big problem is that queue dependencies exist: for example, a loop or DM queue depends on the underlying device's queue, and an NVMe I/O queue depends on its admin queue.
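One way to read the dependency problem is sketched below as a toy Python check. It rests on an assumption that is not spelled out in the thread: a queue can only drain its in-flight I/O while the queue it depends on is still running. The queue names (`loop0`, `dm-0`, `sda`, `nvme0-io`, `nvme0-admin`) and the function `drain_order_ok` are hypothetical.

```python
# Hypothetical dependency map: queue -> queue that services its I/O.
DEPENDS_ON = {
    "loop0": "sda",
    "dm-0": "sda",
    "nvme0-io": "nvme0-admin",
}

TOP_DOWN = ["loop0", "dm-0", "sda", "nvme0-io", "nvme0-admin"]
BOTTOM_UP = ["sda", "loop0", "dm-0", "nvme0-admin", "nvme0-io"]


def drain_order_ok(order, depends_on):
    """Return True if every queue can drain when quiesced in `order`.

    Assumption: a queue's in-flight I/O can only finish while the
    queue it directly depends on has not been quiesced yet.
    """
    quiesced = set()
    for queue in order:
        lower = depends_on.get(queue)
        if lower in quiesced:
            return False  # this queue can never finish draining
        quiesced.add(queue)
    return True
```

Under that assumption, `drain_order_ok(TOP_DOWN, DEPENDS_ON)` succeeds while `drain_order_ok(BOTTOM_UP, DEPENDS_ON)` fails, so a per-device quiesce triggered from a hotplug callback would need to know about such stacking before it can safely wait for a queue to drain.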
> > While this would work, it would introduce quite some system stall, and it actually has a rather big impact on the system. Another possibility would be to have the driver abort the requests itself, but this requires specific callbacks into the driver and, of course, the driver being able to actually do so. I would like to discuss at LSF/MM how these issues can best be addressed.
>
> One related topic is that the current static queue mapping, without a CPU hotplug handler involved, may waste lots of IRQ vectors [1]. How should we deal with this problem?
>
> [1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html
Yes, ideally I would like to touch upon that, too.
Additionally we have the issue raised by the mpt3sas folks [2], where they ran into a CPU lockup when having more CPU cores than interrupts.

[2] https://patchwork.kernel.org/cover/10811825

Cheers,

Hannes
--
Dr. Hannes Reinecke		Teamlead Storage & Networking
hare@suse.de			+49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)