Re: [PATCH v2 09/30] cxlflash: Fix to stop interrupt processing on remove
From: Tomas Henzl <hidden>
Date: 2015-09-21 11:34:02
Also in:
linux-scsi
On 19.9.2015 01:26, Matthew R. Ochs wrote:
quoted
On Sep 18, 2015, at 6:59 AM, Tomas Henzl [off-list ref] wrote: On 17.9.2015 19:16, Matthew R. Ochs wrote:quoted
quoted
On Sep 17, 2015, at 7:38 AM, Tomas Henzl [off-list ref] wrote: On 16.9.2015 18:53, Matthew R. Ochs wrote:quoted
Interrupt processing can run in parallel to a remove operation. This can lead to a condition where the interrupt handler is processing with memory that has been freed. To avoid processing an interrupt while memory may be yanked, check for removal while in the interrupt handler. Bail when removal is imminent. Signed-off-by: Matthew R. Ochs <redacted> Signed-off-by: Manoj N. Kumar <redacted> --- drivers/scsi/cxlflash/common.h | 2 ++ drivers/scsi/cxlflash/main.c | 21 +++++++++++++++------ 2 files changed, 17 insertions(+), 6 deletions(-)diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h index 1abe4e0..03d2cc6 100644 --- a/drivers/scsi/cxlflash/common.h +++ b/drivers/scsi/cxlflash/common.h@@ -103,6 +103,8 @@ struct cxlflash_cfg {enum cxlflash_lr_state lr_state; int lr_port; + atomic_t remove_active; + struct cxl_afu *cxl_afu; struct pci_pool *cxlflash_cmd_pool;diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c index 6e85c77..89ee648 100644 --- a/drivers/scsi/cxlflash/main.c +++ b/drivers/scsi/cxlflash/main.c@@ -892,6 +892,7 @@ static void cxlflash_remove(struct pci_dev *pdev)spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags); cfg->state = STATE_FAILTERM; + atomic_inc(&cfg->remove_active);Hi Matthew, you could just call term_afu at this point, this way you don't need an additional check in all irq functions. Cheers, TomasHi Tomas, We actually do call term_afu() a few lines down from here. I don't follow how moving it here would help things.When you disable ints sooner (that is what term_afu does ?) you'll get no more ints later isn't this what you want?Correct, that's what we want.quoted
quoted
The reason for the atomic was to provide something lightweight that we could check _inside_ the processing loop for the read-response queue handler. A check outside that loop doesn't really provide much in terms of closing or narrowing down the window of when freed memory can be accessed. As David Laight correctly pointed out, this approach does not completely close the window. We'd need something heavier to fully protect (e.g. a lock to wrap around the entire loop). I will look into adding this in a future cycle when I can adequately test.term_afu calls free_irq and this function does not return until any executing interrupts for have completed. This is the sync mechanism you need, it's lightweight (does not add an additional check to your irq functions) and closes the race window completely.Thanks for clarifying! I looked at this closer and you are correct, free_irq() guarantees not to return until the interrupt handler has completed. The current location of term_afu() is appropriate as the memory that the handler touches is not freed until the very end [by free_mem() and scsi_host_put()] of the remove. Thus we can simply ignore this patch (I'll remove it in a v3).
OK. In some future patch please reorganize the remove function so, that it follows the template described in Documentation/PCI/pci.txt : Disable the device from generating IRQs Release the IRQ (free_irq()) Stop all DMA activity Release DMA buffers (both streaming and coherent) Unregister from other subsystems (e.g. scsi or netdev) Release MMIO/IOP resources Disable the device
-matt -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html