Thread (10 messages) 10 messages, 2 authors, 2021-06-23

Re: [PATCH 1/2] nvme-pci: Fix multiple races in nvme_setup_io_queues()

From: Keith Busch <kbusch@kernel.org>
Date: 2021-06-23 01:54:02

On Tue, Jun 22, 2021 at 05:59:30PM -0700, Casey Chen wrote:
On 6/22/21 8:06 AM, Keith Busch wrote:
quoted
On Mon, Jun 21, 2021 at 05:27:09PM -0700, Casey Chen wrote:
quoted
+	/*
+	 * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
+	 * from set to unset. If there is a window to it is truely freed,
+	 * pci_free_irq_vectors() jumping into this window will crash.
+	 * And take lock to avoid racing with pci_free_irq_vectors() in
+	 * nvme_dev_disable() path.
+	 */
+	mutex_lock(&dev->shutdown_lock)
Sorry, I wasn't clear in previous review. All of the shutdown_locks
taken after initialization need to by mutex_trylock()'s. If you have to
wait to get the lock, the device is not going to be suitable for
continuing initialization after the lock is available.

And looking at this again, if trylock is successful, I think you need to
verify controller state is suitable for continuing the initialization.
I assume you mean all the mutex_lock() except the one in nvme_dev_disable(),
right ? I agree with replacing all mutex_lock() with mutex_trylock() except
the one in nvme_dev_disable(). 
I only mean the locks you are adding. The existing usages are correct
as-is.
But I doubt about checking controller state
after lock being acquired. Two reasons:

1. What state can use to check ?
If the state is not NVME_CTRL_CONNECTING after successfully taking the
shutdown_lock, then the controller is not in a state that can proceed
with IO queue setup.
Suppose there is a state set on surprise
removal, what if the state check happen before it is set while surprise
removal is on its way ?

2. Even we don't check controller state after acquiring lock, the
initialization still fail fairly soon if surprise removal happens. We don't
sacrifice any performance.
Nothing to do with performance; we're not in io path. The suggestion is
about avoiding access to a non-existent device. The timing of a surprise
removal can always defeat such a check, so my suggestion is not critical
since we expect that to be handled anyway, but in my experience,
platforms work better if the OS prevents such access when it knows it
will fail. It's not full-proof.

But as I said, everything should work even without the check, so feel
free to omit it.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help