Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal
From: Dan Williams <hidden>
Date: 2021-09-21 01:43:06
Also in:
linux-block, linux-doc, linux-fsdevel, linux-kselftest, lkml
Possibly related (same subject, not in this thread)
- 2021-09-21 · Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal · Luis Chamberlain <mcgrof@kernel.org>
On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain [off-list ref] wrote:
When sysfs attributes use a lock also used on module removal we can
race to deadlock. This happens when for instance a sysfs file on
a driver is used, then at the same time we have module removal call
trigger. The module removal call code holds a lock, and then the sysfs
file entry waits for the same lock. While holding the lock the module
removal tries to remove the sysfs entries, but these cannot be removed
yet as one is waiting for a lock. This won't complete as the lock is
already held. Likewise module removal cannot complete, and so we deadlock.
This can now be easily reproducible with our sysfs selftest as follows:
./tools/testing/selftests/sysfs/sysfs.sh -t 0027
To fix this we extend the struct kernfs_node with a module reference and
use the try_module_get() after kernfs_get_active() is called which
protects integrity and the existence of the kernfs node during the
operation.
So long as the kernfs node is protected with kernfs_get_active() we know
we can rely on its contents. And, as now just documented in the previous
patch, we also now know that once kernfs_get_active() is called the module
is also guarded to exist and cannot be removed.
If try_module_get() fails we fail the operation on the kernfs node.
We use a try method as a full lock means we'd then make our sysfs
attributes busy us out from possible module removal, and so userspace
could force denying module removal, a silly form of "DOS" against module
removal. A try lock on the module removal ensures we give priority to
module removal and interacting with sysfs attributes only comes second.
Using a full lock could mean for instance that if you don't stop poking
at sysfs files you cannot remove a module.
Races between removal of sysfs files and the module are not possible
given sysfs files are created by the same module, and when a sysfs file
is being used kernfs prevents removal of the sysfs file. So if module
removal is actually happening the removal would have to wait until
the sysfs file operation is complete.
This deadlock was first reported with the zram driver, however the live
patching folks have acknowledged they have observed this as well with
live patching, when a live patch is removed. I was then able to
reproduce easily by creating a dedicated selftests.
A sketch of how this can happen follows:
CPU A CPU B
whatever_store()
module_unload
mutex_lock(foo)
mutex_lock(foo)
del_gendisk(zram->disk);
device_del()
device_remove_groups()This flow seems possible to trigger with: echo $dev > /sys/bus/$bus/drivers/$driver/unbind I am missing why module pinning is part of the solution when it's the device_del() path that is racing? Module removal is just a more coarse grained way to trigger unbind => device_del(). Isn't the above a bug in the driver, not missing synchronization in kernfs? Forgive me if the unbind question was asked and answered elsewhere, this is my first time taking a look at this series.