Thread (11 messages) 11 messages, 4 authors, 2021-09-21

Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

From: Dan Williams <hidden>
Date: 2021-09-21 01:43:06
Also in: linux-block, linux-doc, linux-fsdevel, linux-kselftest, lkml

Possibly related (same subject, not in this thread)

On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain [off-list ref] wrote:
When sysfs attributes use a lock also used on module removal we can
race to deadlock. This happens when for instance a sysfs file on
a driver is used, then at the same time we have module removal call
trigger. The module removal call code holds a lock, and then the sysfs
file entry waits for the same lock. While holding the lock the module
removal tries to remove the sysfs entries, but these cannot be removed
yet as one is waiting for a lock. This won't complete as the lock is
already held. Likewise module removal cannot complete, and so we deadlock.

This can now be easily reproducible with our sysfs selftest as follows:

./tools/testing/selftests/sysfs/sysfs.sh -t 0027

To fix this we extend the struct kernfs_node with a module reference and
use the try_module_get() after kernfs_get_active() is called which
protects integrity and the existence of the kernfs node during the
operation.

So long as the kernfs node is protected with kernfs_get_active() we know
we can rely on its contents. And, as now just documented in the previous
patch, we also now know that once kernfs_get_active() is called the module
is also guarded to exist and cannot be removed.

If try_module_get() fails we fail the operation on the kernfs node.

We use a try method as a full lock means we'd then make our sysfs
attributes busy us out from possible module removal, and so userspace
could force denying module removal, a silly form of "DOS" against module
removal. A try lock on the module removal ensures we give priority to
module removal and interacting with sysfs attributes only comes second.
Using a full lock could mean for instance that if you don't stop poking
at sysfs files you cannot remove a module.

Races between removal of sysfs files and the module are not possible
given sysfs files are created by the same module, and when a sysfs file
is being used kernfs prevents removal of the sysfs file. So if module
removal is actually happening the removal would have to wait until
the sysfs file operation is complete.

This deadlock was first reported with the zram driver, however the live
patching folks have acknowledged they have observed this as well with
live patching, when a live patch is removed. I was then able to
reproduce easily by creating a dedicated selftests.

A sketch of how this can happen follows:

CPU A                              CPU B
                                   whatever_store()
module_unload
  mutex_lock(foo)
                                   mutex_lock(foo)
   del_gendisk(zram->disk);
     device_del()
       device_remove_groups()
This flow seems possible to trigger with:

   echo $dev > /sys/bus/$bus/drivers/$driver/unbind

I am missing why module pinning is part of the solution when it's the
device_del() path that is racing? Module removal is just a more coarse
grained way to trigger unbind => device_del(). Isn't the above a bug
in the driver, not missing synchronization in kernfs? Forgive me if
the unbind question was asked and answered elsewhere, this is my first
time taking a look at this series.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help