Thread: 21 messages, 2 authors, 2021-06-22

Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal

From: Luis Chamberlain <mcgrof@kernel.org>
Date: 2021-06-22 17:00:14
Also in: lkml

On Tue, Jun 22, 2021 at 06:51:13PM +0200, Greg KH wrote:
On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
+	ssize_t __ret; \
+	if (!try_module_get(THIS_MODULE)) \
try_module_get(THIS_MODULE) is always racy and probably does not do what
you want it to do.  You always want to get/put module references from
code that is NOT the code calling these functions.
In this case, we want it to trump module removal if it succeeds. That's all.
True, but either you stop the race, or you do not right?  If you are so
invested in your load/unload test, this should show up with this code
eventually as well.
I still do not see how the race is possible given the goal to prevent
module removal if a sysfs file is being used. If rmmod is taking
place, this simply will bail out.
+		return -ENODEV; \
+	__ret = _name ## _store(dev, attr, buf, len); \
+	module_put(THIS_MODULE); \
This too is going to be racy.

While fun to poke at, I still think this is pointless.
If you have a better idea, which does not "DOS" module removal, please
let me know!
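
For readers following the disagreement, here is a minimal single-threaded userspace sketch of the behaviour the wrapper above is meant to have: a store call takes a reference first, and bails out with -ENODEV once removal has begun. Everything prefixed fake_ or demo_ is hypothetical and only models the idea; it is not kernel API. It also does not settle whether the real try_module_get(THIS_MODULE) call is racy, since there is no concurrency here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a module's lifetime state (not a kernel type). */
struct fake_module {
	bool going;	/* set once removal has begun */
	int refcnt;	/* references currently held */
};

/* Model of try_module_get(): fails once removal has begun. */
static bool fake_try_module_get(struct fake_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Model of module_put(): drops a reference taken above. */
static void fake_module_put(struct fake_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/*
 * Model of a non-forced rmmod: refuses while references are held,
 * otherwise marks the module as going. Returns true if removal started.
 */
static bool fake_begin_removal(struct fake_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}

/* Hypothetical attribute store routine: pretends to consume the buffer. */
static ssize_t demo_store(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/* Same shape as the macro in the patch: get, call the store, put. */
static ssize_t demo_store_guarded(struct fake_module *m,
				  const char *buf, size_t len)
{
	ssize_t ret;

	if (!fake_try_module_get(m))
		return -ENODEV;
	ret = demo_store(buf, len);
	fake_module_put(m);
	return ret;
}
```

In this model a held reference makes fake_begin_removal() fail, and a store attempted after removal has begun returns -ENODEV instead of running. Whether the same holds in the kernel when the get is issued from the module's own code is exactly the point under dispute in this thread.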
I have yet to understand why you think that the load/unload in a loop is
a valid use case.
That is dependent upon the infrastructure tests built for a driver.

In the case of fstests and blktests we have drivers which *always* get
removed and loaded on each test. Take for instance scsi_debug, which
creates / destroys virtual devices on a per-test basis. Likewise, to build
confidence that failure rate is as close as possible to 0, one must run
a test as many times as possible in a loop. And, to build confidence in
a test, in some situations one ends up running modprobe / rmmod in a
loop.

In this case a customer does have a complex system of tests, and by looking
at the crash logs I managed to simplify the way to reproduce it using
simple shell scripts.
And is _this_ change needed even with the changes in patch 1/3?
Oh absolutely. This patch is needed 100%. Without it, it is actually
pretty trivial to deadlock as noted in my instructions on how to
reproduce.
I think that commit fixes your issues given that you will not unload the
module until after the sysfs devices are removed from the system.  Have
you tried that alone with your test?
I have tried that, and it does not resolve the deadlock.

It was *why* I have been insisting that this is a real issue, and why I
decided to instead try to implement something generic after I was hinted
by livepatch folks that they also had observed a similar deadlock, and
so that a generic solution would be appreciated by them.

  Luis