Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal
From: Luis Chamberlain <mcgrof@kernel.org>
Date: 2021-06-22 17:00:14
Also in: lkml
On Tue, Jun 22, 2021 at 06:51:13PM +0200, Greg KH wrote:
> On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > > + ssize_t __ret; \
> > > > > > + if (!try_module_get(THIS_MODULE)) \
> > > > >
> > > > > try_module_get(THIS_MODULE) is always racy and probably does not do what you want it to do. You always want to get/put module references from code that is NOT the code calling these functions.
> > > >
> > > > In this case, we want it to trump module removal if it succeeds. That's all.
> > >
> > > True, but either you stop the race, or you do not, right? If you are so invested in your load/unload test, this should show up with this code eventually as well.
> >
> > I still do not see how the race is possible given the goal to prevent module removal if a sysfs file is being used. If rmmod is taking place, this simply will bail out.
> >
> > > > > > + return -ENODEV; \
> > > > > > + __ret = _name ## _store(dev, attr, buf, len); \
> > > > > > + module_put(THIS_MODULE); \
> > > > >
> > > > > This too is going to be racy. While fun to poke at, I still think this is pointless.
> > > >
> > > > If you have a better idea, which does not "DOS" module removal, please let me know!
> > >
> > > I have yet to understand why you think that the load/unload in a loop is a valid use case.
> >
> > That is dependent upon the infrastructure tests built for a driver. In the case of fstests and blktests we have drivers which *always* get removed and loaded on each test. Take for instance scsi_debug, which creates / destroys virtual devices for each test. Likewise, to build confidence that the failure rate is as close as possible to 0, one must run a test as many times as possible in a loop. And, to build confidence in a test, in some situations one ends up running modprobe / rmmod in a loop. In this case a customer does have a complex system of tests, and by looking at the crash logs I managed to simplify the way to reproduce it using simple shell scripts.
>
> And is _this_ change needed even with the changes in patch 1/3?

Oh absolutely. This patch is needed 100%. Without it, it is actually pretty trivial to deadlock as noted in my instructions on how to reproduce.
> I think that commit fixes your issues given that you will not unload the module until after the sysfs devices are removed from the system. Have you tried that alone with your test?
I have tried that, and it does not resolve the deadlock. That is *why* I have been insisting that this is a real issue, and why I decided to try to implement something generic instead: livepatch folks hinted that they had also observed a similar deadlock and would appreciate a generic solution.

  Luis
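
For readers following the thread, the behaviour argued for above ("if rmmod is taking place, this simply will bail out") can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not the patch or the kernel's implementation. Every toy_* name is hypothetical. The model is single-threaded, so it shows only the intended get/put logic of the quoted macro and says nothing about the race Greg raises. The -EWOULDBLOCK return for a busy module is a choice made for this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Toy stand-in for a module: a "going away" flag plus a reference count. */
struct toy_module {
	bool going;	/* set once removal has committed */
	int refcnt;	/* references held by in-flight sysfs operations */
};

/* Models try_module_get(): refuse a new reference once removal has begun. */
static bool toy_try_module_get(struct toy_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drop a reference taken earlier. */
static void toy_module_put(struct toy_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models rmmod in this sketch: removal is refused while references exist. */
static int toy_rmmod(struct toy_module *m)
{
	if (m->refcnt > 0)
		return -EWOULDBLOCK;
	m->going = true;
	return 0;
}

/* Hypothetical attribute handler; it just reports the bytes consumed. */
static ssize_t toy_store(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/* Same shape as the quoted macro: get, call the handler, put. */
static ssize_t toy_module_store(struct toy_module *m, const char *buf,
				size_t len)
{
	ssize_t ret;

	if (!toy_try_module_get(m))
		return -ENODEV;
	ret = toy_store(buf, len);
	toy_module_put(m);
	return ret;
}
```

In this model a store that wins the reference runs to completion and makes a concurrent toy_rmmod() fail, while a store that arrives after removal has committed returns -ENODEV without ever calling the handler.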