Thread (92 messages) 92 messages, 7 authors, 2021-11-03

Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate

From: Luis Chamberlain <mcgrof@kernel.org>
Date: 2021-11-03 12:44:49
Also in: linux-block, linux-fsdevel, linux-kselftest, live-patching, lkml

On Wed, Nov 03, 2021 at 08:01:45AM +0800, Ming Lei wrote:
On Tue, Nov 02, 2021 at 09:25:44AM -0700, Luis Chamberlain wrote:
quoted
On Tue, Nov 02, 2021 at 04:24:06PM +0100, Petr Mladek wrote:
quoted
On Wed 2021-10-27 13:57:40, Miroslav Benes wrote:
quoted
quoted
From my perspective, it is quite easy to get it wrong due to either a lack 
of generic support, or missing rules/documentation. So if this thread 
leads to "do not share locks between a module removal and a sysfs 
operation" strict rule, it would be at least something. In the same 
manner as Luis proposed to document try_module_get() expectations.
The rule "do not share locks between a module removal and a sysfs
operation" is not clear to me.
That's exactly it. It *is* not. The test_sysfs selftest will hopefully
help with this. But I'll wait to take a final position on whether or not
a generic fix should be merged until the Coccinelle patch which looks
for all uses cases completes.

So I think that once that Coccinelle hunt is done for the deadlock, we
should also remind folks of the potential deadlock and some of the rules
you mentioned below so that if we take a position that we don't support
this, we at least inform developers why and what to avoid. If Coccinelle
finds quite a bit of cases, then perhaps evaluating the generic fix
might be worth evaluating.
quoted
IMHO, there are the following rules:

1. rule: kobject_del() or kobject_put() must not be called under a lock that
	 is used by store()/show() callbacks.

   reason: kobject_del() waits until the sysfs interface is destroyed.
	 It has to wait until all store()/show() callbacks are finished.
Right, this is what actually started this entire conversation.

Note that as Ming pointed out, the generic kernfs fix I proposed would
only cover the case when kobject_del() ends up being called on module
exit, so it would not cover the cases where perhaps kobject_del() might
be called outside of module exit, and so the cope of the possible
deadlock then increases in scope.

Likewise, the Coccinelle hunt I'm trying would only cover the module
exit case. I'm a bit of afraid of the complexity of a generic hunt
as expresed in rule 1.
Question is that why one shared lock is required between kobject_del()
and its show()/store(), both zram and livepatch needn't that. Is it
one common usage?
That is the question the coccinelle hunt is aimed at finding. Answering
that in the context of module removal is easier than the generic case.

But also note that I had mentioned before that we have semantics to
check *when* we're in the module removal case, and as such can address
that case. For the other cases we have no possible semantics to be able to
address a generic fix. I tried though, refer to my reply in this
thread and refer to the new kobject_being_removed() I'm adding:

https://lkml.kernel.org/r/YWdMpv8lAFYtc18c@bombadil.infradead.org

So we have semantics for knowing when about to remove a module but,
my attempt with kobject_being_removed() isn't sufficient to address this
generically.

In either case, having a gauge of how common this is either on module
removal of generally would be wonderful. It is easier to answer the
question from a module removal perspective though.
quoted
quoted
2. rule: kobject_del()/kobject_put() must not be called from the
	related store() callbacks.

   reason: same as in 1st rule.
Sensible corollary.

Given tha the exact kobjet_del() / kobject_put() which must not be
called from the respective sysfs ops depends on which kobject is
underneath the device for which the sysfs ops is being created,
it would make this hunt in Coccinelle a bit tricky. My current iteration
of a coccinelle hunt cheats and looks at any sysfs looking op and
ensures a module exit exists.
Actually kernfs/sysfs provides interface for supporting deleting
kobject/attr from the attr's show()/store(), see example of
sdev_store_delete(), and the livepatch example:

https://lore.kernel.org/lkml/20211102145932.3623108-4-ming.lei@redhat.com/ (local)
Imagine that.. is that the suicidal thing?
quoted
quoted
3. rule: module_exit() must wait until all release() callbacks are called
	 when kobject are static.

   reason: kobject_put() must be called to clean up internal
	dependencies. The clean up might be done asynchronously
	and need access to the kobject structure.
This might be an easier rule to implement a respective Coccinelle rule
for.
If kobject_del() is done in module_exit() or before module_exit(),
kobject should have been freed in module_exit() via kobject_put().

But yes, it can be asynchronously because of CONFIG_DEBUG_KOBJECT_RELEASE,
seems like one real issue.
Alright thanks for confirming.

  Luis
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help