Thread (92 messages) 92 messages, 7 authors, 2021-11-03

Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate

From: Ming Lei <hidden>
Date: 2021-10-20 07:50:13
Also in: linux-doc, linux-fsdevel, linux-kselftest, live-patching, lkml

On Wed, Oct 20, 2021 at 08:43:37AM +0200, Miroslav Benes wrote:
On Tue, 19 Oct 2021, Ming Lei wrote:
quoted
On Tue, Oct 19, 2021 at 08:23:51AM +0200, Miroslav Benes wrote:
quoted
quoted
quoted
By you only addressing the deadlock as a requirement on approach a) you are
forgetting that there *may* already be present drivers which *do* implement
such patterns in the kernel. I worked on addressing the deadlock because
I was informed livepatching *did* have that issue as well and so very
likely a generic solution to the deadlock could be beneficial to other
random drivers.
In-tree zram doesn't have such deadlock, if livepatching has such AA deadlock,
just fixed it, and seems it has been fixed by 3ec24776bfd0.
I would not call it a fix. It is a kind of ugly workaround because the 
generic infrastructure lacked (lacks) the proper support in my opinion. 
Luis is trying to fix that.
What is the proper support of the generic infrastructure? I am not
familiar with livepatching's model(especially with module unload), you mean
livepatching have to do the following way from sysfs:

1) during module exit:
	
	mutex_lock(lp_lock);
	kobject_put(lp_kobj);
	mutex_unlock(lp_lock);
	
2) show()/store() method of attributes of lp_kobj
	
	mutex_lock(lp_lock)
	...
	mutex_unlock(lp_lock)
Yes, this was exactly the case. We then reworked it a lot (see 
958ef1e39d24 ("livepatch: Simplify API by removing registration step"), so 
now the call sequence is different. kobject_put() is basically offloaded 
to a workqueue scheduled right from the store() method. Meaning that 
Luis's work would probably not help us currently, but on the other hand 
the issues with AA deadlock were one of the main drivers of the redesign 
(if I remember correctly). There were other reasons too as the changelog 
of the commit describes.

So, from my perspective, if there was a way to easily synchronize between 
a data cleanup from module_exit callback and sysfs/kernfs operations, it 
could spare people many headaches.
kobject_del() is supposed to do so, but you can't hold a shared lock
which is required in show()/store() method. Once kobject_del() returns,
no pending show()/store() any more.

The question is that why one shared lock is required for livepatching to
delete the kobject. What are you protecting when you delete one kobject?
 
quoted
IMO, the above usage simply caused AA deadlock. Even in Luis's patch
'zram: fix crashes with cpu hotplug multistate', new/same AA deadlock
(hot_remove_store() vs. disksize_store() or reset_store()) is added
because hot_remove_store() isn't called from module_exit().

Luis tries to delay unloading module until all show()/store() are done. But
that can be obtained by the following way simply during module_exit():

	kobject_del(lp_kobj); //all pending store()/show() from lp_kobj are done,
						  //no new store()/show() can come after
						  //kobject_del() returns	
	mutex_lock(lp_lock);
	kobject_put(lp_kobj);
	mutex_unlock(lp_lock);
kobject_del() already calls kobject_put(). Did you mean __kobject_del(). 
That one is internal though.
kobject_del() is counter-part of kobject_add(), and kobject_put() will
call kobject_del() automatically() if it isn't deleted yet, but usually
kobject_put() is for releasing the object only. It is more often to
release kobject by calling kobject_del() and kobject_put().
 
quoted
Or can you explain your requirement on kobject/module unload in a bit
details?
Does the above makes sense?
I think now focus is the shared lock between kobject_del() and
show()/store() of the kobject's attributes.


Thanks,
Ming
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help