Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback
From: Vlastimil Babka <hidden>
Date: 2024-06-19 09:56:48
Also in:
bridge, kernel-janitors, kvm, linux-block, linux-can, linux-nfs, linux-trace-kernel, lkml, netdev, netfilter-devel
On 6/19/24 11:51 AM, Uladzislau Rezki wrote:
On Tue, Jun 18, 2024 at 09:48:49AM -0700, Paul E. McKenney wrote:quoted
On Tue, Jun 18, 2024 at 11:31:00AM +0200, Uladzislau Rezki wrote:quoted
quoted
On 6/17/24 8:42 PM, Uladzislau Rezki wrote:quoted
quoted
+ + s = container_of(work, struct kmem_cache, async_destroy_work); + + // XXX use the real kmem_cache_free_barrier() or similar thing hereIt implies that we need to introduce kfree_rcu_barrier(), a new API, which i wanted to avoid initially.I wanted to avoid new API or flags for kfree_rcu() users and this would be achieved. The barrier is used internally so I don't consider that an API to avoid. How difficult is the implementation is another question, depending on how the current batching works. Once (if) we have sheaves proven to work and move kfree_rcu() fully into SLUB, the barrier might also look different and hopefully easier. So maybe it's not worth to invest too much into that barrier and just go for the potentially longer, but easier to implement?Right. I agree here. If the cache is not empty, OK, we just defer the work, even we can use a big 21 seconds delay, after that we just "warn" if it is still not empty and leave it as it is, i.e. emit a warning and we are done. Destroying the cache is not something that must happen right away.OK, I have to ask... Suppose that the cache is created and destroyed by a module and init/cleanup time, respectively. Suppose that this module is rmmod'ed then very quickly insmod'ed. Do we need to fail the insmod if the kmem_cache has not yet been fully cleaned up? If not, do we have two versions of the same kmem_cache in /proc during the overlap time?No fail :) If same cache is created several times, its s->refcount gets increased, so, it does not create two entries in the "slabinfo". But i agree that your point is good! We need to be carefully with removing and simultaneous creating.
Note that this merging may be disabled or not happen due to various flags on the cache being incompatible with it. And I want to actually make sure it never happens for caches being already destroyed as that would lead to use-after-free (the workfn doesn't recheck the refcount in case a merge would happen during the grace period)
--- a/mm/slab_common.c
+++ b/mm/slab_common.c@@ -150,9 +150,10 @@ int slab_unmergeable(struct kmem_cache *s) #endif /* - * We may have set a slab to be unmergeable during bootstrap. + * We may have set a cache to be unmergeable during bootstrap. + * 0 is for cache being destroyed asynchronously */ - if (s->refcount < 0) + if (s->refcount <= 0) return 1; return 0;
From the first glance, there is a refcounter and a global "slab_mutex"
which is used to protect a critical section. Destroying is almost fully
protected(as noted above, by a global mutex) with one exception, it is:
static void kmem_cache_release(struct kmem_cache *s)
{
if (slab_state >= FULL) {
sysfs_slab_unlink(s);
sysfs_slab_release(s);
} else {
slab_kmem_cache_release(s);
}
}
this one can race, IMO.
--
Uladzislau Rezki