Re: Race in memcg kmem?
From: Vladimir Davydov <hidden>
Date: 2013-12-11 06:22:18
Also in:
lkml
On 12/11/2013 03:13 AM, Glauber Costa wrote:
On Tue, Dec 10, 2013 at 5:59 PM, Vladimir Davydov [off-list ref] wrote:quoted
Hi, Looking through the per-memcg kmem_cache initialization code, I have a bad feeling that it is prone to a race. Before getting to fixing it, I'd like to ensure this race is not only in my imagination. Here it goes. We keep per-memcg kmem caches in the memcg_params field of each root cache. The memcg_params array is grown dynamically by memcg_update_cache_size(). I guess that if this function is executed concurrently with memcg_create_kmem_cache() we can get a race resulting in a memory leak.Ok, let's see.quoted
-- memcg_create_kmem_cache(memcg, cachep) -- creates a new kmem_cache corresponding to a memcg and assigns it to the root cache; called in the background - it is OK to have several such functions trying to create a cache for the same memcg concurrently, but only one of them should succeed.Yes.quoted
@cachep is the root cache @memcg is the memcg we want to create a cache for. The function: A1) assures there is no cache corresponding to the memcg (if it is we have nothing to do): idx = memcg_cache_id(memcg); if (cachep->memcg_params[idx]) goto out; A2) creates and assigns a new cache: new_cachep = kmem_cache_dup(memcg, cachep);Please note this cannot proceed in parallel with memcg_update_cache_size(), because they both take the slab mutex.
Oh, I see. memcg_create_kmem_cache() takes and releases the slab mutex in kmem_cache_create_memcg(), which is called by kmem_cache_dup(). This effectively works as a barrier that does not allow A2 to proceed until memcg_update_cache_sizes() finishes, which makes the race implausible. Did not notice that at first. Thanks for clarification!
quoted
// init new_cachep cachep->memcg_params->memcg_caches[idx] = new_cachep; -- memcg_update_cache_size(s, num_groups) -- grows s->memcg_params to accomodate data for num_groups memcg's @s is the root cache whose memcg_params we want to grow @num_groups is the new number of kmem-active cgroups (defines the new size of memcg_params array). The function: B1) allocates and assigns a new cache: cur_params = s->memcg_params; s->memcg_params = kzalloc(size, GFP_KERNEL); B2) copies per-memcg cache ptrs from the old memcg_params array to the new one: for (i = 0; i < memcg_limited_groups_array_size; i++) { if (!cur_params->memcg_caches[i]) continue; s->memcg_params->memcg_caches[i] = cur_params->memcg_caches[i]; } B3) frees the old array: kfree(cur_params); Since these two functions do not share any mutexes, we can get theThey do share a mutex, the slab mutex.quoted
following race: Assume, by the time Cpu0 gets to memcg_create_kmem_cache(), the memcg cache has already been created by another thread, so this function should do nothing. Cpu0 Cpu1 ---- ---- B1 A1 we haven't initialized memcg_params yet so Cpu0 will proceed to A2 to alloc and assign a new cache A2 B2 Cpu1 rewrites the memcg cache ptr set by Cpu0 at A2 - a memory leak? B3 I'd like to add that even if I'm right about the race, this is rather not critical, because memcg_update_cache_sizes() is called very rarely.Every race is critical. So, I am a bit lost by your description. Get back to me if I got anything wrong, but I am think that the point that you're missing is that all heavy slab operations take the slab_mutex underneath, and that includes cache creation and update.quoted
BTW, it seems to me that the way we update memcg_params in memcg_update_cache_size() make cache_from_memcg_idx() prone to use-after-free:quoted
static inline struct kmem_cache * cache_from_memcg_idx(struct kmem_cache *s, int idx) { if (!s->memcg_params) return NULL; return s->memcg_params->memcg_caches[idx]; }This is equivalent to 1) struct memcg_cache_params *params = s->memcg_params; 2) return params->memcg_caches[idx]; If memcg_update_cache_size() is executed between steps 1 and 2 on another CPU, at step 2 we will dereference memcg_params that has already been freed. This is very unlikely, but still possible. Perhaps, we should free old memcg params only after a sync_rcu()?You seem to be right in this one. Indeed, if my mind does not betray me, That is how I freed the LRUs. (with synchronize_rcus).
Yes, you freed LRUs only after a sync_rcu, that's why the way memcg_params is updated looks suspicious to me. I'll try to fix it then. Thanks.