Thread: 54 messages, 8 authors, 2016-11-28

Re: [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

From: David Rientjes <rientjes@google.com>
Date: 2016-07-19 20:46:02
Also in: dm-devel, lkml

On Tue, 19 Jul 2016, Johannes Weiner wrote:
> Mempool guarantees forward progress by having all necessary memory
> objects for the guaranteed operation in reserve. Think about it this
> way: you should be able to delete the pool->alloc() call entirely and
> still make reliable forward progress. It would kill concurrency and be
> super slow, but how could it be affected by a system OOM situation?
>
> If our mempool_alloc() is waiting for an object that an OOM victim is
> holding, where could that OOM victim get stuck before giving it back?
> As I asked in the previous thread, surely you wouldn't do a mempool
> allocation first and then rely on an unguarded page allocation to make
> forward progress, right? It would defeat the purpose of using mempools
> in the first place. And surely the OOM victim wouldn't be waiting for
> a lock that somebody doing mempool_alloc() *against the same mempool*
> is holding. That'd be an obvious ABBA deadlock.
>
> So maybe I'm just dense, but could somebody please outline the exact
> deadlock diagram? Who is doing what, and how are they getting stuck?
>
> cpu0:                     cpu1:
>                           mempool_alloc(pool0)
> mempool_alloc(pool0)
>   wait for cpu1
>                           not allocating memory - would defeat mempool
>                           not taking locks held by cpu0* - would ABBA
>                           ???
>                           mempool_free(pool0)
>
> Thanks
>
> * or any other task that does mempool_alloc(pool0) before unlock

I'm approaching this from a perspective of any possible mempool usage, not 
with any single current user in mind.

Any mempool_alloc() user that then takes a contended mutex can do this.  
An example:

	taskA		taskB		taskC
	-----		-----		-----
	mempool_alloc(a)
			mutex_lock(b)
	mutex_lock(b)
					mempool_alloc(a)

Imagine that the mempool_alloc() done by taskA depletes all free elements, 
so we rely on taskA to do mempool_free() before any other mempool 
allocation can be guaranteed to succeed.
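
To make the dependency in the diagram explicit, here is a minimal sketch 
of the wait chain as a toy userspace model. None of this is kernel code: 
toy_root_blocker(), the task enum, and the NOT_WAITING/CYCLE markers are 
hypothetical names invented for illustration. It only follows "who is 
waiting on whom" until it reaches a task that is not waiting on any other 
modeled task.

```c
#include <assert.h>

/* Hypothetical model of the three tasks in the diagram; not kernel code. */
enum toy_task { TASK_A, TASK_B, TASK_C, NTASKS };

#define NOT_WAITING (-1)  /* task waits on no other modeled task */
#define CYCLE       (-2)  /* wait chain loops back on itself */

/*
 * Follow waits_on[] from 'task' and return the task at the end of the
 * chain, i.e. the one whose progress everything upstream depends on.
 * Returns CYCLE if the chain never ends within NTASKS hops.
 */
static int toy_root_blocker(const int waits_on[NTASKS], int task)
{
	for (int hops = 0; hops < NTASKS; hops++) {
		assert(task >= 0 && task < NTASKS);
		int next = waits_on[task];

		if (next == NOT_WAITING)
			return task;
		task = next;
	}
	return CYCLE;
}
```

With waits_on[TASK_C] = TASK_A (pool element), waits_on[TASK_A] = TASK_B 
(mutex b) and waits_on[TASK_B] = NOT_WAITING, toy_root_blocker() for 
TASK_C returns TASK_B: the pool fallback for taskC depends on taskB making 
progress, even though taskB never touches the mempool.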

If taskC is oom killed, or has PF_MEMALLOC set, it cannot access memory 
reserves from the page allocator if __GFP_NOMEMALLOC is automatic in 
mempool_alloc().  This livelocks the page allocator for all processes.

taskB in this case need only stall after taking mutex_lock() successfully; 
that could be because of the oom livelock, because it is contending on 
another mutex held by an allocator, etc.

Obviously taskB stalling while holding a mutex that is contended by a 
mempool user holding an element is not preferred, but it's possible.  (A 
simplified version is also possible with 0-size mempools, which are also 
allowed.)

My point is that I don't think we should be forcing any behavior wrt 
memory reserves as part of the mempool implementation.  In the above, 
taskC mempool_alloc() would succeed and not livelock unless 
__GFP_NOMEMALLOC is forced.  The mempool_alloc() user may construct their 
set of gfp flags as appropriate just like any other memory allocator in 
the kernel.
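
As a rough illustration of the difference, here is a toy userspace model 
of the allocation order: try the page allocator first, then fall back to 
the pool's preallocated elements. Everything here is hypothetical and 
simplified: the toy_* names, the TOY_NOMEMALLOC bit, and the 
force_nomemalloc switch are made up for this sketch, and a "false" return 
stands in for the point where a real allocation would sleep waiting for 
mempool_free().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this model only; not the kernel's value. */
#define TOY_NOMEMALLOC 0x1u  /* caller must not use memory reserves */

struct toy_task {
	bool memalloc;       /* models PF_MEMALLOC or being the oom victim */
};

struct toy_system {
	int free_pages;      /* pages available to everybody */
	int reserve_pages;   /* pages only reserve-eligible tasks may take */
};

struct toy_pool {
	int free_elems;      /* preallocated elements left in the pool */
};

/* Page allocator model: true if a page was obtained. */
static bool toy_page_alloc(struct toy_system *sys,
			   const struct toy_task *t, unsigned int flags)
{
	if (sys->free_pages > 0) {
		sys->free_pages--;
		return true;
	}
	/* Reserves are reachable only if the caller did not opt out. */
	if (t->memalloc && !(flags & TOY_NOMEMALLOC) &&
	    sys->reserve_pages > 0) {
		sys->reserve_pages--;
		return true;
	}
	return false;
}

/* mempool_alloc() model: false means "would have to wait". */
static bool toy_mempool_alloc(struct toy_pool *pool, struct toy_system *sys,
			      const struct toy_task *t, unsigned int flags,
			      bool force_nomemalloc)
{
	assert(pool->free_elems >= 0);

	if (force_nomemalloc)
		flags |= TOY_NOMEMALLOC;   /* policy imposed by the pool */
	if (toy_page_alloc(sys, t, flags))
		return true;
	if (pool->free_elems > 0) {
		pool->free_elems--;
		return true;
	}
	return false;
}
```

In this model, with the pool depleted and no free pages left, a task with 
memalloc set gets nothing when force_nomemalloc is true, but succeeds from 
the reserves when the flags are left to the caller.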

The alternative would be to ensure that no mempool user ever takes a lock 
that another thread can hold while contending on another mutex or 
allocating memory itself.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .