Thread: 54 messages, 8 authors, 2016-11-28

Re: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

From: Michal Hocko <mhocko@kernel.org>
Date: 2016-08-12 12:32:47
Also in: lkml

On Thu 04-08-16 14:49:41, Mikulas Patocka wrote:

On Wed, 3 Aug 2016, Michal Hocko wrote:
On Wed 03-08-16 08:53:25, Mikulas Patocka wrote:

On Thu, 28 Jul 2016, Michal Hocko wrote:
I think we'd end up with cleaner code if we removed the cute-hacks.  And
we'd be able to use 6 more GFP flags!!  (though I do wonder if we really
need all those 26).
Well, maybe we are able to remove those hacks; I definitely wouldn't
be opposed.  But right now I am not even convinced that a
mempool-specific gfp flag is the right way to go.
I'm not suggesting a mempool-specific gfp flag.  I'm suggesting a
transient-allocation gfp flag, which would be quite useful for mempool.

Can you give more details on why using a gfp flag isn't your first choice
for guiding what happens when the system is trying to get a free page
:-?
If we get rid of throttle_vm_writeout then I guess it might turn out to
be unnecessary. There are other places which will still throttle but I
believe those should be kept regardless of who is doing the allocation
because they help keep the LRU scanning sane. I might be wrong here
and bailing out from the reclaim rather than waiting would turn out
better for some users but I would like to see whether the first approach
works reasonably well.
If we are swapping to a dm-crypt device, the dm-crypt device is congested 
and the underlying block device is not congested, we should not throttle 
mempool allocations made from the dm-crypt workqueue. Not even a little 
bit.
But the device congestion is not the only condition required for the
throttling. The pgdat also has to be marked congested, which means that the
LRU page scanner bumped into dirty/writeback/pg_reclaim pages at the
tail of the LRU. That should only happen if we are rotating LRUs too
quickly. AFAIU the reclaim shouldn't allow free ticket scanning in that
situation.
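
As a rough illustration of the two-part condition described above, here is
a toy user-space model. It is not kernel code: the struct, its fields and
the function name are all hypothetical and only stand in for "some backing
device is congested" and "the pgdat is flagged congested".

```c
#include <stdbool.h>

/* Hypothetical toy model of the reclaim throttling decision. */
struct toy_reclaim_state {
	int nr_congested_bdis;	/* backing devices currently congested */
	bool pgdat_congested;	/* node flagged congested by the LRU scanner */
};

/*
 * Throttle only when BOTH conditions hold: at least one device is
 * congested AND the LRU scanner has flagged the node as congested.
 * Device congestion alone is not sufficient.
 */
static bool toy_should_throttle(const struct toy_reclaim_state *s)
{
	return s->nr_congested_bdis > 0 && s->pgdat_congested;
}
```

In this model a congested dm-crypt device on its own does not cause a
sleep; the node flag has to be set as well.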
The obvious problem here is that mempool allocations should sleep in 
mempool_alloc() on &pool->wait (until someone returns some entries into 
the mempool), they should not sleep inside the page allocator.
I agree that mempool_alloc should _primarily_ sleep on its own
throttling mechanism. I am not questioning that. I am just saying that
the page allocator has its own throttling which it relies on and that
cannot be just ignored because that might have other undesirable side
effects. So if the right approach is really to never throttle certain
requests then we have to bail out from congested nodes/zones as soon
as the congestion is detected.
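
To make the "sleep on the pool, not in the page allocator" point concrete,
here is a simplified user-space sketch of the order in which a mempool-style
allocation could proceed. It is an assumption-laden toy, not the real
mempool_alloc(): the types, names and the boolean standing in for the page
allocator are all made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical outcomes of one allocation attempt. */
enum toy_result {
	TOY_FROM_ALLOCATOR,	/* underlying allocator had memory */
	TOY_FROM_RESERVE,	/* fell back to a preallocated element */
	TOY_MUST_WAIT,		/* caller should wait on the pool itself */
};

struct toy_pool {
	int reserve;			/* preallocated elements left */
	bool allocator_has_memory;	/* stand-in for the page allocator */
};

/*
 * Try the underlying allocator without blocking, then the reserve.
 * Only when both fail does the caller wait, and it waits for an
 * element to be returned to the pool, not inside the allocator.
 */
static enum toy_result toy_pool_alloc(struct toy_pool *p)
{
	if (p->allocator_has_memory)
		return TOY_FROM_ALLOCATOR;
	if (p->reserve > 0) {
		p->reserve--;
		return TOY_FROM_RESERVE;
	}
	return TOY_MUST_WAIT;
}
```

The open question in this thread is what the first step is allowed to do
when the node is congested: throttle briefly, or bail out immediately.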

Now, I would like to see that something like that is _really_ necessary.
I believe that we should simply start with the easier part and get rid of
throttle_vm_writeout because that seems like a leftover from the past.
If that turns out unsatisfactory and we have a clear picture of when the
throttling is harmful/suboptimal then we can move on with a more complex
solution. Does this sound like a way forward?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>