Thread: 54 messages, 8 authors, 2016-11-28

Re: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

From: Michal Hocko <mhocko@kernel.org>
Date: 2016-07-27 18:24:16
Also in: dm-devel, lkml

On Wed 27-07-16 13:43:35, NeilBrown wrote:
On Mon, Jul 25 2016, Michal Hocko wrote:
On Sat 23-07-16 10:12:24, NeilBrown wrote:
[...]
My thinking was that throttle_vm_writeout is there to prevent
dirtying too many pages from the reclaim context.  PF_LESS_THROTTLE
is part of the writeout so throttling it on too many dirty pages is
questionable (well we get some bias but that is not really reliable). It
still makes sense to throttle when the backing device is congested
because the writeout path wouldn't make much progress anyway and we also
do not want to cycle through LRU lists too quickly in that case.
"dirtying ... from the reclaim context" ??? What does that mean?
Say you cause a swapout from the reclaim context. You
effectively dirty that anon page until it gets written out to
storage.
I should probably figure out how swap really works.  I have vague ideas
which are probably missing important details...
Isn't the first step that the page gets moved into the swap-cache - and
marked dirty I guess.  Then it gets written out and the page is marked
'clean'.
Then further memory pressure might push it out of the cache, or an early
re-use would pull it back from the cache.
If so, then "dirtying in reclaim context" could also be described as
"moving into the swap cache" - yes?
Yes, that is basically correct.
So should there be a limit on dirty
pages in the swap cache just like there is for dirty pages in any
filesystem (the max_dirty_ratio thing) ??
Maybe there is?
There is no limit AFAIK. We rely on reclaim being throttled
when necessary.
 
The use of PF_LESS_THROTTLE in current_may_throttle() in vmscan.c is to
avoid a live-lock.  A key premise is that nfsd only allocates unbounded
memory when it is writing to the page cache.  So it only needs to be
throttled when the backing device it is writing to is congested.  It is
particularly important that it *doesn't* get throttled just because an
NFS backing device is congested, because nfsd might be trying to clear
that congestion.
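
A minimal userspace sketch of the check described above, for readers who want to see the premise as code. The struct names, the `write_congested` field, and the flag value are assumptions made for illustration; this is not the kernel's definition of current_may_throttle(). The NULL case (a flagged task that is not writing to any device is treated like an ordinary task) is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; value and layout are assumed. */
#define PF_LESS_THROTTLE 0x00100000u

struct bdi_sketch {
	bool write_congested;            /* is this backing device congested? */
};

struct task_sketch {
	unsigned int flags;                  /* per-task PF_* style flags */
	struct bdi_sketch *backing_dev_info; /* device being written to, or NULL */
};

/*
 * Ordinary tasks may always be throttled by reclaim.  A task marked
 * PF_LESS_THROTTLE is throttled only when the device it is itself
 * writing to is congested, so congestion on some other device (for
 * example the NFS device it is trying to drain) does not stall it.
 */
static bool task_may_throttle(const struct task_sketch *t)
{
	assert(t != NULL);

	if (!(t->flags & PF_LESS_THROTTLE))
		return true;
	if (t->backing_dev_info == NULL)
		return true;             /* assumption: no target device */
	return t->backing_dev_info->write_congested;
}
```

With these definitions, a flagged task writing to an uncongested device is the only case in which the helper returns false.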
Thanks for the clarification. IIUC then removing throttle_vm_writeout
for the nfsd writeout should be harmless as well, right?
Certainly shouldn't hurt from the perspective of nfsd.
The purpose of that flag is to allow a thread to dirty a page-cache page
as part of cleaning another page-cache page.
So it makes sense for loop and sometimes for nfsd.  It would make sense
for dm-crypt if it was putting the encrypted version in the page cache.
But if dm-crypt is just allocating a transient page (which I think it
is), then a mempool should be sufficient (and we should make sure it is
sufficient) and access to an extra 10% (or whatever) of the page cache
isn't justified.
If you think that PF_LESS_THROTTLE (ab)use in mempool_alloc is not
appropriate then would a PF_MEMPOOL be any better?
Why a PF rather than a GFP flag?
Well, the short answer is that gfp mask bits are almost depleted.
Really?  We have 26.

pagemap has a cute hack to store both GFP flags and other flag bits in
the one 32-bit number per address_space.  'struct address_space' could
afford an extra 32-bit number I think.
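
The packing being discussed can be sketched in userspace C as follows. All names ending in `_sketch` are hypothetical, the flag positions are chosen for illustration, and the 26-bit width is taken from the figure quoted in this thread; the kernel's own helpers and types differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of the GFP part, per the "We have 26" figure in the thread. */
#define GFP_BITS_SHIFT_SKETCH 26u
#define GFP_BITS_MASK_SKETCH  ((UINT32_C(1) << GFP_BITS_SHIFT_SKETCH) - 1)
#define SPARE_BITS_SKETCH     (32u - GFP_BITS_SHIFT_SKETCH)

/* Non-GFP state bits live above the GFP bits (positions assumed). */
enum {
	AS_SKETCH_EIO    = GFP_BITS_SHIFT_SKETCH + 0,
	AS_SKETCH_ENOSPC = GFP_BITS_SHIFT_SKETCH + 1,
};

struct mapping_sketch {
	uint32_t flags;  /* low 26 bits: GFP mask; high 6 bits: state flags */
};

/* Read back only the GFP part of the shared word. */
static uint32_t mapping_gfp_mask_sketch(const struct mapping_sketch *m)
{
	return m->flags & GFP_BITS_MASK_SKETCH;
}

/* Replace the GFP part while preserving the state flags above it. */
static void mapping_set_gfp_mask_sketch(struct mapping_sketch *m, uint32_t gfp)
{
	assert((gfp & ~GFP_BITS_MASK_SKETCH) == 0);
	m->flags = (m->flags & ~GFP_BITS_MASK_SKETCH) | gfp;
}

/* Set one of the state flags stored in the upper bits. */
static void mapping_set_flag_sketch(struct mapping_sketch *m, unsigned int bit)
{
	assert(bit >= GFP_BITS_SHIFT_SKETCH && bit < 32u);
	m->flags |= UINT32_C(1) << bit;
}

/* Test one of the state flags stored in the upper bits. */
static bool mapping_test_flag_sketch(const struct mapping_sketch *m,
				     unsigned int bit)
{
	return (m->flags >> bit) & 1u;
}
```

In this layout every new GFP bit takes one of the six upper bits away from the state flags, which is why moving those flags into a separate word would free six bits for GFP use.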

radix_tree_root adds 3 'tag' flags to the gfp_mask.
There are 16 bits of free space in radix_tree_node (between 'offset' and
'count').  That space on the root node could store a record of which tags
are set anywhere.  Or would that extra memory de-ref be a killer?
Yes, these are the reasons why adding new gfp flags is more complicated.
I think we'd end up with cleaner code if we removed the cute-hacks.  And
we'd be able to use 6 more GFP flags!!  (though I do wonder if we really
need all those 26).
Well, maybe we are able to remove those hacks; I definitely wouldn't
be opposed.  But right now I am not even convinced that a
mempool-specific gfp flag is the right way to go.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>