RE: [PATCH] mm: implement WasActive page flag (for improving cleancache)

From: Dan Magenheimer <hidden>
Date: 2012-01-27 17:32:55
Also in: lkml

From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
Subject: RE: [PATCH] mm: implement WasActive page flag (for improving cleancache)

On Thu, 2012-01-26 at 18:43 -0800, Dan Magenheimer wrote:

quoted

From: Andrew Morton [mailto:akpm@linux-foundation.org]
It really didn't tell us anything, apart from referring to vague
"problems on streaming workloads", which forces everyone to go off and
do an hour or two's kernel archeology, probably in the area of
readahead.

Just describe the problem!  Why is it slow?  Where's the time being
spent?  How does the proposed fix (which we haven't actually seen)
address the problem?  If you inform us of these things then perhaps
someone will have a useful suggestion.  And as a side-effect, we'll
understand cleancache better.

Sorry, I'm often told that my explanations are long-winded so
as a result I sometimes err on the side of brevity...

The problem is that if a pageframe is used for a page that is
very unlikely (or never going) to be used again instead of for
a page that IS likely to be used again, it results in more
refaults (which means more I/O which means poorer performance).
So we want to keep pages that are most likely to be used again.
And pages that were active are more likely to be used again than
pages that were never active... at least the post-2.6.27 kernel
makes that assumption.  A cleancache backend can keep or discard
any page it pleases... it makes sense for it to keep pages
that were previously active rather than pages that were never
active.

For zcache, we can store twice as many pages per pageframe.
But if we are storing two pages that are very unlikely
(or never going) to be used again instead of one page
that IS likely to be used again, that's probably still a bad choice.
Further, for every page that never gets used again (or gets reclaimed
before it can be used again because there's so much data streaming
through cleancache), zcache wastes the cpu cost of a page compression.
On newer machines, compression is suitably fast that this additional
cpu cost is small-ish.  On older machines, it adds up fast and that's
what Nebojsa was seeing in https://lkml.org/lkml/2011/8/17/351

Page replacement algorithms are all about heuristics and
heuristics require information.  The WasActive flag provides
information that has proven useful to the kernel (as proven
by the 2.6.27 page replacement design rewrite) to cleancache
backends (such as zcache).

So this sounds very similar to the recent discussion which I cc'd to
this list about readahead:

http://marc.info/?l=linux-scsi&m=132750980203130

It sounds like we want to measure something similar (whether a page has
been touched since it was brought in).  It isn't exactly your WasActive
flag because we want to know after we bring a page in for readahead was
it ever actually used, but it's very similar.

Agreed, there is similarity here.

What I was wondering was instead of using a flag, could we make the LRU
lists do this for us ... something like have a special LRU list for
pages added to the page cache but never referenced since added?  It
sounds like you can get your WasActive information from the same type of
LRU list tricks (assuming we can do them).

Hmmm... I think this would mean more LRU queues but that may be
the right long term answer.  Something like?

	read but not used yet LRU
	readahead but not used yet LRU
	active LRU
	previously active LRU

Naturally, this then complicates the eviction selection process.

I think the memory pressure eviction heuristic is: referenced but not
recently used pages first followed by unreferenced and not recently used
readahead pages.  The key being to keep recently read in readahead pages
until last because there's a time between doing readahead and getting
the page accessed and we don't want to trash a recently red in readahead
page only to have the process touch it and find it has to be read in
again.

I suspect that any further progress on the page replacement heuristics
is going to require more per-page data to be stored.  Which means
that it probably won't work under the constraints of 32-bit systems.
So it might also make sense to revisit when/whether to allow the
heuristics to be better on a 64-bit system than on a 32-bit.
(Even ARM now has 64-bit processors!)  I put this on my topic list
for LSF/MM, though I have no history of previous discussion so
this may already have previously been decided.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help