Re: [PATCH] ksm: add vm_stat and meminfo entry to reflect pte mapping to ksm pages

From: Nai Xia <hidden>
Date: 2011-03-19 14:55:40
Also in: kernel-janitors, lkml

On Sat, Mar 19, 2011 at 6:40 AM, Hugh Dickins [off-list ref] wrote:

On Fri, 18 Mar 2011, Nai Xia wrote:

quoted

On Thursday 03 March 2011, at 06:31:42, <Andrew Morton [off-list ref]> wrote
This patch obviously wasn't tested with CONFIG_KSM=n, which was a
pretty basic patch-testing failure :(

Oops, I will be careful to avoid similar mistakes next time.

quoted

I fixed up my tree with the below, but really the amount of ifdeffing
is unacceptable - please find a cleaner way to fix up this patch.

Ok, I will have a try in my next patch submit.

A couple of notes on that.

akpm's fixup introduced an #ifdef CONFIG_KSM in mm/ksm.c: that should
be, er, unnecessary - since ksm.c is only compiled when CONFIG_KSM=y.

This was lately pointed out by me and canceled by another patch in
mm-commits@vger.kernel.org and CCed to your obsolete email
address: hugh.dickins@tiscali.co.uk I think.

And PageKsm(page) evaluates to 0 when CONFIG_KSM is not set, so the
optimizer should eliminate code from most places without #ifdef:
though you need to keep the #ifdef around display in /proc/meminfo
itself, so as not to annoy non-KSM people with an always 0kB line.

This is just what I thought before I introduced NR_KSM_PAGES_SHARING,
which then did break the compiling. My mistake.

But I am uncomfortable with the whole patch.

Can you make a stronger case for it?  KSM is designed to have its own
cycle, and to keep out of the way of the rest of mm as much as possible
(not as much as originally hoped, I admit).  Do we really want to show
its statistics in /proc/meminfo now?  And do we really care that they
don't keep up with exiting processes when the scan rate is low?

OK, I have to explain, here.
This patch is actually a tiny part of a bunch of code I wrote to improve ksm
in several aspects(This is somewhat off the topic but if you are interested,
please take at look at  http://code.google.com/p/uksm/, It's still on
very early
stage).

In my code, the inconsistency is amplified by non-uniform
scan speed for different VMAs and significantly improved max scan speed.
Then I think this patch may also be helpful to ksm itself.  Just as you said,
I had thought it at least improves the accuracy.

I am not asserting that we don't, nor am I nacking your patch:
but I would like to hear more support for it, before it adds
yet another line to our user interface in /proc/meminfo.

Then how about not touching the sexy meminfo and adding a new
interface file in /sys/kernel/mm/ksm/ ?  OK, on condition that the bug
below can be properly solved.

And there is an awkward little bug in your patch, which amplifies
a more significant and shameful pair of bugs of mine in KSM itself -
no wonder that I'm anxious about your patch going in!

Your bug is precisely where akpm added the #ifdef in ksm.c.  The
problem is that page_mapcount() is maintained atomically, generally
without spinlock or pagelock: so the value of mapcount there, unless
it is 1, can go up or down racily (as other processes sharing that
anonymous page fork or unmap at the same time).

You are right,  copy_one_pte does not take page lock. So it's definitely a
bug in my patch, although it did not appear in my tests.  Actually, there is
another issue in my patch: It tries to count all the ptes, while actually only
those changed by ksmd really matter, those added by fork does not mean
memory savings. I had thought not taking the mapcount
, instead, only increase the count by one each time a pte is changed by ksmd,
but It seems also hard to tell a pte mapped to ksm page was previously
changed by ksmd or by fork when it gets unmapped.

So indeed, I have no idea to fix this bug for the time being.

I could hardly complain about that, while suggesting above that more
approximate numbers are good enough!  Except that, when KSM is turned
off, there's a chance that you'd be left showing a non-0 kB in
/proc/meminfo.  Then people will want a fix, and I don't yet know
what that fix will be.

My first bug is in the break_cow() technique used to get back to
normal, when merging into a KSM page fails for one reason or another:
that technique misses other mappings of the page.  I did have a patch
in progress to fix that a few months ago, but it wasn't quite working,
and then I realized the second bug: that even when successful, if
VM_UNMERGEABLE has been used in forked processes, then we could end up
with a KSM page in a VM_UNMERGEABLE area, which is against the spec.

A solution to all three problems would be to revert to allocating a
separate KSM page, instead of using one of the pages already there.
But that feels like a regression, and I don't think anybody is really
hurting from the current situation, so I've not jumped to fix it yet.

Hugh

Yes, I agree on your point. Let's hope there is an efficient and
simple solution.
But for now,  please drop this patch, Andrew.

Nai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help