Thread (2 messages) 2 messages, 2 authors, 2012-09-28

Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy

From: Andrea Arcangeli <hidden>
Date: 2012-09-28 12:07:25

Hi Linus,

On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote:
On Thu, Sep 27, 2012 at 2:51 PM,  [off-list ref] wrote:
quoted
From: Andrea Arcangeli <redacted>
Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy

Some time ago Petr once reproduced a false positive VM_BUG_ON in
khugepaged while running the autonuma-benchmark on a large 8 node system.
All production kernels out there have DEBUG_VM=n so it was only noticeable
on self built kernels.  It's not easily reproducible even on the 8 nodes
system.

Use page_freeze_refs to prevent speculative pagecache lookups to
trigger the false positives, so we're still able to check the
page_count to be exact.
This is too ugly to live. It also fundamentally changes semantics and
actually makes CONFIG_DEBUG_VM result in totally different behavior.

I really don't think it's a good feature to make CONFIG_DEBUG_VM
actually seriously change serialization.

Either do the page_freeze_refs thing *unconditionally*, presumably
replacing the current code that does

                ...
                /* cannot use mapcount: can't collapse if there's a gup pin */
                if (page_count(page) != 1) {

instead, or then just relax the potentially racy VM_BUG_ON() to just
check >= 2. Because debug stuff that changes semantics really is
horribly horribly bad.
Agreed. I think we can simply drop that VM_BUG_ON: there's a
putback_lru_page that does a put_page, and then another that does
page_cache_release just after the VM_BUG_ON, so if the count is < 2
we'll notice when the count underflows in free_page_and_swap_cache
(we've a VM_BUG_ON in put_page_testzero for that).

The reason for too ugly to live patch, is that I wanted to take the
opportunity of a workload on a 128 CPU 8 node system that could
reproduce it to verify that it really was the speculative lookup and
not something else (and something else wouldn't have been good :). But
I fully agree with you that after successfully verifying that, it's
good to go with a non-ugly version. Thanks for checking this!

I'll send an updated patch.
Btw, there are two other thp patches (relating to mlock) floating
around. They look much more reasonable than this one, but I was hoping
to see more ack's for them.
Ok! I'll review them too. I read them and they looked all right but I
didn't spend enough time on it yet to be sure and send an ack sorry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help