Re: [patch 1/1] thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
From: Andrea Arcangeli <hidden>
Date: 2012-09-28 12:07:25
Hi Linus, On Thu, Sep 27, 2012 at 04:40:13PM -0700, Linus Torvalds wrote:
On Thu, Sep 27, 2012 at 2:51 PM, [off-list ref] wrote:quoted
From: Andrea Arcangeli <redacted> Subject: thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy Some time ago Petr once reproduced a false positive VM_BUG_ON in khugepaged while running the autonuma-benchmark on a large 8 node system. All production kernels out there have DEBUG_VM=n so it was only noticeable on self built kernels. It's not easily reproducible even on the 8 nodes system. Use page_freeze_refs to prevent speculative pagecache lookups to trigger the false positives, so we're still able to check the page_count to be exact.This is too ugly to live. It also fundamentally changes semantics and actually makes CONFIG_DEBUG_VM result in totally different behavior. I really don't think it's a good feature to make CONFIG_DEBUG_VM actually seriously change serialization. Either do the page_freeze_refs thing *unconditionally*, presumably replacing the current code that does ... /* cannot use mapcount: can't collapse if there's a gup pin */ if (page_count(page) != 1) { instead, or then just relax the potentially racy VM_BUG_ON() to just check >= 2. Because debug stuff that changes semantics really is horribly horribly bad.
Agreed. I think we can simply drop that VM_BUG_ON: there's a putback_lru_page that does a put_page, and then another that does page_cache_release just after the VM_BUG_ON, so if the count is < 2 we'll notice when the count underflows in free_page_and_swap_cache (we've a VM_BUG_ON in put_page_testzero for that). The reason for too ugly to live patch, is that I wanted to take the opportunity of a workload on a 128 CPU 8 node system that could reproduce it to verify that it really was the speculative lookup and not something else (and something else wouldn't have been good :). But I fully agree with you that after successfully verifying that, it's good to go with a non-ugly version. Thanks for checking this! I'll send an updated patch.
Btw, there are two other thp patches (relating to mlock) floating around. They look much more reasonable than this one, but I was hoping to see more ack's for them.
Ok! I'll review them too. I read them and they looked all right but I didn't spend enough time on it yet to be sure and send an ack sorry. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>