Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
From: Mike Rapoport <rppt@kernel.org>
Date: 2020-10-29 08:12:50
Also in:
linux-arm-kernel, linux-mm, linux-pm, linux-riscv, linux-s390, lkml, sparclinux
On Wed, Oct 28, 2020 at 09:03:31PM +0000, Edgecombe, Rick P wrote:
quoted
On Wed, Oct 28, 2020 at 11:20:12AM +0000, Will Deacon wrote:quoted
On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:quoted
This is a theoretical bug, but it is still not nice :)Just to clarify: this patch series fixes this problem, right?Yes.Well, now I'm confused again. As David pointed, __vunmap() should not be executing simultaneously with the hibernate operation because hibernate can't snapshot while data it needs to save is still updating. If a thread was paused when a page was in an "invalid" state, it should be remapped by hibernate before the copy. To level set, before reading this mail, my takeaways from the discussions on potential hibernate/debug page alloc problems were: Potential RISC-V issue: Doesn't have hibernate support Potential ARM issue: The logic around when it's cpa determines pages might be unmapped looks correct for current callers. Potential x86 page break issue: Seems to be ok for now, but a new set_memory_np() caller could violate assumptions in hibernate. Non-obvious thorny logic: General agreement it would be good to separate dependencies. Behavior of V1 of this patchset: No functional change other than addition of a warn in hibernate.
There is a change that adds explicit use of set_direct_map() to hibernate. Currently, in case of arm64 with DEBUG_PAGEALLOC=n if a thread was paused when a page was in an "invalid" state hibernate will access an unmapped data because __kernel_map_pages() will bail out. After the change set_direct_map_default_noflush() would be used and the page will get mapped before copy.
So "does this fix the problem", "yes" leaves me a bit confused... Not saying there couldn't be any problems, especially due to the thorniness and cross arch stride, but what is it exactly and how does this series fix it?
This series goal was primarily to separate dependincies and make it
clearer what DEBUG_PAGEALLOC and what SET_DIRECT_MAP are. As it turned
out, there is also some lack of consistency between architectures that
implement either of this so I tried to improve this as well.
Honestly, I don't know if a thread can be paused at the time __vunmap()
left invalid pages, but it could, there is an issue on arm64 with
DEBUG_PAGEALLOC=n and this set fixes it.
__vunmap()
vm_remove_mappings()
set_direct_map_invalid()
/* thread is frozen */
safe_copy_page()
__kernel_map_pages()
if (!debug_pagealloc())
return
do_copy_page() -> fault
--
Sincerely yours,
Mike.