Thread (38 messages) 38 messages, 4 authors, 2020-11-01

Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

From: Mike Rapoport <rppt@kernel.org>
Date: 2020-10-29 08:12:50
Also in: linux-arm-kernel, linux-mm, linux-pm, linux-riscv, linux-s390, lkml, sparclinux

On Wed, Oct 28, 2020 at 09:03:31PM +0000, Edgecombe, Rick P wrote:
quoted
On Wed, Oct 28, 2020 at 11:20:12AM +0000, Will Deacon wrote:
quoted
On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
quoted
					   	
This is a theoretical bug, but it is still not nice :) 		
					
Just to clarify: this patch series fixes this problem, right?
Yes.
Well, now I'm confused again.

As David pointed, __vunmap() should not be executing simultaneously
with the hibernate operation because hibernate can't snapshot while
data it needs to save is still updating. If a thread was paused when a
page was in an "invalid" state, it should be remapped by hibernate
before the copy.

To level set, before reading this mail, my takeaways from the
discussions on potential hibernate/debug page alloc problems were:

Potential RISC-V issue:
Doesn't have hibernate support

Potential ARM issue:
The logic around when it's cpa determines pages might be unmapped looks
correct for current callers.

Potential x86 page break issue:
Seems to be ok for now, but a new set_memory_np() caller could violate
assumptions in hibernate.

Non-obvious thorny logic: 
General agreement it would be good to separate dependencies.

Behavior of V1 of this patchset:
No functional change other than addition of a warn in hibernate.
There is a change that adds explicit use of set_direct_map() to
hibernate. Currently, in case of arm64 with DEBUG_PAGEALLOC=n if a
thread was paused when a page was in an "invalid" state hibernate will
access an unmapped data because __kernel_map_pages() will bail out.
After the change set_direct_map_default_noflush() would be used and the
page will get mapped before copy.
So "does this fix the problem", "yes" leaves me a bit confused... Not
saying there couldn't be any problems, especially due to the thorniness
and cross arch stride, but what is it exactly and how does this series
fix it?
This series goal was primarily to separate dependincies and make it
clearer what DEBUG_PAGEALLOC and what SET_DIRECT_MAP are. As it turned
out, there is also some lack of consistency between architectures that
implement either of this so I tried to improve this as well.

Honestly, I don't know if a thread can be paused at the time __vunmap()
left invalid pages, but it could, there is an issue on arm64 with
DEBUG_PAGEALLOC=n and this set fixes it.

__vunmap()
    vm_remove_mappings()
        set_direct_map_invalid()
	/* thread is frozen */
 					safe_copy_page()	
 					    __kernel_map_pages()
						if (!debug_pagealloc())
 					    	    return
 					    do_copy_page() -> fault

-- 
Sincerely yours,
Mike.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help