[PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel

From: AKASHI Takahiro <hidden>
Date: 2017-01-23 09:51:46
Also in: kexec

Mark,

On Thu, Jan 19, 2017 at 11:28:50AM +0000, Mark Rutland wrote:

On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:

On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:

On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:

On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:

Great! I think it would be better to follow the approach of
mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
it looks like it should work.

I'm not quite sure what the approach of mark_rodata_ro() means, but
I found that using create_mapping_late() may cause two problems:

1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
   This can happen, say, if the memory range for crash dump kernel
   starts in the middle of _contiguous_ pages.
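
A minimal userspace sketch of the alignment condition behind this mismatch. It assumes a 4K granule, where one contiguous-hint group spans 16 PTEs (64K); the macro and function names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4K pages, 16 PTEs per contiguous-hint group (64K). */
#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_CONT_PTES	16ULL
#define SKETCH_CONT_SIZE	(SKETCH_PAGE_SIZE * SKETCH_CONT_PTES)

/* True if addr lies inside a contiguous group rather than on its boundary. */
static bool splits_cont_group(uint64_t addr)
{
	return (addr & (SKETCH_CONT_SIZE - 1)) != 0;
}

/*
 * True if remapping [start, end) would cover some contiguous group only
 * partially, so old and new entries could disagree on the CONT bit.
 */
static bool range_splits_cont_group(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return splits_cont_group(start) || splits_cont_group(end);
}
```

A region whose start and end are both 64K-aligned never cuts through a group under this assumption; any other region does.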

That should only happen if we try to remap a segment different to what
we originally mapped.

I was intending that we'd explicitly map the reserved region separately
in the boot path, like we do for kernel segments in map_kernel(). We
would allow sections and/or CONT entries.

Then, in __map_memblock() we'd skip that range as we do for the
linear map alias of the kernel image.
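
The "skip that range" step can be sketched in plain userspace C as splitting a memory block around a reserved region. This is only an illustration under stated assumptions: struct range and split_around() are hypothetical names, and the real __map_memblock() works on physical addresses and page tables rather than returning ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical type for this sketch. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Store in out[] the parts of blk that lie outside skip and return how
 * many were stored (0, 1 or 2). The caller would map only those parts.
 */
static int split_around(struct range blk, struct range skip,
			struct range out[2])
{
	int n = 0;

	assert(blk.start <= blk.end && skip.start <= skip.end);

	/* No overlap (or empty skip): the whole block is mapped as-is. */
	if (skip.start == skip.end ||
	    skip.end <= blk.start || skip.start >= blk.end) {
		if (blk.start < blk.end)
			out[n++] = blk;
		return n;
	}

	/* Part of the block below the reserved region. */
	if (blk.start < skip.start)
		out[n++] = (struct range){ blk.start, skip.start };

	/* Part of the block above the reserved region. */
	if (skip.end < blk.end)
		out[n++] = (struct range){ skip.end, blk.end };

	return n;
}
```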

That way, we can later use create_mapping_late for that same region, and
it should handle sections and/or CONT entries in the exact same way as
it does for the kernel image segments in mark_rodata_ro().

I see.
Which one do you prefer, yours above or my (second) solution?
Either way, they do almost the same thing in terms of mapping.

While both should work, I'd prefer to match the existing map_kernel()
logic, (i.e. my suggestion above), for consistency.

OK

I don't think we have much code useful for unmapping. We could re-use 
create_mapping_late for this, passing a set of prot bits that means the
entries are invalid (e.g. have a PAGE_KERNEL_INVALID).

Do you really think that we should totally invalidate mmu entries?
I guess that, given proper cache & TLB flush operations, RO attribute is
good enough for memory consistency, no?
(Nothing accesses the region, as I said, except when the crash dump
kernel is re-loaded.)

My worry is that the first kernel and kdump kernel may map (portions of)
the region with potentially conflicting memory attributes. So it would
be necessary to completely unmap the region.

I think that this can happen only if the second kernel boots up,
leaving non-crashed cpus still running for some reason.

You raise a good point that this would also mean we need to perform some
cache maintenance, which makes that a little more painful. We'd need a
sequence like:

* Unmap the region
* TLB invalidation
* Remap the region with non-cacheable attributes
* Cache maintenance
* Unmap the region
* TLB invalidation
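
The ordering of the steps above can be written down as a small userspace model. Every function here is a hypothetical stand-in that only records which step ran; none of them is a kernel interface, and the sketch says nothing about how each step would be implemented.

```c
#include <assert.h>
#include <stddef.h>

enum step { UNMAP, TLBI, REMAP_NC, CACHE_MAINT };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];
static size_t ntrace;

/* Append one step to the trace so the order can be inspected later. */
static void record(enum step s)
{
	assert(ntrace < MAX_STEPS);
	trace[ntrace++] = s;
}

/* Hypothetical stand-ins for the real operations. */
static void sketch_unmap_region(void)       { record(UNMAP); }
static void sketch_tlb_invalidate(void)     { record(TLBI); }
static void sketch_remap_noncacheable(void) { record(REMAP_NC); }
static void sketch_cache_maintenance(void)  { record(CACHE_MAINT); }

/* The six steps, in the order listed above. */
static void sketch_protect_region(void)
{
	sketch_unmap_region();
	sketch_tlb_invalidate();
	sketch_remap_noncacheable();
	sketch_cache_maintenance();
	sketch_unmap_region();
	sketch_tlb_invalidate();
}
```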

I don't get why we need to remap the region and do cache
maintenance here. Could you elaborate a bit more?
My current implementation of arch_kexec_protect_crashkres() is:

        kexec_segment_flush(kexec_crash_image);
        create_mapping_late(crashk_res.start, ..., __pgprot(0));
                                        /* or PAGE_KERNEL_INVALID */
        flush_tlb_all();

kexec_segment_flush() will eventually do dcache-flush for all the modified
data in crash dump kernel memory.

We'd have to perform the TLB invalidation ourselves, but that shouldn't
be too painful.

Do we need to invalidate TLBs not only before but also after changing
permission attributes as mark_rodata_ro() does?

I believe we'd only have to perform the TLB invalidation after the
change of attributes.

OK

Thanks,
-Takahiro AKASHI

Thanks,
Mark.
