Re: [PATCH v7 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full
From: Yang Shi <hidden>
Date: 2025-09-04 17:48:04
Also in:
linux-mm, lkml
On 9/4/25 6:16 AM, Ryan Roberts wrote:
On 04/09/2025 14:14, Ryan Roberts wrote:quoted
On 03/09/2025 01:50, Yang Shi wrote:quoted
quoted
quoted
quoted
I am wondering whether we can just have a warn_on_once or something for the case when we fail to allocate a pagetable page. Or, Ryan had suggested in an off-the-list conversation that we can maintain a cache of PTE tables for every PMD block mapping, which will give us the same memory consumption as we do today, but not sure if this is worth it. x86 can already handle splitting but due to the callchains I have described above, it has the same problem, and the code has been working for years :)I think it's preferable to avoid having to keep a cache of pgtable memory if we can...Yes, I agree. We simply don't know how many pages we need to cache, and it still can't guarantee 100% allocation success.This is wrong... We can know how many pages will be needed for splitting linear mapping to PTEs for the worst case once linear mapping is finalized. But it may require a few hundred megabytes memory to guarantee allocation success. I don't think it is worth for such rare corner case.Indeed, we know exactly how much memory we need for pgtables to map the linear map by pte - that's exactly what we are doing today. So we _could_ keep a cache. We would still get the benefit of improved performance but we would lose the benefit of reduced memory. I think we need to solve the vm_reset_perms() problem somehow, before we can enable this.Sorry I realise this was not very clear... I am saying I think we need to fix it somehow. A cache would likely work. But I'd prefer to avoid it if we can find a better solution.
Took a deeper look at vm_reset_perms(). It was introduced by commit
868b104d7379 ("mm/vmalloc: Add flag for freeing of special
permsissions"). The VM_FLUSH_RESET_PERMS flag is supposed to be set if
the vmalloc memory is RO and/or ROX. So set_memory_ro() or
set_memory_rox() is supposed to follow up vmalloc(). So the page table
should be already split before reaching vfree(). I think this why
vm_reset_perms() doesn't not check return value.
I scrutinized all the callsites with VM_FLUSH_RESET_PERMS flag set. The
most of them has set_memory_ro() or set_memory_rox() followed. But there
are 3 places I don't see set_memory_ro()/set_memory_rox() is called.
1. BPF trampoline allocation. The BPF trampoline calls
arch_protect_bpf_trampoline(). The generic implementation does call
set_memory_rox(). But the x86 and arm64 implementation just simply
return 0. For x86, it is because execmem cache is used and it does call
set_memory_rox(). ARM64 doesn't need to split page table before this
series, so it should never fail. I think we just need to use the generic
implementation (remove arm64 implementation) if this series is merged.
2. BPF dispatcher. It calls execmem_alloc which has VM_FLUSH_RESET_PERMS
set. But it is used for rw allocation, so VM_FLUSH_RESET_PERMS should be
unnecessary IIUC. So it doesn't matter even though vm_reset_perms() fails.
3. kprobe. S390's alloc_insn_page() does call set_memory_rox(), x86 also
called set_memory_rox() before switching to execmem cache. The execmem
cache calls set_memory_rox(). I don't know why ARM64 doesn't call it.
So I think we just need to fix #1 and #3 per the above analysis. If this
analysis look correct to you guys, I will prepare two patches to fix them.
Thanks,
Yang
quoted
Thanks, Ryanquoted
Thanks, Yangquoted
Thanks, Yangquoted
Thanks, Ryan