Thread (48 messages) 48 messages, 4 authors, 2025-09-17

Re: [PATCH v7 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full

From: Yang Shi <hidden>
Date: 2025-09-04 17:48:04
Also in: linux-mm, lkml


On 9/4/25 6:16 AM, Ryan Roberts wrote:
On 04/09/2025 14:14, Ryan Roberts wrote:
quoted
On 03/09/2025 01:50, Yang Shi wrote:
quoted
quoted
quoted
quoted
I am wondering whether we can just have a warn_on_once or something for the
case
when we fail to allocate a pagetable page. Or, Ryan had
suggested in an off-the-list conversation that we can maintain a cache of PTE
tables for every PMD block mapping, which will give us
the same memory consumption as we do today, but not sure if this is worth it.
x86 can already handle splitting but due to the callchains
I have described above, it has the same problem, and the code has been working
for years :)
I think it's preferable to avoid having to keep a cache of pgtable memory if we
can...
Yes, I agree. We simply don't know how many pages we need to cache, and it
still can't guarantee 100% allocation success.
This is wrong... We can know how many pages will be needed for splitting linear
mapping to PTEs for the worst case once linear mapping is finalized. But it may
require a few hundred megabytes memory to guarantee allocation success. I don't
think it is worth for such rare corner case.
Indeed, we know exactly how much memory we need for pgtables to map the linear
map by pte - that's exactly what we are doing today. So we _could_ keep a cache.
We would still get the benefit of improved performance but we would lose the
benefit of reduced memory.

I think we need to solve the vm_reset_perms() problem somehow, before we can
enable this.
Sorry I realise this was not very clear... I am saying I think we need to fix it
somehow. A cache would likely work. But I'd prefer to avoid it if we can find a
better solution.
Took a deeper look at vm_reset_perms(). It was introduced by commit 
868b104d7379 ("mm/vmalloc: Add flag for freeing of special 
permsissions"). The VM_FLUSH_RESET_PERMS flag is supposed to be set if 
the vmalloc memory is RO and/or ROX. So set_memory_ro() or 
set_memory_rox() is supposed to follow up vmalloc(). So the page table 
should be already split before reaching vfree(). I think this why 
vm_reset_perms() doesn't not check return value.

I scrutinized all the callsites with VM_FLUSH_RESET_PERMS flag set. The 
most of them has set_memory_ro() or set_memory_rox() followed. But there 
are 3 places I don't see set_memory_ro()/set_memory_rox() is called.

1. BPF trampoline allocation. The BPF trampoline calls 
arch_protect_bpf_trampoline(). The generic implementation does call 
set_memory_rox(). But the x86 and arm64 implementation just simply 
return 0. For x86, it is because execmem cache is used and it does call 
set_memory_rox(). ARM64 doesn't need to split page table before this 
series, so it should never fail. I think we just need to use the generic 
implementation (remove arm64 implementation) if this series is merged.

2. BPF dispatcher. It calls execmem_alloc which has VM_FLUSH_RESET_PERMS 
set. But it is used for rw allocation, so VM_FLUSH_RESET_PERMS should be 
unnecessary IIUC. So it doesn't matter even though vm_reset_perms() fails.

3. kprobe. S390's alloc_insn_page() does call set_memory_rox(), x86 also 
called set_memory_rox() before switching to execmem cache. The execmem 
cache calls set_memory_rox(). I don't know why ARM64 doesn't call it.

So I think we just need to fix #1 and #3 per the above analysis. If this 
analysis look correct to you guys, I will prepare two patches to fix them.

Thanks,
Yang
quoted
Thanks,
Ryan
quoted
Thanks,
Yang
quoted
Thanks,
Yang
quoted
Thanks,
Ryan
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help