Re: [Bug report] hash_name() may cross page boundary and trigger sleep in RCU context
From: Zizhi Wo <hidden>
Date: 2025-11-29 01:02:30
Also in: linux-fsdevel, linux-mm, lkml
On 2025/11/28 20:25, Will Deacon wrote:
On Fri, Nov 28, 2025 at 09:39:45AM +0800, Zizhi Wo wrote:
On 2025/11/28 9:18, Zizhi Wo wrote:
On 2025/11/28 9:17, Zizhi Wo wrote:
On 2025/11/27 20:59, Will Deacon wrote:
On Wed, Nov 26, 2025 at 05:05:05PM +0800, Zizhi Wo wrote:
We're running into the following issue on an ARM32 platform with the Linux 5.10 kernel:

[<c0300b78>] (__dabt_svc) from [<c0529cb8>] (link_path_walk.part.7+0x108/0x45c)
[<c0529cb8>] (link_path_walk.part.7) from [<c052a948>] (path_openat+0xc4/0x10ec)
[<c052a948>] (path_openat) from [<c052cf90>] (do_filp_open+0x9c/0x114)
[<c052cf90>] (do_filp_open) from [<c0511e4c>] (do_sys_openat2+0x418/0x528)
[<c0511e4c>] (do_sys_openat2) from [<c0513d98>] (do_sys_open+0x88/0xe4)
[<c0513d98>] (do_sys_open) from [<c03000c0>] (ret_fast_syscall+0x0/0x58)
...
[<c0315e34>] (unwind_backtrace) from [<c030f2b0>] (show_stack+0x20/0x24)
[<c030f2b0>] (show_stack) from [<c14239f4>] (dump_stack+0xd8/0xf8)
[<c14239f4>] (dump_stack) from [<c038d188>] (___might_sleep+0x19c/0x1e4)
[<c038d188>] (___might_sleep) from [<c031b6fc>] (do_page_fault+0x2f8/0x51c)
[<c031b6fc>] (do_page_fault) from [<c031bb44>] (do_DataAbort+0x90/0x118)
[<c031bb44>] (do_DataAbort) from [<c0300b78>] (__dabt_svc+0x58/0x80)
...

During the execution of hash_name()->load_unaligned_zeropad(), a memory access beyond the page boundary may occur, for example when the filename length is near the PAGE_SIZE boundary. This triggers a page fault, which leads to a call to do_page_fault()->mmap_read_trylock(). If we can't acquire the lock, we have to fall back to the mmap_read_lock() path, which calls might_sleep(). This breaks RCU semantics because path lookup occurs under an RCU read-side critical section.

In linux-mainline, arm/arm64 do_page_fault() still has this problem:

lock_mm_and_find_vma->get_mmap_lock_carefully->mmap_read_lock_killable.

And before commit bfcfaa77bdf0 ("vfs: use 'unsigned long' accesses for dcache name comparison and hashing"), hash_name accessed the name byte by byte.

To prevent load_unaligned_zeropad() from accessing beyond the valid memory region, we would need to intercept such cases beforehand? But doing so would require replicating the internal logic of load_unaligned_zeropad(), including handling endianness and constructing the correct value manually. Given that load_unaligned_zeropad() is used in many places across the kernel, we currently haven't found a good solution to address this cleanly.

What would be the recommended way to handle this situation? Would appreciate any feedback and guidance from the community. Thanks!

Will Deacon wrote:
Does it help if you bodge the translation fault handler along the lines of the untested diff below?

Zizhi Wo wrote:
I tried it out and it works; thank you for the solution you provided.

Will Deacon wrote:
Thanks for giving it a spin.
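To make the page-boundary case from the original report concrete, here is a minimal userspace sketch in C. It is not kernel code: load_crosses_page() and load_zeropad_model() are hypothetical helpers introduced only for illustration, and the 4096-byte page size used in the note below is an assumption. The first helper checks whether a load of a given width spills into the next page; the second models, byte by byte, the value a zero-padding load would be expected to return when only the first few bytes are mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): returns nonzero if a load of
 * `width` bytes starting at `addr` touches the page following the one
 * that contains `addr`. Assumes page_size is a power of two and
 * width <= page_size.
 */
static int load_crosses_page(uintptr_t addr, size_t width, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    size_t offset = addr & (page_size - 1);
    return offset + width > page_size;
}

/*
 * Hypothetical model of the result of a zero-padding word load: the
 * `valid` readable bytes at `p`, followed by zero bytes. Copying into a
 * zeroed word preserves memory order, so no endian-specific shifting
 * is needed in this model.
 */
static unsigned long load_zeropad_model(const unsigned char *p, size_t valid)
{
    unsigned long word = 0;

    if (valid > sizeof(word))
        valid = sizeof(word);
    memcpy(&word, p, valid);
    return word;
}
```

With 4096-byte pages, a 4-byte load at page offset 4093 crosses into the next page, while one at offset 4092 does not. The model only describes the expected result; the real load_unaligned_zeropad() performs the word load directly and relies on the architecture's exception fixup when that load faults.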
Zizhi Wo wrote:
At the same time, since I'm a beginner in this area, I'd like to ask a question. The comment above do_translation_fault() says:

"We enter here because the first level page table doesn't contain a valid entry for the address."

However, after modifying the code, it seems that when encountering FSR_FS_INVALID_PAGE, the kernel no longer creates a page table entry, but instead directly jumps to bad_area.

Will Deacon wrote:
FSR_FS_INVALID_PAGE indicates a last level translation fault (that's the "page" part) so it's only applicable in the case where the other levels of page-table have been populated already. I wondered about checking !is_vmalloc_addr() too, but I couldn't convince myself that load_unaligned_zeropad() is only ever used with the linear map.
Zizhi Wo wrote:
Thank you very much for the answer. For the vmalloc area, I checked the call sites on the VFS side, such as dentry_string_cmp() and hash_name(). Their name buffers are all allocated with kmalloc(), so there should be no corresponding issue there. But I'm not familiar with the other call sites...
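The behaviour discussed in this exchange can be modelled in a few lines of plain C. This is only an illustration under stated assumptions, not the diff itself (which is not reproduced in this message): enum fault_action, struct fault_desc and translation_fault_action() are hypothetical names, and the linear_map_only flag models the extra !is_vmalloc_addr() check that was considered but not adopted.

```c
#include <stdbool.h>

/* Hypothetical model of the decision discussed above; not kernel code. */
enum fault_action {
    FAULT_POPULATE_TABLE, /* pre-existing path: fill in the missing page-table entry */
    FAULT_BAD_AREA        /* skip page-table work and go straight to bad_area */
};

struct fault_desc {
    bool last_level;   /* models the FSR_FS_INVALID_PAGE case */
    bool vmalloc_addr; /* models is_vmalloc_addr() for the faulting address */
};

/*
 * A last-level ("page") translation fault goes to bad_area. When
 * linear_map_only is set, vmalloc addresses are excluded from that
 * shortcut and fall through to the pre-existing path instead.
 */
static enum fault_action translation_fault_action(struct fault_desc f,
                                                  bool linear_map_only)
{
    if (f.last_level && !(linear_map_only && f.vmalloc_addr))
        return FAULT_BAD_AREA;
    return FAULT_POPULATE_TABLE;
}
```

In this model the two variants differ only for a last-level fault on a vmalloc address, which is exactly the case that matters if load_unaligned_zeropad() is ever used outside the linear map.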
Zizhi Wo wrote:
I'd like to ask: could this change potentially cause any other side effects?

Will Deacon wrote:
There's always the possibility but I personally think it's more self-contained than the other patches doing the rounds. For example, I don't make any changes to the permission fault handling path.

Will
Ok. Thank you for your explanation.

Thanks,
Zizhi Wo