Re: [RFC PATCH 1/2] mm/filemap: Retry fault by VMA lock if the lock was released for I/O
From: Barry Song <hidden>
Date: 2025-11-27 11:39:23
Also in:
linux-fsdevel, linux-mm, linux-riscv, linux-s390, lkml, loongarch
On Thu, Nov 27, 2025 at 6:52 PM Pedro Falcato [off-list ref] wrote:
On Thu, Nov 27, 2025 at 09:14:37AM +0800, Barry Song wrote:quoted
From: Oven Liyang <redacted> If the current page fault is using the per-VMA lock, and we only released the lock to wait for I/O completion (e.g., using folio_lock()), then when the fault is retried after the I/O completes, it should still qualify for the per-VMA-lock path.<snip>quoted
Signed-off-by: Oven Liyang <redacted> Signed-off-by: Barry Song <redacted> --- arch/arm/mm/fault.c | 5 +++++ arch/arm64/mm/fault.c | 5 +++++ arch/loongarch/mm/fault.c | 4 ++++ arch/powerpc/mm/fault.c | 5 ++++- arch/riscv/mm/fault.c | 4 ++++ arch/s390/mm/fault.c | 4 ++++ arch/x86/mm/fault.c | 4 ++++If only we could unify all these paths :(
Right, it’s a pain, but we do have bots for that? And it’s basically just copy-and-paste across different architectures.
quoted
include/linux/mm_types.h | 9 +++++---- mm/filemap.c | 5 ++++- 9 files changed, 39 insertions(+), 6 deletions(-)diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b71625378ce3..12b2d65ef1b9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h@@ -1670,10 +1670,11 @@ enum vm_fault_reason { VM_FAULT_NOPAGE = (__force vm_fault_t)0x000100, VM_FAULT_LOCKED = (__force vm_fault_t)0x000200, VM_FAULT_RETRY = (__force vm_fault_t)0x000400, - VM_FAULT_FALLBACK = (__force vm_fault_t)0x000800, - VM_FAULT_DONE_COW = (__force vm_fault_t)0x001000, - VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x002000, - VM_FAULT_COMPLETED = (__force vm_fault_t)0x004000, + VM_FAULT_RETRY_VMA = (__force vm_fault_t)0x000800,So, what I am wondering here is why we need one more fault flag versus just blindly doing this on a plain-old RETRY. Is there any particular reason why? I can't think of one.
Because in some cases we retry simply due to needing to take mmap_lock.
For example:
/**
* __vmf_anon_prepare - Prepare to handle an anonymous fault.
* @vmf: The vm_fault descriptor passed from the fault handler.
*
* When preparing to insert an anonymous page into a VMA from a
* fault handler, call this function rather than anon_vma_prepare().
* If this vma does not already have an associated anon_vma and we are
* only protected by the per-VMA lock, the caller must retry with the
* mmap_lock held. __anon_vma_prepare() will look at adjacent VMAs to
* determine if this VMA can share its anon_vma, and that's not safe to
* do with only the per-VMA lock held for this VMA.
*
* Return: 0 if fault handling can proceed. Any other value should be
* returned to the caller.
*/
vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
{
...
}
Thus, we have to check each branch one by one, but I/O wait is the most
frequent path, so we handle it first.
I would also like to see performance numbers.
Yes. From what I understand, this patchset should improve performance in a fairly straightforward way. But yes, I can certainly include some data in v2.
The rest of the patch looks OK to me.
Thanks Barry