Re: [PATCH net-next 2/3] mm: vmalloc: export find_vm_area()
From: "D. Wythe" <alibuda@linux.alibaba.com>
Date: 2026-02-03 09:15:07
Also in: linux-mm, linux-rdma, linux-s390, lkml
On Fri, Jan 30, 2026 at 11:16:36AM -0400, Jason Gunthorpe wrote:
On Fri, Jan 30, 2026 at 04:51:31PM +0800, D. Wythe wrote:
On Thu, Jan 29, 2026 at 09:20:58AM -0400, Jason Gunthorpe wrote:
On Thu, Jan 29, 2026 at 07:36:09PM +0800, D. Wythe wrote:
From there you can check the resulting scatterlist and compute the page_size to pass to ib_map_mr_sg().

I should clarify that this is done after DMA mapping the scatterlist; DMA mapping can improve the page size. And maybe the core code should be helping to compute the MR's target page size for a scatterlist. We already have code to do this in umem, and it is pretty tricky considering the IOVA-related rules.

Hi Jason,

After a deep dive into ib_umem_find_best_pgsz(), I have to say it is much more subtle than it first appears. The IOVA-to-PA relative offset rules, in particular, make it quite easy to get wrong. While SMC could duplicate this logic, doing so is certainly not ideal for maintenance. Are there any plans to refactor this into a generic RDMA core helper, for instance one that determines the best page size directly from an sg_table or scatterlist?

I have not heard of anyone touching this. It looks like there are only two users in the kernel that pass something other than PAGE_SIZE, so it seems nobody has cared about this until now. With high-order folios becoming more common, it seems like something that is missing.

However, I wonder what the drivers do with the input page size; segmenting a scatterlist is a bit hard, and we already have helpers for that too. It is a bigger project, but probably the right thing is to remove the page size input, wrap the scatterlist in a umem, and fix up the drivers to use the existing umem support for building MTTs, splitting scatterlists into blocks, and so on. The kernel side here has been left alone for a long time.
I am also curious about the original design intent behind requiring the caller to pass `page_size` explicitly. From what I can see, its primary role is to define the memory size covered by each MTT entry, but calculating the optimal value is surprisingly complex. I completely agree that automatically calculating or optimizing the page size should be the responsibility of the drivers or the RDMA core. Handling such low-level, hardware-related details in a ULP like SMC feels misplaced.

Since this does not appear to be a high priority for the community at the moment, and a proper fix requires a much larger architectural effort in the RDMA core, I will withdraw this patch series. I'll keep an eye on the RDMA subsystem's progress and see whether a more generic solution emerges in the future.

Thanks,
D. Wythe