Thread: 30 messages, 6 authors, 2026-02-03

Re: [PATCH net-next 2/3] mm: vmalloc: export find_vm_area()

From: "D. Wythe" <alibuda@linux.alibaba.com>
Date: 2026-02-03 09:15:07
Also in: linux-mm, linux-rdma, linux-s390, lkml

On Fri, Jan 30, 2026 at 11:16:36AM -0400, Jason Gunthorpe wrote:
On Fri, Jan 30, 2026 at 04:51:31PM +0800, D. Wythe wrote:
On Thu, Jan 29, 2026 at 09:20:58AM -0400, Jason Gunthorpe wrote:
On Thu, Jan 29, 2026 at 07:36:09PM +0800, D. Wythe wrote:
From there you can check the resulting scatterlist and compute the
page_size to pass to ib_map_mr_sg().
I should clarify this is done after DMA mapping the scatterlist. dma
mapping can improve the page size.

And maybe the core code should be helping compute the MR's target page
size for a scatterlist.. We already have code to do this in umem, and
it is pretty tricky considering the IOVA related rules.
Hi Jason,

After a deep dive into ib_umem_find_best_pgsz(), I have to say it is
much more subtle than it first appears. The IOVA-to-PA relative offset
rules, in particular, make it quite easy to get wrong.

While SMC could duplicate this logic, it is certainly not ideal for
maintenance. Are there any plans to refactor this into a generic RDMA
core helper—for instance, one that can determine the best page size
directly from an sg_table or scatterlist?
I have not heard of anyone touching this.

It looks like there are only two users in the kernel that pass
something other than PAGE_SIZE, so it seems nobody has cared about
this till now.

With high-order folios becoming more common, this seems like something
that is missing.

However, I wonder what the drivers do with the input page size;
segmenting a scatterlist is a bit hard, and we already have helpers for
that too.

It is a bigger project but probably the right thing is to remove the
page size input, wrap the scatterlist in a umem and fixup the drivers
to use the existing umem support for building mtts, splitting
scatterlists into blocks and so on.

The kernel side here has been left alone for a long time..
I am also curious about the original design intent behind requiring the 
caller to explicitly pass `page_size`. From what I can see, its primary 
role is to define the memory size per MTTE, but calculating the optimal 
value is surprisingly complex.
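
To illustrate why this is easy to get wrong, here is a minimal userspace
sketch of the kind of IOVA-to-DMA-address offset check discussed above. It
is not the kernel's ib_umem_find_best_pgsz(); `struct dma_seg` and
`best_pgsz()` are hypothetical names made up for this example, and the
sketch assumes the segments are already DMA mapped and are to appear
back-to-back starting at `iova`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address of the segment */
	uint64_t len;	/* length in bytes */
};

/*
 * Return the largest page size in pgsz_bitmap (one bit per supported
 * power-of-two size) that can map the segments contiguously starting at
 * iova, or 0 if no supported size fits.
 *
 * Simplification: when no constraint is found at all (mask == 0) the
 * largest supported size is returned, even if it exceeds the region.
 */
static uint64_t best_pgsz(const struct dma_seg *segs, size_t nsegs,
			  uint64_t pgsz_bitmap, uint64_t iova)
{
	uint64_t mask = 0, va = iova, best = 0;
	size_t i;

	for (i = 0; i < nsegs; i++) {
		/*
		 * Any bit where the IOVA and the DMA address differ cannot
		 * be part of the in-page offset, so it bounds the page size.
		 */
		mask |= segs[i].addr ^ va;
		va += segs[i].len;
		/*
		 * Every segment except the last must end on a page
		 * boundary, so the low bits of its ending IOVA must be 0.
		 */
		if (i != nsegs - 1)
			mask |= va;
	}

	if (mask) {
		/* The lowest set bit of mask is the largest usable size. */
		uint64_t limit = mask & (~mask + 1);

		/* Keep only supported sizes <= limit (wraps safely at bit 63). */
		pgsz_bitmap &= (limit << 1) - 1;
	}

	/* Pick the highest remaining bit by peeling off lower ones. */
	while (pgsz_bitmap) {
		best = pgsz_bitmap & (~pgsz_bitmap + 1);
		pgsz_bitmap &= pgsz_bitmap - 1;
	}
	return best;
}
```

With 4K and 2M as the supported sizes, two 2M-aligned 2M segments allow 2M
pages only if the IOVA is 2M-aligned as well. Shifting the IOVA by 4K
drops the result to 4K, and shifting it by 2K leaves no usable size at all.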

I completely agree that providing an automatic way to optimize or 
calculate the best page size should be the responsibility of the drivers
or the RDMA core themselves. Handling such low-level hardware-related 
details in a ULP like SMC feels misplaced.

Since it appears this isn't a high-priority issue for the community at
the moment, and a proper fix requires a much larger architectural effort 
in the RDMA core, I will withdraw this patch series. 

I'll keep an eye on the RDMA subsystem's progress and see if a more 
generic solution emerges in the future.

Thanks,
D. Wythe
