Thread (30 messages), 6 authors, 2026-02-03

Re: [PATCH net-next 2/3] mm: vmalloc: export find_vm_area()

From: "D. Wythe" <alibuda@linux.alibaba.com>
Date: 2026-01-29 11:03:26
Also in: linux-mm, linux-rdma, linux-s390, lkml

On Wed, Jan 28, 2026 at 03:49:34PM +0200, Leon Romanovsky wrote:
On Wed, Jan 28, 2026 at 08:44:04PM +0800, D. Wythe wrote:
quoted
On Wed, Jan 28, 2026 at 01:13:46PM +0200, Leon Romanovsky wrote:
quoted
On Wed, Jan 28, 2026 at 11:45:58AM +0800, D. Wythe wrote:
quoted
On Tue, Jan 27, 2026 at 03:34:17PM +0200, Leon Romanovsky wrote:
quoted
On Sat, Jan 24, 2026 at 10:57:54PM +0800, D. Wythe wrote:
quoted
On Sat, Jan 24, 2026 at 11:48:59AM +0100, Uladzislau Rezki wrote:
quoted
Hello, D. Wythe!
quoted
On Fri, Jan 23, 2026 at 07:55:17PM +0100, Uladzislau Rezki wrote:
quoted
On Fri, Jan 23, 2026 at 04:23:48PM +0800, D. Wythe wrote:
quoted
find_vm_area() provides a way to find the vm_struct associated with a
virtual address. Export this symbol to modules so that modularized
subsystems can perform lookups on vmalloc addresses.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 mm/vmalloc.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ecbac900c35f..3eb9fe761c34 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3292,6 +3292,7 @@ struct vm_struct *find_vm_area(const void *addr)
 
 	return va->vm;
 }
+EXPORT_SYMBOL_GPL(find_vm_area);
 
This is internal. We cannot just export it.

--
Uladzislau Rezki
Hi Uladzislau,

Thank you for the feedback. I agree that we should avoid exposing
internal implementation details like struct vm_struct to external
subsystems.

Following Christoph's suggestion, I'm planning to encapsulate the page
order lookup into a minimal helper instead:

unsigned int vmalloc_page_order(const void *addr)
{
	struct vm_struct *vm;

	vm = find_vm_area(addr);
	return vm ? vm->page_order : 0;
}
EXPORT_SYMBOL_GPL(vmalloc_page_order);

Does this approach look reasonable to you? It would keep the vm_struct
layout private while satisfying the optimization needs of SMC.
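
To illustrate what a caller could do with the returned order, here is a
minimal userspace sketch that converts a page order into the number of
bytes covered by one physically contiguous block. It assumes 4 KiB base
pages; SKETCH_PAGE_SHIFT and order_to_granule() are hypothetical names
used only for this illustration and are not part of the proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, matching the 4KB granularity in the thread. */
#define SKETCH_PAGE_SHIFT 12

/* Bytes covered by one contiguous block of the given page order. */
size_t order_to_granule(unsigned int page_order)
{
	/* Refuse orders that would shift past the width of size_t. */
	assert(page_order + SKETCH_PAGE_SHIFT < sizeof(size_t) * 8);
	return (size_t)1 << (SKETCH_PAGE_SHIFT + page_order);
}
```

Under that assumption, order 0 gives 4096 bytes and order 9 gives
2097152 bytes (2 MiB).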
Could you please clarify why you need info about page_order? I have not
looked at your second patch.

Thanks!

--
Uladzislau Rezki
Hi Uladzislau,

This stems from optimizing memory registration in SMC-R. To provide the
RDMA hardware with direct access to memory buffers, we must register
them with the NIC. During this process, the hardware generates one MTT
entry for each physically contiguous block. Since these hardware entries
are a finite and scarce resource, and SMC currently defaults to a 4KB
registration granularity, a single 2MB buffer consumes 512 entries. In
high-concurrency scenarios, this inefficiency quickly exhausts NIC
resources and becomes a major bottleneck for system scalability.
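
The arithmetic behind the 512-entry figure can be sketched as follows,
assuming one MTT entry per registered block as described above.
mtt_entries_needed() is a hypothetical helper for illustration only; it
is not an SMC or RDMA core function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of MTT entries needed to cover buf_len bytes when each entry
 * maps one block of `granule` bytes. A partial trailing block still
 * consumes a full entry, so the result is rounded up.
 */
size_t mtt_entries_needed(size_t buf_len, size_t granule)
{
	assert(granule != 0);
	return buf_len / granule + (buf_len % granule != 0);
}
```

With a 2 MiB buffer, a 4 KiB granule yields 512 entries, while a 2 MiB
granule yields a single entry.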
I believe this complexity can be avoided by using the RDMA MR pool API,
as other ULPs do, for example NVMe.

Thanks
Hi Leon,

Am I correct in assuming you are suggesting mr_pool to limit the number
of MRs as a way to cap MTTE consumption?
I don't see this as a limit, but as something that is considered standard
practice to reduce MTT consumption.
quoted
However, our goal is to maximize the total registered memory within
the MTTE limits rather than to cap it. In SMC-R, each connection
occupies a configurable, fixed-size registered buffer; consequently,
the more memory we can register, the more concurrent connections
we can support.
It is not a cap, but a more efficient use of existing resources.
Got it. While an MR pool might be the more standard practice, it doesn't
address our specific bottleneck. In fact, SMC already has its own internal
MR reuse; our core issue remains reducing MTTE consumption by increasing the
registration granularity to maximize the memory size mapped per MTT entry.
And this is something MR pools can handle as well. We are going in circles,
so let's summarize.
I believe some points need to be thoroughly clarified here:
I see SMC‑R as one of the RDMA ULPs, and it should ideally rely on the
existing ULP API used by NVMe, NFS, and others, rather than maintaining its
own internal logic.
SMC is not opposed to adopting newer RDMA interfaces; in fact, I have
already planned a gradual migration to the updated RDMA APIs. We are
currently in the process of adapting to ib_cqe, for instance. As long as
functionality remains intact, there is no reason to oppose changes that
reduce maintenance overhead or provide additional gains, but such a
transition takes time.
I also do not know whether vmalloc_page_order() is an appropriate solution;
I only want to show that we can probably achieve the same result without
introducing a new function.
Regarding the specific issue under discussion, I believe the newer RDMA
APIs you mentioned do not solve my problem, at least for now. My
understanding is that regardless of how MRs are pooled, the core
requirement is to increase the page_size parameter in ib_map_mr_sg to
maximize the physical size mapped per MTTE. From the code I have
examined, I see no evidence of these new APIs utilizing values other
than 4KB.

Of course, I believe that regardless of whether this issue
currently exists, it is something the RDMA community can resolve.
However, as I mentioned, adapting to a new API takes time. Before a
complete transition is achieved, we need to allow for some necessary
updates to SMC.

Thanks