Thread (10 messages) 10 messages, 4 authors, 2025-02-21

Re: [RFC] mm: alloc_pages_bulk: remove assumption of populating only NULL elements

From: Yunsheng Lin <hidden>
Date: 2025-02-19 11:20:13
Also in: kvm, linux-btrfs, linux-mm, linux-nfs, linux-xfs, lkml, virtualization

On 2025/2/19 5:14, Dave Chinner wrote:
On Tue, Feb 18, 2025 at 05:21:27PM +0800, Yunsheng Lin wrote:
quoted
On 2025/2/18 5:31, Dave Chinner wrote:

...
quoted
.....
quoted
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 15bb790359f8..9e1ce0ab9c35 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -377,16 +377,17 @@ xfs_buf_alloc_pages(
 	 * least one extra page.
 	 */
 	for (;;) {
-		long	last = filled;
+		long	alloc;
 
-		filled = alloc_pages_bulk(gfp_mask, bp->b_page_count,
-					  bp->b_pages);
+		alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - refill,
+					 bp->b_pages + refill);
+		refill += alloc;
 		if (filled == bp->b_page_count) {
 			XFS_STATS_INC(bp->b_mount, xb_page_found);
 			break;
 		}
 
-		if (filled != last)
+		if (alloc)
 			continue;
You didn't even compile this code - refill is not defined
anywhere.

Even if it did complile, you clearly didn't test it. The logic is
broken (what updates filled?) and will result in the first
allocation attempt succeeding and then falling into an endless retry
loop.
Ah, the 'refill' is a typo, it should be 'filled' instead of 'refill'.
The below should fix the compile error:
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -379,9 +379,9 @@ xfs_buf_alloc_pages(
        for (;;) {
                long    alloc;

-               alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - refill,
-                                        bp->b_pages + refill);
-               refill += alloc;
+               alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - filled,
+                                        bp->b_pages + filled);
+               filled += alloc;
                if (filled == bp->b_page_count) {
                        XFS_STATS_INC(bp->b_mount, xb_page_found);
                        break;
quoted
i.e. you stepped on the API landmine of your own creation where
it is impossible to tell the difference between alloc_pages_bulk()
returning "memory allocation failed, you need to retry" and
it returning "array is full, nothing more to allocate". Both these
cases now return 0.
As my understanding, alloc_pages_bulk() will not be called when
"array is full" as the above 'filled == bp->b_page_count' checking
has ensured that if the array is not passed in with holes in the
middle for xfs.
You miss the point entirely. Previously, alloc_pages_bulk() would
return a value that would tell us the array is full, even if we
call it with a full array to begin with.

Now it fails to tell us that the array is full, and we have to track
that precisely ourselves - it is impossible to tell the difference
between "array is full" and "allocation failed". Not being able to
determine from the allocation return value whether the array is
ready for use or whether another go-around to fill it is needed is a
very poor API choice, regardless of anything else.

You've already demonstrated this: tracking array usage in every
caller is error-prone and much harder to get right than just having
alloc_pages_bulk() do everything for us.
While I am agreed that it might be hard to track array usage in every
caller to see if removing assumption of populating only NULL elements
cause problem for them, I still think the page bulk alloc API before
this patch have some space for improvement from performance and
easy-to-use perspective as the most existing calllers of page bulk
alloc API are trying to bulk allocate the page for the whole array
sequentially.
quoted
quoted
The existing code returns nr_populated in both cases, so it doesn't
matter why alloc_pages_bulk() returns with nr_populated != full, it
is very clear that we still need to allocate more memory to fill it.
I am not sure if the array will be passed in with holes in the
middle for the xfs fs as mentioned above, if not, it seems to be
a typical use case like the one in mempolicy.c as below:

https://elixir.bootlin.com/linux/v6.14-rc1/source/mm/mempolicy.c#L2525
That's not "typical" usage. That is implementing "try alloc" fast
path that avoids memory reclaim with a slow path fallback to fill
the rest of the array when the fast path fails.

No other users of alloc_pages_bulk() is trying to do this.
What I meant by "typical" usage is the 'page_array + nr_allocated'
trick that avoids the NULL checking when page bulk allocation API
is used in mm/mempolicy.c, most of existing callers for page bulk
allocation in other places seems likely to be changed to do the
similar trick as this patch does.
Indeed, it looks somewhat pointless to do this here (i.e. premature
optimisation!), because the only caller of
alloc_pages_bulk_mempolicy_noprof() has it's own fallback slowpath
for when alloc_pages_bulk() can't fill the entire request.
quoted
quoted
IOWs, you just demonstrated why the existing API is more desirable
than a highly constrained, slightly faster API that requires callers
to get every detail right. i.e. it's hard to get it wrong with the
existing API, yet it's so easy to make mistakes with the proposed
API that the patch proposing the change has serious bugs in it.
IMHO, if the API is about refilling pages for the only NULL elements,
it seems better to add a API like refill_pages_bulk() for that, as
the current API seems to be prone to error of not initializing the
array to zero before calling alloc_pages_bulk().
How is requiring a well defined initial state for API parameters
"error prone"?  What code is failing to do the well known, defined
initialisation before calling alloc_pages_bulk()?

Allowing uninitialised structures in an API (i.e. unknown initial
conditions) means we cannot make assumptions about the structure
contents within the API implementation.  We cannot assume that all
variables are zero on the first use, nor can we assume that anything
that is zero has a valid state.
It seems the above is the main differenece we see from the API perspective,
as I see the array as output parameter and you seems to treat the array as
both input and output parameter?

The kmem_cache_alloc_bulk() API related API seems to treat the array as
output parameter too as this patch does, the difference from this patch
is that if there is no enough memory, it will free the allocated memory
and return 0 to the caller while this patch returns already allocated
memory to its caller even when there is no enough memory.
Again, this is poor API design - structures passed to interfaces
-should- have a well defined initial state, either set by a *_init()
function or by defining the initial state to be all zeros (i.e. via
memset, kzalloc, etc).

Performance and speed is not an excuse for writing fragile, easy to
break code and APIs.

-Dave.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help