Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
From: Dave Chinner <david@fromorbit.com>
Date: 2025-03-04 08:18:37
Also in:
kvm, linux-btrfs, linux-mm, linux-nfs, linux-xfs, lkml, virtualization
On Fri, Feb 28, 2025 at 05:44:20PM +0800, Yunsheng Lin wrote:
As mentioned in [1], it seems odd to check NULL elements in
the middle of page bulk allocating, and it seems caller can
do a better job of bulk allocating pages into a whole array
sequentially without checking NULL elements first before
doing the page bulk allocation for most of existing users.
Through analyzing of bulk allocation API used in fs, it
seems that the callers are depending on the assumption of
populating only NULL elements in fs/btrfs/extent_io.c and
net/sunrpc/svc_xprt.c while erofs and btrfs don't, see:
commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
Change SUNRPC and btrfs to not depend on the assumption.
Other existing callers seems to be passing all NULL elements
via memset, kzalloc, etc.
Remove assumption of populating only NULL elements and treat
page_array as output parameter like kmem_cache_alloc_bulk().
Remove the above assumption also enable the caller to not
zero the array before calling the page bulk allocating API,
which has about 1~2 ns performance improvement for the test
case of time_bench_page_pool03_slow() for page_pool in a
x86 vm system, this reduces some performance impact of
fixing the DMA API misuse problem in [2], performance
improves from 87.886 ns to 86.429 ns.How much slower did you make btrfs and sunrpc by adding all the defragmenting code there?
1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/ (local) 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/ (local) CC: Jesper Dangaard Brouer <hawk@kernel.org> CC: Luiz Capitulino <redacted> CC: Mel Gorman <redacted> CC: Dave Chinner <david@fromorbit.com> CC: Chuck Lever <redacted> Signed-off-by: Yunsheng Lin <redacted> Acked-by: Jeff Layton <jlayton@kernel.org> --- V2: 1. Drop RFC tag and rebased on latest linux-next. 2. Fix a compile error for xfs.
And you still haven't tested the code changes to XFS, because this patch is also broken.
quoted hunk ↗ jump to hunk
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 5d560e9073f4..b4e95b2dd0f0 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c@@ -319,16 +319,17 @@ xfs_buf_alloc_pages( * least one extra page. */ for (;;) { - long last = filled; + long alloc; - filled = alloc_pages_bulk(gfp_mask, bp->b_page_count, - bp->b_pages); + alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - filled, + bp->b_pages + filled); + filled += alloc; if (filled == bp->b_page_count) { XFS_STATS_INC(bp->b_mount, xb_page_found); break; } - if (filled != last) + if (alloc) continue;
alloc_pages_bulk() now returns the number of pages allocated in the array. So if we ask for 4 pages, then get 2, filled is now 2. Then we loop, ask for another 2 pages, get those two pages and it returns 4. Now filled is 6, and we continue. Now we ask alloc_pages_bulk() for -2 pages, which returns 4 pages... Worse behaviour: second time around, no page allocation succeeds so it returns 2 pages. Filled is now 4, which is the number of pages we need, so we break out of the loop with only 2 pages allocated. There's about to be kernel crashes occur..... Once is a mistake, twice is compeltely unacceptable. When XFS stops using alloc_pages_bulk (probably 6.15) I won't care anymore. But until then, please stop trying to change this code. NACK. -Dave. -- Dave Chinner david@fromorbit.com