Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements
From: Yunsheng Lin <hidden>
Date: 2025-03-04 12:04:09
Also in:
kvm, linux-btrfs, linux-mm, linux-nfs, linux-xfs, lkml, virtualization
On 2025/3/4 6:13, Chuck Lever wrote:
On 2/28/25 4:44 AM, Yunsheng Lin wrote:quoted
As mentioned in [1], it seems odd to check NULL elements in the middle of page bulk allocating, and it seems caller can do a better job of bulk allocating pages into a whole array sequentially without checking NULL elements first before doing the page bulk allocation for most of existing users.Sorry, but this still makes a claim without providing any data to back it up. Why can callers "do a better job"?
What I meant "do a better job" is that callers are already keeping track of how many pages have been allocated, and it seems convenient to just pass 'page_array + nr_allocated' and 'nr_pages - nr_allocated' when calling subsequent page bulk alloc API so that NULL checking can be avoided, which seems to be the pattern I see in alloc_pages_bulk_interleave().
quoted
Through analyzing of bulk allocation API used in fs, it seems that the callers are depending on the assumption of populating only NULL elements in fs/btrfs/extent_io.c and net/sunrpc/svc_xprt.c while erofs and btrfs don't, see: commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator") commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()") commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers") commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator") Change SUNRPC and btrfs to not depend on the assumption. Other existing callers seems to be passing all NULL elements via memset, kzalloc, etc. Remove assumption of populating only NULL elements and treat page_array as output parameter like kmem_cache_alloc_bulk(). Remove the above assumption also enable the caller to not zero the array before calling the page bulk allocating API, which has about 1~2 ns performance improvement for the test case of time_bench_page_pool03_slow() for page_pool in a x86 vm system, this reduces some performance impact of fixing the DMA API misuse problem in [2], performance improves from 87.886 ns to 86.429 ns. 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/ (local) 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/ (local) CC: Jesper Dangaard Brouer <hawk@kernel.org> CC: Luiz Capitulino <redacted> CC: Mel Gorman <redacted> CC: Dave Chinner <david@fromorbit.com> CC: Chuck Lever <redacted> Signed-off-by: Yunsheng Lin <redacted> Acked-by: Jeff Layton <jlayton@kernel.org> --- V2: 1. Drop RFC tag and rebased on latest linux-next. 2. Fix a compile error for xfs. 3. Defragmemt the page_array for SUNRPC and btrfs. --- drivers/vfio/pci/virtio/migrate.c | 2 -- fs/btrfs/extent_io.c | 23 +++++++++++++++++----- fs/erofs/zutil.c | 12 ++++++------ fs/xfs/xfs_buf.c | 9 +++++---- mm/page_alloc.c | 32 +++++-------------------------- net/core/page_pool.c | 3 --- net/sunrpc/svc_xprt.c | 22 +++++++++++++++++---- 7 files changed, 52 insertions(+), 51 deletions(-)52:51 is not an improvement. 1-2 ns is barely worth mentioning. The sunrpc and btrfs callers are more complex and carry duplicated code.
Yes, the hard part is to find common file to place the common function
as something as below:
void defragment_pointer_array(void** array, int size) {
int slow = 0;
for (int fast = 0; fast < size; fast++) {
if (array[fast] != NULL) {
swap(&array[fast], &array[slow]);
slow++;
}
}
}
Or introduce a function as something like alloc_pages_refill_array()
for the usecase of sunrpc and xfs and do the array defragment in
alloc_pages_refill_array() before calling alloc_pages_bulk()?
Any suggestion?
Not an outright objection from me, but it's hard to justify this change.
The above should reduce the number to something like 40:51. Also, I looked more closely at other callers calling alloc_pages_bulk(), it seems vm_module_tags_populate() doesn't clear next_page[i] to NULL after __free_page() when doing 'Clean up and error out', I am not sure if vm_module_tags_populate() will be called multi-times as vm_module_tags->pages seems to be only set to all NULL once, see alloc_tag_init() -> alloc_mod_tags_mem(). Cc Suren to see if there is some clarifying for the above.