Thread (23 messages) 23 messages, 7 authors, 2025-03-12

Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements

From: NeilBrown <hidden>
Date: 2025-03-07 21:03:12
Also in: kvm, linux-btrfs, linux-mm, linux-nfs, linux-xfs, lkml, virtualization

On Fri, 07 Mar 2025, Yunsheng Lin wrote:
On 2025/3/7 5:14, NeilBrown wrote:
quoted
On Thu, 06 Mar 2025, Yunsheng Lin wrote:
quoted
On 2025/3/6 7:41, NeilBrown wrote:
quoted
On Wed, 05 Mar 2025, Yunsheng Lin wrote:
quoted
For the existing btrfs and sunrpc case, I am agreed that there
might be valid use cases too, we just need to discuss how to
meet the requirements of different use cases using simpler, more
unified and effective APIs.
We don't need "more unified".
What I meant about 'more unified' is how to avoid duplicated code as
much as possible for two different interfaces with similar‌ functionality.

The best way I tried to avoid duplicated code as much as possible is
to defragment the page_array before calling the alloc_pages_bulk()
for the use case of btrfs and sunrpc so that alloc_pages_bulk() can
be removed of the assumption populating only NULL elements, so that
the API is simpler and more efficient.
quoted
If there are genuinely two different use cases with clearly different
needs - even if only slightly different - then it is acceptable to have
two different interfaces.  Be sure to choose names which emphasise the
differences.
The best name I can come up with for the use case of btrfs and sunrpc
is something like alloc_pages_bulk_refill(), any better suggestion about
the naming?
I think alloc_pages_bulk_refill() is a good name.

So:
- alloc_pages_bulk() would be given an uninitialised array of page
  pointers and a required count and would return the number of pages
  that were allocated
- alloc_pages_bulk_refill() would be given an initialised array of page
  pointers some of which might be NULL.  It would attempt to allocate
  pages for the non-NULL pointers and return the total number of
You meant 'NULL pointers' instead of 'non-NULL pointers' above?
Correct - thanks.
quoted
  allocated pages in the array - just like the current
  alloc_pages_bulk().
I guess 'the total number of allocated pages in the array ' include
the pages which are already in the array before calling the above
API?
Yes - just what the current function does.
Though I don't know that we really need that detail.
I think there are three interesting return values:

- hard failure - don't bother trying again soon:   maybe -ENOMEM
- success - all pages are allocated:  maybe 0 (or 1?)
- partial success - at least one page allocated, ok to try again
  immediately - maybe -EAGAIN (or 0).
I guess it is worth mentioning that the current alloc_pages_bulk()
may return different value with the same size of arrays, but with
different layout of the same number of NULL pointers.
For the same size of arrays with different layout for the NULL pointer
below('*' indicate NULL pointer), and suppose buddy allocator is only
able to allocate two pages:
1. P**P*P: will return 4.
2. P*PP**: will return 5.
I guess the documentation for the function is wrong then.
They lends weight to the suggestion that the current return value isn't
ideal.
If the new API do the page defragmentation, then it will always return
the same value for different layout of the same number of NULL pointers.
I guess the new one is the more perfered behavior as it provides a more
defined semantic.
quoted
sunrpc could usefully use both of these interfaces.

alloc_pages_bulk() could be implemented by initialising the array and
then calling alloc_pages_bulk_refill().  Or alloc_pages_bulk_refill()
could be implemented by compacting the pages and then calling
alloc_pages_bulk().
If we could duplicate the code and have two similar but different
functions.

The documentation for _refill() should make it clear that the pages
might get re-ordered.
Does 'the pages might get re-ordered' mean defragmenting the page_array?
If yes, it makes sense to make it clear.
Correct.  Defragmentation is an option for implementing refill.  We
shouldn't promise to do that, but we should define the API in such a way
that it is allowed.
quoted
Having looked at some of the callers I agree that the current interface
is not ideal for many of them, and that providing a simpler interface
would help.
+1
quoted
Thanks,
NeilBrown
If I were do work on this (and I'm not, so you don't have to follow my
ideas) I would separate the bulk_alloc into several inline functions and
combine them into the different interfaces that you want.  This will
result in duplicated object code without duplicated source code.  The
object code should be optimal.

The parts of the function are:
 - validity checks - fallback to single page allocation
 - select zone - fallback to single page allocation
 - allocate multiple pages in the zone and dispose of them
 - allocate a single page

The "dispose of them" is one of
  - add to a linked list
  - add to end of array
  - add to next hole in array

These three could be inline functions that the "allocate multiple pages"
and "allocate single page" functions call.  We can pass these as
function arguments and the compile will inline them.
I imagine these little function would take one page and return
a bool indicating if any more are wanted.

The three functions: alloc_bulk_array alloc_bulk_list
alloc_bulk_refill_array would each look like:

  validity checks: do we need to allocate anything?

  if want more than one page &&
     am allowed to do mulipage (e.g. not __GFP_ACCOUNT) &&
     zone = choose_zone() {
        alloc_multi_from_zone(zone, dispose_function)
  }
  if nothing allocated
     alloc_single_page(dispose_function)

Each would have a different dispose_function and the initial checks
would be quite different, as would the return value.

Thanks for working on this.

NeilBrown
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help