Thread (25 messages) 25 messages, 2 authors, 2024-08-28

Re: [PATCH net-next v15 08/13] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()

From: Alexander Duyck <hidden>
Date: 2024-08-27 15:31:34
Also in: linux-mm, lkml

On Tue, Aug 27, 2024 at 5:07 AM Yunsheng Lin [off-list ref] wrote:
quoted hunk ↗ jump to hunk
On 2024/8/27 1:00, Alexander Duyck wrote:
quoted
On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin [off-list ref] wrote:
quoted
It seems there is about 24Bytes binary size increase for
__page_frag_cache_refill() after refactoring in arm64 system
with 64K PAGE_SIZE. By doing the gdb disassembling, It seems
we can have more than 100Bytes decrease for the binary size
by using __alloc_pages() to replace alloc_pages_node(), as
there seems to be some unnecessary checking for nid being
NUMA_NO_NODE, especially when page_frag is part of the mm
system.

CC: Alexander Duyck <redacted>
Signed-off-by: Yunsheng Lin <redacted>
---
 mm/page_frag_cache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index bba59c87d478..e0ad3de11249 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -28,11 +28,11 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
                   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
-       page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-                               PAGE_FRAG_CACHE_MAX_ORDER);
+       page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
+                            numa_mem_id(), NULL);
 #endif
        if (unlikely(!page)) {
-               page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+               page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
                if (unlikely(!page)) {
                        nc->encoded_page = 0;
                        return NULL;
I still think this would be better served by fixing alloc_pages_node
to drop the superfluous checks rather than changing the function. We
would get more gain by just addressing the builtin constant and
NUMA_NO_NODE case there.
I am supposing by 'just addressing the builtin constant and NUMA_NO_NODE
case', it meant the below change from the previous discussion:
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 01a49be7c98d..009ffb50d8cd 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -290,6 +290,9 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
 static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
                                                   unsigned int order)
 {
+       if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
+               return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);
+
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

Actually it does not seem to get more gain by judging from binary size
changing as below, vmlinux.org is the image after this patchset, and
vmlinux is the image after this patchset with this patch reverted and
with above change applied.

[linyunsheng@localhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
add/remove: 0/2 grow/shrink: 16/12 up/down: 432/-340 (92)
Function                                     old     new   delta
new_slab                                     808    1124    +316
its_probe_one                               2860    2908     +48
...
alloc_slab_page                              284       -    -284
Total: Before=30534822, After=30534914, chg +0.00%
Well considering that alloc_slab_page was marked to be "inline" as per
the qualifier applied to it I would say the shrinking had an effect,
but it was just enough to enable the "inline" qualifier to kick in. It
could be argued that the change exposed another issue in that the
alloc_slab_page function is probably large enough that it should just
be "static" and not "static inline". If you can provide you config I
could probably look into this further but I suspect just dropping the
inline for that one function should result in net savings.

The only other big change I see is in its_probe_one which I am not
sure why it would be impacted since it is not passing a constant in
the first place, it is passing its->numa_node. I'd be curious what the
disassembly shows in terms of the change that caused it to increase in
size.

Otherwise the rest of the size changes seem more like code shifts than
anything else likely due to the functions shifting around slightly due
to a few dropping in size.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help