Re: [RFC 0/4] mm, swap: Enable THP SWAP for PowerPC Book3S64

From: YoungJun Park <youngjun.park@lge.com>
Date: 2026-06-09 15:54:45
Also in: linux-mm, lkml

On Tue, Jun 09, 2026 at 06:49:30PM +0530, Ritesh Harjani (IBM) wrote:
> On PowerPC Book3S64, MMU is selected at runtime, so macros like PMD_SHIFT are
> effectively runtime variables in the Book3S64 code. THP swap code uses these
> macros for e.g. to size some of its array data structures based on PMD_ORDER.
> This patch series makes that usage dependent on the runtime variable.
>
> Sayali did some performance runs of this on Book3S64 with Radix and it gives
> 40-50% performance improvement. We also plan to run it with Hash, will soon
> update the results.
>
> Note that this patch series is based out of linux-next (next-20260608).
>
> Ritesh Harjani (IBM) (4):
>   include/linux/swap.h: Remove unused leftovers
>   mm, swap: make SWAPFILE_CLUSTER runtime
>   mm, swap: make SWAP_NR_ORDERS runtime
>   powerpc: Kconfig: Enable THP_SWAP on Book3S64
>
>  arch/powerpc/platforms/Kconfig.cputype |   1 +
>  include/linux/swap.h                   |  17 +---
>  mm/swap.h                              |   5 +-
>  mm/swap_table.h                        |   6 +-
>  mm/swapfile.c                          | 132 ++++++++++++++++++-------
>  5 files changed, 106 insertions(+), 55 deletions(-)
>
> --
> 2.39.5

Hello!

Instead of making SWAP_NR_ORDERS fully runtime, could we set it to the maximum
PMD_ORDER possible on PowerPC Book3S64, as a compile-time constant in the
swap.h ifdef block? (My assumption is that the maximum PMD_ORDER is not too
big.)

I think the general runtime version adds cost, and that cost affects every
other arch: percpu_swap_cluster needs a runtime allocation, the si/offset and
nonfull/frag arrays become separately allocated pointers, and some accesses
gain an extra indirection. For nr_orders=1, the allocation itself is pure
waste.
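
To make the indirection point concrete, here is a minimal userspace sketch
of the two layouts. It is not the kernel code: every demo_* name and the
value 10 are hypothetical stand-ins, and calloc() stands in for whatever
allocator the real per-CPU structure would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical order count; not taken from any real configuration. */
#define DEMO_NR_ORDERS 10

/* Compile-time layout: the per-order array is embedded in the struct. */
struct demo_cluster_static {
	unsigned long offset[DEMO_NR_ORDERS];
};

/* Runtime layout: the struct only holds a pointer to the array. */
struct demo_cluster_dynamic {
	unsigned long *offset;
	unsigned int nr_orders;
};

/* The runtime layout needs an allocation even when nr_orders == 1. */
static int demo_dynamic_init(struct demo_cluster_dynamic *c,
			     unsigned int nr_orders)
{
	c->offset = calloc(nr_orders, sizeof(*c->offset));
	if (!c->offset)
		return -1;
	c->nr_orders = nr_orders;
	return 0;
}

static void demo_dynamic_free(struct demo_cluster_dynamic *c)
{
	free(c->offset);
	c->offset = NULL;
	c->nr_orders = 0;
}

/* Embedded array: the element is at a fixed offset from the struct. */
static unsigned long demo_static_get(const struct demo_cluster_static *c,
				     unsigned int order)
{
	assert(order < DEMO_NR_ORDERS);
	return c->offset[order];
}

/* Pointer layout: load the pointer first, then the element. */
static unsigned long demo_dynamic_get(const struct demo_cluster_dynamic *c,
				      unsigned int order)
{
	assert(order < c->nr_orders);
	return c->offset[order];
}
```

Both getters return the same value for the same contents; the difference is
only that the dynamic one has to dereference c->offset before indexing, and
that demo_dynamic_init() can fail where the embedded array cannot.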

With a compile-time maximum, the only downside is an acceptable number of
wasted bytes per CPU / per device on Book3S64 (the unused entries in the swap
offset cache and the nonfull/frag lists), with no perf impact. The perf
improvement comes from THP swap itself, right? Other arches see no impact at
all.
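
A rough sketch of that alternative, again in plain userspace C with
hypothetical names. DEMO_MAX_NR_ORDERS is a placeholder bound, not the real
Book3S64 maximum, and the "order count = PMD order + 1" relation is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder upper bound, fixed at compile time. */
#define DEMO_MAX_NR_ORDERS 14

/* Arrays stay embedded and are sized for the worst case. */
struct demo_percpu_cache {
	unsigned long offset[DEMO_MAX_NR_ORDERS];
};

/* Number of orders actually in use; set once at boot in this sketch. */
static unsigned int demo_nr_orders = 1;

/* Reject a runtime PMD order that would not fit the compile-time bound. */
static int demo_set_nr_orders(unsigned int pmd_order)
{
	if (pmd_order + 1 > DEMO_MAX_NR_ORDERS)
		return -1;
	demo_nr_orders = pmd_order + 1;
	return 0;
}

/* Bytes left unused in one cache instance for the current order count. */
static size_t demo_wasted_bytes(void)
{
	return (DEMO_MAX_NR_ORDERS - demo_nr_orders) * sizeof(unsigned long);
}
```

With these placeholder numbers, a runtime PMD order of 9 leaves four unused
unsigned long slots per instance, and nothing is allocated or dereferenced
beyond what the fixed-size layout already does.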

Patch 2 looks fine as is. SWAPFILE_CLUSTER backs much bigger per-cluster
arrays, so runtime sizing makes sense there, and it looks like it has no
impact on other arches or on the current code.

Thanks!
Youngjun Park