Thread (21 messages) 21 messages, 6 authors, 2023-09-19

Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall

From: "Arnd Bergmann" <arnd@arndb.de>
Date: 2023-09-11 22:50:22
Also in: linux-arch, linux-cxl, lkml

On Sun, Sep 10, 2023, at 14:52, Gregory Price wrote:
On Sat, Sep 09, 2023 at 05:18:13PM +0200, Arnd Bergmann wrote:
quoted
I think a pointer to '__u64' is the most appropriate here,
that is compatible between 32-bit and 64-bit architectures
and covers all addresses until we get architectures with
128-bit addressing.

Thinking about it more, I noticed an existing bug in
both sys_move_pages and your current version of
sys_move_phys_pages: the 'pages' array is in fact not
suitable for compat tasks and needs an is_compat_task
check to load a 32-bit address from compat userspace on
the "if (get_user(p, pages + i))" line.
I'll clean up the current implementation for what I have on a v2 of an
RFC, and then look at adding some pull-ahead patches to fix both
move_pages and move_phys_pages for compat processes.  Might take me a
bit, I've only done compat work once before and I remember it being
annoying to get right.
I think what you want is roughly this (untested):
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2159,6 +2159,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
                         const int __user *nodes,
                         int __user *status, int flags)
 {
+       struct compat_uptr_t __user *compat_pages = (void __user *)pages;
        int current_node = NUMA_NO_NODE;
        LIST_HEAD(pagelist);
        int start, i;
@@ -2171,8 +2172,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
                int node;
 
                err = -EFAULT;
-               if (get_user(p, pages + i))
-                       goto out_flush;
+               if (in_compat_syscall() {
+                       compat_uptr_t cp;
+
+                       if (get_user(cp, compat_pages + i))
+                               goto out_flush;
+
+                       p = compat_ptr(cp);
+               } else {
+                       if (get_user(p, pages + i))
+                               goto out_flush;
+               }
                if (get_user(node, nodes + i))
                        goto out_flush;
 
alternatively you could use the get_compat_pages_array()
helper that is already used in the do_pages_stat()
function.
I did see other work on migrate.c hanging around on the list, I'll
double check this hasn't already been discovered/handled.
It looks like I broke it, and it was working before my own
5b1b561ba73c8 ("mm: simplify compat_sys_move_pages"), which
only handled the nodes=NULL path.

I suppose nobody noticed the regression because there are very
few 32-bit NUMA systems, and even fewer cases in which one
would run compat userspace to manage a 64-bit NUMA machine.
This only requires plumbing new 2 flags through do_pages_move, and no
new user-exposed types or information.

Is there an ick-factor with the idea of adding the following?

MPOL_MF_PHYS_ADDR : Treat page migration addresses as physical
MPOL_MF_PFN : Treat page migration addresses as PFNs
I would strongly prefer supporting only one of the two, and
a 64-bit physical address seems like the logical choice here.
quoted
These do not seem to be problematic from the ASLR perspective, so
I guess it may still be useful without CAP_SYS_ADMIN.
After reviewing the capabilities documentation it seems like
CAP_SYS_NICE is the appropriate capability.  My last meassage I said
CAP_SYS_ADMIN was probably correct, but I think using CAP_SYS_NICE
is more appropriate unless there are strong feelings due to the use of
PFN and Physcall Address.

I'm not sure rowhammer is of great concern in this interface because you
can't choose the destination address, only the destination node. Though
I suppose someone could go nuts and try to "massage" a node in some way
to get a statistical likelihood of placement (similar heap grooming).
I agree that this doesn't introduce any additional risk for rowhammer
attacks, but it seems slightly more logical to me to use CAP_SYS_ADMIN
if that is what the other interfaces use that handle physical addresses
and may leak address information.

     Arnd
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help