Thread: 21 messages, 7 authors, 2020-08-29

Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API

From: Minchan Kim <minchan@kernel.org>
Date: 2020-08-28 19:04:07
Also in: linux-man, linux-mm, lkml

On Fri, Aug 28, 2020 at 08:25:34PM +0200, Christian Brauner wrote:
On Fri, Aug 28, 2020 at 8:24 PM Jens Axboe [off-list ref] wrote:
On 8/28/20 11:40 AM, Arnd Bergmann wrote:
On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim [off-list ref] wrote:
So finally, the API is as follows,

     ssize_t process_madvise(int pidfd, const struct iovec *iovec,
               unsigned long vlen, int advice, unsigned int flags);
I had not followed the discussion earlier and only now came across
the syscall in linux-next, sorry for stirring things up this late.
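
For readers who want to try the signature quoted above from userspace, here is a minimal sketch. It assumes a Linux system whose headers define SYS_process_madvise; the wrapper names process_madvise_raw() and hint_range() are made up for this example and are not part of the patch. The syscall number in the table hunk below reflects the tree the patch was written against, so the sketch relies only on the header macro and reports ENOSYS when the macro is missing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical thin wrapper (name is an assumption): forwards the five
 * arguments of the proposed API straight to syscall(2).
 */
static ssize_t process_madvise_raw(int pidfd, const struct iovec *iov,
                                   unsigned long vlen, int advice,
                                   unsigned int flags)
{
#ifdef SYS_process_madvise
    return syscall(SYS_process_madvise, pidfd, iov, vlen, advice, flags);
#else
    /* Headers too old to know the syscall: behave like a missing syscall. */
    (void)pidfd; (void)iov; (void)vlen; (void)advice; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/*
 * Hypothetical helper: hint a single address range of the process
 * referred to by pidfd. 'advice' is passed through unchanged.
 */
static ssize_t hint_range(int pidfd, void *addr, size_t len, int advice)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };

    return process_madvise_raw(pidfd, &iov, 1, advice, 0);
}
```

A caller would first obtain a pidfd for the target process, then pass the target's addresses (not its own) in the iovec. On failure the call returns -1 and sets errno, as with other syscall(2) users.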
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 94bf4958d114..8f959d90338a 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -364,6 +364,7 @@
 440    common  watch_mount             sys_watch_mount
 441    common  watch_sb                sys_watch_sb
 442    common  fsinfo                  sys_fsinfo
+443    64      process_madvise         sys_process_madvise

 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -407,3 +408,4 @@
 545    x32     execveat                compat_sys_execveat
 546    x32     preadv2                 compat_sys_preadv64v2
 547    x32     pwritev2                compat_sys_pwritev64v2
+548    x32     process_madvise         compat_sys_process_madvise
I think we should not add any new x32-specific syscalls. Instead I think
the compat_sys_process_madvise/sys_process_madvise can be
merged into one.
+       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+       if (IS_ERR_OR_NULL(mm)) {
+               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+               goto release_task;
+       }
Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
and I would try to avoid that. Can mm_access() be changed to
itself return ERR_PTR(-ESRCH) instead of NULL to improve its
calling conventions? I see there are only three other callers.
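
To illustrate the calling-convention point, here is a userspace sketch of the error-pointer idiom. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are simplified stand-ins written for this example, not the kernel's definitions, and lookup_mm() is a hypothetical function that only mimics the shape of mm_access(). Once the lookup never returns NULL, the caller needs a single IS_ERR() check and no fallback error code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mm { int users; };
static struct mm fake_mm;

/*
 * Hypothetical lookup: every failure is encoded as an error pointer,
 * so NULL is never returned.
 */
static struct mm *lookup_mm(int has_mm, int allowed)
{
    if (!allowed)
        return ERR_PTR(-EACCES);
    if (!has_mm)
        return ERR_PTR(-ESRCH);
    return &fake_mm;
}

/* Caller: one check covers all failures, no NULL special case. */
static long use_mm(int has_mm, int allowed)
{
    struct mm *mm = lookup_mm(has_mm, allowed);

    if (IS_ERR(mm))
        return PTR_ERR(mm);
    mm->users++;
    return 0;
}
```

With this shape, the caller no longer has to decide which errno a NULL return should mean.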

+       ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
+       if (ret >= 0) {
+               ret = do_process_madvise(pidfd, &iter, behavior, flags);
+               kfree(iov);
+       }
+       return ret;
+}
+
+#ifdef CONFIG_COMPAT
...
+
+       ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
+                               &iov, &iter);
+       if (ret >= 0) {
+               ret = do_process_madvise(pidfd, &iter, behavior, flags);
+               kfree(iov);
+       }
Every syscall that passes an iovec seems to do this. If we make import_iovec()
handle both cases directly, this syscall and a number of others can
be simplified, and you avoid the x32 entry point I mentioned above.

Something like (untested)

index dad8d0cfaaf7..0de4ddff24c1 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 {
        ssize_t n;
        struct iovec *p;
-       n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-                                 *iov, &p);
+
+       if (in_compat_syscall())
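
The quoted diff is cut off at this point. As a rough userspace illustration of the idea only (one import routine that picks the iovec layout at run time), the sketch below uses an is_compat parameter in place of in_compat_syscall(). The names compat_iovec_sketch and import_iovec_sketch are invented for this example; it does not reproduce the kernel code or the rest of the diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Layout a 32-bit caller would pass: base and length are 32 bits each. */
struct compat_iovec_sketch {
    uint32_t iov_base;
    uint32_t iov_len;
};

/*
 * Hypothetical single entry point: converts either layout into native
 * struct iovec entries in 'out' and returns the total byte count.
 * 'is_compat' stands in for the in_compat_syscall() check.
 */
static size_t import_iovec_sketch(int is_compat, const void *uvec,
                                  size_t nr_segs, struct iovec *out)
{
    size_t total = 0;
    size_t i;

    if (is_compat) {
        const struct compat_iovec_sketch *c = uvec;

        /* Widen each 32-bit field to the native width. */
        for (i = 0; i < nr_segs; i++) {
            out[i].iov_base = (void *)(uintptr_t)c[i].iov_base;
            out[i].iov_len = c[i].iov_len;
        }
    } else {
        const struct iovec *n = uvec;

        /* Native layout: copy entries unchanged. */
        for (i = 0; i < nr_segs; i++)
            out[i] = n[i];
    }

    for (i = 0; i < nr_segs; i++)
        total += out[i].iov_len;
    return total;
}
```

Because the layout decision lives inside the import routine, a caller shaped like the syscall body above would need only one code path for both kinds of caller.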
I suggested the exact same solution roughly 1.5 weeks ago. :)
Funny: when I saw you mention this in BBB, I knew exactly what you were
referring to. :)
https://lore.kernel.org/linux-man/20200816081227.ngw3l45c5uncesmr@wittgenstein/

Yes, Christian suggested the idea, though mainly for this new syscall only.
I don't have time to revise the patchset yet, but I may next week.
I will follow Christian's suggestion.

Thanks.