Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
From: Minchan Kim <minchan@kernel.org>
Date: 2020-08-28 19:04:07
Also in:
linux-man, linux-mm, lkml
On Fri, Aug 28, 2020 at 08:25:34PM +0200, Christian Brauner wrote:
On Fri, Aug 28, 2020 at 8:24 PM Jens Axboe [off-list ref] wrote:quoted
On 8/28/20 11:40 AM, Arnd Bergmann wrote:quoted
On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim [off-list ref] wrote:quoted
So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags);I had not followed the discussion earlier and only now came across the syscall in linux-next, sorry for stirring things up this late.quoted
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 94bf4958d114..8f959d90338a 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl@@ -364,6 +364,7 @@ 440 common watch_mount sys_watch_mount 441 common watch_sb sys_watch_sb 442 common fsinfo sys_fsinfo +443 64 process_madvise sys_process_madvise # # x32-specific system call numbers start at 512 to avoid cache impact@@ -407,3 +408,4 @@ 545 x32 execveat compat_sys_execveat 546 x32 preadv2 compat_sys_preadv64v2 547 x32 pwritev2 compat_sys_pwritev64v2 +548 x32 process_madvise compat_sys_process_madviseI think we should not add any new x32-specific syscalls. Instead I think the compat_sys_process_madvise/sys_process_madvise can be merged into one.quoted
+ mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); + if (IS_ERR_OR_NULL(mm)) { + ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; + goto release_task; + }Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile, and I would try to avoid that. Can mm_access() be changed to itself return PTR_ERR(-ESRCH) instead of NULL to improve its calling conventions? I see there are only three other callers.quoted
+ ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter); + if (ret >= 0) { + ret = do_process_madvise(pidfd, &iter, behavior, flags); + kfree(iov); + } + return ret; +} + +#ifdef CONFIG_COMPAT...quoted
+ + ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), + &iov, &iter); + if (ret >= 0) { + ret = do_process_madvise(pidfd, &iter, behavior, flags); + kfree(iov); + }Every syscall that passes an iovec seems to do this. If we make import_iovec() handle both cases directly, this syscall and a number of others can be simplified, and you avoid the x32 entry point I mentioned above Something like (untested) index dad8d0cfaaf7..0de4ddff24c1 100644--- a/lib/iov_iter.c +++ b/lib/iov_iter.c@@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const structiovec __user * uvector, { ssize_t n; struct iovec *p; - n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs, - *iov, &p); + + if (in_compat_syscall())I suggested the exact same solutions roughly 1.5 weeks ago. :) Fun when I saw you mentioning this in BBB I knew exactly what you were referring too. :)
https://lore.kernel.org/linux-man/20200816081227.ngw3l45c5uncesmr@wittgenstein/ (local) Yes, Christian suggested the idea but mostly for only this new syscall. I don't have the time to revise the patchset yet but may have next week. I will follow Christian's suggestion. Thanks.