Thread: 11 messages, 3 authors, 2022-08-10

Re: [PATCH 0/5] Add process_memwatch syscall

From: Muhammad Usama Anjum <hidden>
Date: 2022-08-10 16:45:39
Also in: linux-api, linux-doc, linux-fsdevel, linux-kselftest, linux-perf-users, lkml

On 8/10/22 2:22 PM, Peter.Enderborg@sony.com wrote:
On 7/26/22 18:18, Muhammad Usama Anjum wrote:
Hello,

This patch series implements a new syscall, process_memwatch. Currently,
only support for watching the soft-dirty PTE bit is added. The syscall is
a generic interface for watching the memory of a process, and it leaves
room to add more operations of this kind in the future.

The soft-dirty PTE bit of memory pages can be read from the pagemap
procfs file. The soft-dirty PTE bit for the memory in a process can be
cleared by writing to the clear_refs file. This series addresses two
limitations of the procfs interface:
- There is no way to get the soft-dirty PTE bit status and clear it in a
  single atomic operation.
- The soft-dirty PTE bit cannot be cleared for only a part of memory.
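
For reference, the existing procfs interface can be sketched roughly as
below. This is only an illustration, not code from the series: the
function names are made up for this example, and it relies on the
documented layout where bit 55 of a pagemap entry is the soft-dirty flag
and writing "4" to clear_refs clears soft-dirty for the whole task.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a 64-bit pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Pure helper: extract the soft-dirty flag from one pagemap entry. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Byte offset in /proc/PID/pagemap of the entry that describes vaddr. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* Returns 1 if the page holding addr is soft-dirty, 0 if not, -1 on error. */
static int page_is_soft_dirty(const void *addr)
{
    uint64_t entry;
    long page_size = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, &entry, sizeof(entry),
              pagemap_offset((uintptr_t)addr, page_size));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return pagemap_entry_soft_dirty(entry);
}

/* Clears soft-dirty for the WHOLE process; there is no per-range variant. */
static int clear_all_soft_dirty(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

Both limitations are visible here: a write that lands between
page_is_soft_dirty() and clear_all_soft_dirty() is lost, and the clear
always covers the entire address space.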

Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is sufficient there because, I think, the
process is frozen. Our use case needs to track the soft-dirty PTE bit
for running processes. We need to track and clear a region of memory
while the process is running in order to emulate the GetWriteWatch() API
of Windows. Games use it to keep track of dirty pages and to process
only those pages. The new syscall can be used by the CRIU project and
other applications which require soft-dirty PTE bit information.

As the current kernel has no way to clear the soft-dirty bits for only a
part of memory (instead of for the entire process), and the get+clear
operation cannot be performed atomically, the alternatives are to mimic
this information entirely in userspace, with poor performance:
- The mprotect syscall and a SIGSEGV handler for bookkeeping
- The userfaultfd syscall with a handler for bookkeeping
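
The mprotect variant can be sketched roughly as below. This is a
single-threaded illustration only: every watch_* name and the
WATCH_MAX_PAGES cap are made up for this example, and calling mprotect()
from a signal handler is not guaranteed async-signal-safe by POSIX,
although it is a plain syscall on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WATCH_MAX_PAGES 64 /* hypothetical cap for this sketch */

static unsigned char *watch_base;  /* start of the watched region */
static size_t watch_pages;         /* number of watched pages */
static size_t watch_page_size;
static volatile sig_atomic_t watch_dirty[WATCH_MAX_PAGES];

/* On a write fault: record the page as dirty, then make it writable so
 * the faulting store is retried and succeeds. */
static void watch_on_segv(int sig, siginfo_t *info, void *ctx)
{
    int saved_errno = errno;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)watch_base;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + watch_pages * watch_page_size)
        _exit(1); /* a genuine crash outside the watched region */
    idx = (addr - base) / watch_page_size;
    watch_dirty[idx] = 1;
    mprotect(watch_base + idx * watch_page_size, watch_page_size,
             PROT_READ | PROT_WRITE);
    errno = saved_errno;
}

/* Map a read-only anonymous region and install the fault handler. */
static int watch_start(size_t pages)
{
    struct sigaction sa;
    void *p;

    if (pages == 0 || pages > WATCH_MAX_PAGES)
        return -1;
    watch_page_size = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, pages * watch_page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    watch_base = p;
    watch_pages = pages;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = watch_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Report whether page idx was written since the last reset, then
 * write-protect it again. Returns 1/0, or -1 on error. Not atomic. */
static int watch_get_and_reset(size_t idx)
{
    int was_dirty;

    if (idx >= watch_pages)
        return -1;
    was_dirty = watch_dirty[idx];
    watch_dirty[idx] = 0;
    if (mprotect(watch_base + idx * watch_page_size, watch_page_size,
                 PROT_READ) != 0)
        return -1;
    return was_dirty;
}
```

Every first write to a clean page costs a fault, a signal delivery and
an extra mprotect call, which is where the poor performance comes from.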

        long process_memwatch(int pidfd, unsigned long start, int len,
                              unsigned int flags, void *vec, int vec_len);

The following operations are supported in this syscall:
- Get the pages that are soft-dirty.
- Clear the soft-dirty bit of the pages that have it set.
- An optional flag to ignore VM_SOFTDIRTY and track only the per-page
  soft-dirty PTE bit.
Why can it not be done as an IOCTL?
It can be done as an ioctl. I think this syscall can be used in the
future to add other operations similar to soft-dirty tracking, which is
why a syscall was added.

-- 
Muhammad Usama Anjum