Re: [RFC v3 7/7] shm: isolate pinned pages when sealing files

From: David Herrmann <hidden>
Date: 2014-06-13 15:27:14
Also in: linux-fsdevel, linux-mm, lkml

Hi

On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski [off-list ref] wrote:
> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann [off-list ref] wrote:
>> When setting SEAL_WRITE, we must make sure nobody has a writable reference
>> to the pages (via GUP or similar). We currently check references and wait
>> some time for them to be dropped. This, however, might fail for several
>> reasons, including:
>>  - the page is pinned for longer than we wait
>>  - while we wait, someone takes an already pinned page for read-access
>>
>> Therefore, this patch introduces page-isolation. When sealing a file with
>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The new page
>> is put in place atomically, the old page is detached and left alone. It
>> will get reclaimed once the last external user has dropped it.
>>
>> Signed-off-by: David Herrmann <redacted>
>
> Won't this have unexpected effects?
>
> Thread 1:  start read into mapping backed by fd
>
> Thread 2:  SEAL_WRITE
>
> Thread 1: read finishes.  now the page doesn't match the sealed page

Just to be clear: you're talking about read() calls that write into
the memfd? (like my FUSE example does) Your language might be
ambiguous to others as "read into" actually implies a write.

No, this does not have unexpected effects. But yes, your conclusion is
right. To be clear, this behavior would be part of the API. Any
asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your
buffer before the write finishes. But you actually have to extend your
example:

Thread 1: p = mmap(memfd, SIZE);
Thread 1: h = async_read(some_fd, p, SIZE);
Thread 1: munmap(p, SIZE);
Thread 2: SEAL_WRITE
Thread 1: async_wait(h);

If you don't do the munmap(), then SEAL_WRITE will fail due to an
elevated i_mmap_writable. I think this is fine. In fact, I remember
reading that async-IO is not required to resolve user-space addresses
at the time of the syscall, but might delay it to the time of the
actual write. But you're right, it would be misleading that the AIO
operation returns success. This would be part of the memfd-API,
though. And if you mess with your address space while running an
async-IO operation on it, you're screwed anyway.
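
For illustration, the munmap() requirement can be observed from user
space. The following is a minimal sketch, assuming the interface as found
in current mainline kernels and glibc (memfd_create() with
MFD_ALLOW_SEALING, and fcntl(F_ADD_SEALS, F_SEAL_WRITE)), which may differ
in naming from this v3 series. The helper names try_seal_write() and
seal_after_munmap_demo() are made up for this example. It only exercises
the writable-mapping check, not pinned pages or async-IO.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to add F_SEAL_WRITE to fd.
 * Returns 0 on success, otherwise the errno value set by fcntl().
 */
int try_seal_write(int fd)
{
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0)
		return 0;
	return errno;
}

/*
 * Hypothetical demo of the sequence discussed above, without the async read.
 * Returns 0 if sealing fails with EBUSY while a writable shared mapping
 * exists and succeeds after munmap(), -1 if setup fails, 1 otherwise.
 */
int seal_after_munmap_demo(void)
{
	const size_t size = 4096;
	int while_mapped, after_unmap;
	void *p;
	int fd;

	fd = memfd_create("seal-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* A writable shared mapping exists: sealing is expected to fail. */
	while_mapped = try_seal_write(fd);

	munmap(p, size);

	/* No writable mapping is left: sealing is expected to succeed. */
	after_unmap = try_seal_write(fd);

	close(fd);
	return (while_mapped == EBUSY && after_unmap == 0) ? 0 : 1;
}
```

Calling seal_after_munmap_demo() from a small main() and checking for a
return value of 0 is enough to try this out. The sketch says nothing about
what happens to a write that is still in flight when the seal is set.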

Btw., your sealing use-case is really odd. Nobody guarantees that the
SEAL_WRITE happens _after_ you schedule your async-read. In case you
have some synchronization there, you just have to move it after
waiting for your async-IO to finish.

Does that clear things up?
Thanks
David