Thread (73 messages, 7 authors, 2021-02-22)

Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-15 19:21:43
Also in: linux-api, linux-arch, linux-fsdevel, linux-kselftest, linux-mm, linux-riscv, lkml, nvdimm

On Mon 15-02-21 10:14:43, James Bottomley wrote:
On Mon, 2021-02-15 at 10:13 +0100, Michal Hocko wrote:
On Sun 14-02-21 11:21:02, James Bottomley wrote:
On Sun, 2021-02-14 at 10:58 +0100, David Hildenbrand wrote:
[...]
And here we come to the question "what are the differences that
justify a new system call?" and the answer to this is very
subjective. And as such we can continue bikeshedding forever.
I think this fits into the existing memfd_create() syscall just
fine, and I heard no compelling argument why it shouldn't. That's
all I can say.
OK, so let's review history.  In the first two incarnations of the
patch, it was an extension of memfd_create().  The specific
objection by Kirill Shutemov was that it doesn't share any code in
common with memfd and so should be a separate system call:

https://lore.kernel.org/linux-api/20200713105812.dnwtdhsuyj3xbh4f@box/
Thanks for the pointer. But this argument hasn't been challenged at
all. It hasn't been brought up that the overlap would be considerably
higher with the hugetlb/sealing support. And so far nobody has claimed
those combinations as unviable.
Kirill is actually interested in the sealing path for his KVM code so
we took a look.  There might be a two line overlap in memfd_create for
the seal case, but there's no real overlap in memfd_add_seals which is
the bulk of the code.  So the best approach would seem to be to lift the inode ...
-> seals helpers to be non-static so they can be reused and roll our
own add_seals.
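For reference, the existing memfd sealing path being compared here is driven from userspace roughly as below: a memfd created with MFD_ALLOW_SEALING, then fcntl(F_ADD_SEALS), which is the call that ends up in memfd_add_seals() in mm/memfd.c. This is a minimal sketch of the stock memfd API (assuming glibc 2.27 or later for the memfd_create() wrapper), not of anything in this patch series; the helper name make_fixed_size_memfd() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd of `size` bytes whose size can no
 * longer change. Returns the fd, or -1 with errno set. */
static int make_fixed_size_memfd(const char *name, off_t size)
{
    /* Without MFD_ALLOW_SEALING the kernel presets F_SEAL_SEAL, so every
     * later F_ADD_SEALS would fail with EPERM. */
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* Size the file first, then forbid shrinking and growing it. */
    if (ftruncate(fd, size) != 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

After a successful call, fcntl(fd, F_GET_SEALS) reports F_SEAL_SHRINK | F_SEAL_GROW, and any ftruncate() to a different size fails with EPERM.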
These are implementation details which are not really relevant to the
API IMHO. 
I can't see a use case at all for hugetlb support, so it seems to be a
bit of an angels-on-a-pinhead discussion.  However, if one were to come
along, handling it in the same way seems reasonable.
Those angels have made their way into mmap, System V shm, memfd_create and
other MM interfaces which never envisioned them when they were introduced. Hugetlb
pages backing guest memory are quite a common use case, so why do you think
those guests wouldn't like to see their memory be "secret"?

As I've said in my last response (YCZEGuLK94szKZDf@dhcp22.suse.cz), I am
not going to argue all these again. I have made my point and you are
free to take it or leave it.
The other objection raised offlist is that if we do use
memfd_create, then we have to add all the secret memory flags as an
additional ioctl, whereas they can be specified on open if we do a
separate system call.  The container people violently objected to
the ioctl because it can't be properly analysed by seccomp and much
preferred the syscall version.
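The seccomp point can be made concrete. A seccomp filter sees only the syscall number, the architecture and the six raw argument registers (struct seccomp_data); it cannot follow a pointer. A flag word passed directly as a syscall argument can therefore be tested, whereas options carried in a structure that an ioctl argument points to cannot. The sketch below assumes x86-64 or arm64 and uses memfd_create()'s MFD_HUGETLB purely as an illustrative target: it installs a classic-BPF filter that rejects that one flag with EPERM. The helper name deny_hugetlb_memfd() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U /* value from include/uapi/linux/memfd.h */
#endif

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and arm64"
#endif

/* Hypothetical helper: after it returns 0, memfd_create() calls with
 * MFD_HUGETLB in their flags fail with EPERM; everything else is allowed. */
static int deny_hugetlb_memfd(void)
{
    struct sock_filter filter[] = {
        /* [0-2] Kill the process on a foreign syscall ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* [3-4] Any syscall other than memfd_create() jumps to [8]. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_memfd_create, 0, 3),
        /* [5-6] The flags are args[1], a plain register value, so the
         * filter can test them directly (low 32 bits, little-endian). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, MFD_HUGETLB, 0, 1),
        /* [7] */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        /* [8] */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Unprivileged processes must set no_new_privs before filtering. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A real policy would also have to handle x32 syscall numbers on x86-64, which this sketch ignores. Had the same option lived in a struct behind an ioctl pointer, no instruction in such a filter could reach it.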

Since we're dumping the uncached variant, the ioctl problem
disappears, but so does the possibility of ever adding it back if we
take on the container people's objection.  This argues for a
separate syscall because we can add additional features and extend
the API with flags without causing anti-ioctl riots.
I am sorry but I do not understand this argument.
You don't understand why container guarding technology doesn't like
ioctls?
No, I did not see where the ioctl argument came from.

[...]
What kind of flags are we talking about and why would that be a
problem with memfd_create interface? Could you be more specific
please?
You mean what were the ioctl flags in the patch series linked above? 
They were SECRETMEM_EXCLUSIVE and SECRETMEM_UNCACHED in patch 3/5. 
OK I see. How many potential modes are we talking about? A few or
potentially many?
They were eventually dropped after v10, because of problems with
architectural semantics, with the idea that it could be added back
again if a compelling need arose:

https://lore.kernel.org/linux-api/20201123095432.5860-1-rppt@kernel.org/

In theory the extra flags could be multiplexed into the memfd_create
flags, as hugetlbfs is, but with only 32 flag bits and a lot already taken it
gets messy for expansion.  When we run out of flags the first question
people will ask is "why didn't you do separate system calls?".
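To put numbers on the flag-space concern: around Linux v5.11 (an assumption worth re-checking against include/uapi/linux/memfd.h), memfd_create() used three single-bit flags (MFD_CLOEXEC, MFD_ALLOW_SEALING, MFD_HUGETLB) plus a six-bit field in bits 26 to 31 that carries log2 of the huge page size when MFD_HUGETLB is set. The sketch below, with hypothetical helper names, encodes that field and counts what is left of the 32-bit flags word.

```c
#include <assert.h>
#include <stdint.h>

/* memfd_create() flag layout around Linux v5.11, mirrored from
 * include/uapi/linux/memfd.h (later kernels add more bits). */
#define MFD_CLOEXEC       0x0001U
#define MFD_ALLOW_SEALING 0x0002U
#define MFD_HUGETLB       0x0004U
#define MFD_HUGE_SHIFT    26     /* huge page size field: bits 26..31 */
#define MFD_HUGE_MASK     0x3fU

/* Hypothetical helper: build the flags for a hugetlb memfd whose page
 * size is 2^log2_size bytes, the way MFD_HUGE_2MB etc. are defined. */
static uint32_t mfd_huge_flags(unsigned log2_size)
{
    assert(log2_size <= MFD_HUGE_MASK);
    return MFD_HUGETLB | ((uint32_t)log2_size << MFD_HUGE_SHIFT);
}

/* Hypothetical helper: how many of the 32 flag bits are still unused. */
static int free_flag_bits(void)
{
    uint32_t used = MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB |
                    (MFD_HUGE_MASK << MFD_HUGE_SHIFT);
    int n = 0;
    for (uint32_t rest = ~used; rest != 0; rest &= rest - 1)
        n++; /* each iteration clears the lowest set bit */
    return n;
}
```

With this layout nine bits are taken and 23 remain, and mfd_huge_flags(21) yields 0x54000004, i.e. MFD_HUGETLB with a 2 MB page size. Later kernels claimed further bits (MFD_NOEXEC_SEAL and MFD_EXEC in 6.3), so the count is lower there.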
OK, I do not necessarily see a lack of flag space as a problem. I can be
wrong here, but I do not see how that would be solved by a separate
syscall when it sounds rather foreseeable that many modes supported by
memfd_create will eventually find their way to secret memory as well.
Not least because secret memory is nothing really special: it is
just memory which is not mapped into the kernel's 1:1 (direct) mapping. That's
it. And that can be applied to any memory provided to userspace.

But I am repeating myself again here so I better stop.
-- 
Michal Hocko
SUSE Labs

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel