Thread (80 messages) 80 messages, 12 authors, 2021-10-15

Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting

From: David Hildenbrand <hidden>
Date: 2021-10-15 08:04:22
Also in: linux-doc, linux-fsdevel, lkml

On 14.10.21 22:16, Suren Baghdasaryan wrote:
On Tue, Oct 12, 2021 at 10:01 AM Suren Baghdasaryan [off-list ref] wrote:
quoted
On Tue, Oct 12, 2021 at 12:44 AM David Hildenbrand [off-list ref] wrote:
quoted
quoted
I'm still evaluating the proposal to use memfds but I'm not sure if
the issue that David Hildenbrand mentioned about additional memory
consumed in pagecache (which has to be addressed) is the only one we
will encounter with this approach. If anyone knows of any potential
issues with using memfds as named anonymous memory, I would really
appreciate your feedback before I go too far in that direction.
[MAP_PRIVATE memfd only behave that way with 4k, not with huge pages, so
I think it just has to be fixed. It doesn't make any sense to allocate a
page for the pagecache ("populate the file") when accessing via a
private mapping that's supposed to leave the file untouched]

My gut feeling is if you really need a string as identifier, then try
going with memfds. Yes, we might hit some road blocks to be sorted out,
but it just logically makes sense to me: Files have names. These names
exist before mapping and after mapping. They "name" the content.
I'm investigating this direction. I don't have much background with
memfds, so I'll need to digest the code first.
I've done some investigation into the possibility of using memfds to
name anonymous VMAs. Here are my findings:
Thanks for exploring the alternatives!
1. Forking a process with anonymous vmas named using memfd is 5-15%
slower than with prctl (depends on the number of VMAs in the process
being forked). Profiling shows that i_mmap_lock_write() dominates
dup_mmap(). Exit path is also slower by roughly 9% with
free_pgtables() and fput() dominating exit_mmap(). Fork performance is
important for Android because almost all processes are forked from
zygote, therefore this limitation already makes this approach
prohibitive.
Interesting, naturally I wonder if that can be optimized.
2. mremap() usage to grow the mapping has an issue when used with memfds:

fd = memfd_create(name, MFD_ALLOW_SEALING);
ftruncate(fd, size_bytes);
ptr = mmap(NULL, size_bytes, prot, MAP_PRIVATE, fd, 0);
close(fd);
ptr = mremap(ptr, size_bytes, size_bytes * 2, MREMAP_MAYMOVE);
touch_mem(ptr, size_bytes * 2);

This would generate a SIGBUS in touch_mem(). I believe it's because
ftruncate() specified the size to be size_bytes and we are accessing
more than that after remapping. prctl() does not have this limitation
and we do have a usecase for growing a named VMA.
Can't you simply size the memfd much larger? I mean, it doesn't really
cost much, does it?
3. Leaves an fd exposed, even briefly, which may lead to unexpected
flaws (e.g. anything using mmap MAP_SHARED could allow exposures or
overwrites). Even MAP_PRIVATE, if an attacker writes into the file
after ftruncate() and before mmap(), can cause private memory to be
initialized with unexpected data.
I don't quite follow. Can you elaborate what exactly the issue here is?
We use a temporary fd, yes, but how is that a problem?

Any attacker can just write any random memory memory in the address
space, so I don't see the issue.
4. There is a usecase in the Android userspace where vma naming
happens after memory was allocated. Bionic linker does in-memory
relocations and then names some relocated sections.
Would renaming a memfd be an option or is that "too late" ?
In the light of these findings, could the current patchset be reconsidered?
Thanks,
Suren.

-- 
Thanks,

David / dhildenb

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help