Thread: 80 messages, 12 authors, 2021-10-15

Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting

From: Suren Baghdasaryan <surenb@google.com>
Date: 2021-10-15 18:34:02
Also in: linux-doc, linux-fsdevel, lkml

On Fri, Oct 15, 2021 at 9:39 AM David Hildenbrand [off-list ref] wrote:
1. Forking a process with anonymous vmas named using memfd is 5-15%
slower than with prctl (depends on the number of VMAs in the process
being forked). Profiling shows that i_mmap_lock_write() dominates
dup_mmap(). Exit path is also slower by roughly 9% with
free_pgtables() and fput() dominating exit_mmap(). Fork performance is
important for Android because almost all processes are forked from
zygote, therefore this limitation already makes this approach
prohibitive.
Interesting, naturally I wonder if that can be optimized.
Maybe but it looks like we simply do additional things for file-backed
memory, which seems natural. The call to i_mmap_lock_write() is from
here: https://elixir.bootlin.com/linux/latest/source/kernel/fork.c#L565
2. mremap() usage to grow the mapping has an issue when used with memfds:

fd = memfd_create(name, MFD_ALLOW_SEALING);
ftruncate(fd, size_bytes);
ptr = mmap(NULL, size_bytes, prot, MAP_PRIVATE, fd, 0);
close(fd);
ptr = mremap(ptr, size_bytes, size_bytes * 2, MREMAP_MAYMOVE);
touch_mem(ptr, size_bytes * 2);

This would generate a SIGBUS in touch_mem(). I believe it's because
ftruncate() specified the size to be size_bytes and we are accessing
more than that after remapping. prctl() does not have this limitation
and we do have a usecase for growing a named VMA.
Can't you simply size the memfd much larger? I mean, it doesn't really
cost much, does it?
If we knew beforehand the max size it can reach, then that would
be possible. I would really hate to miscalculate here and cause a
simple memory access to generate signals. Tracking such corner cases
in the field is not an easy task and I would rather avoid the
possibility of it.
The question would be if you cannot simply add some extremely large
number, because the file size itself doesn't really matter for memfd IIRC.

Having said that, without trying it out, I wouldn't know off the top of
my head if mremap() would work that way on an already closed fd that has
a sufficient size :/ If you have the example still somewhere, I would be
interested if that would work in general.
Yes, I tried a simple test like this and it works:

fd = memfd_create(name, MFD_ALLOW_SEALING);
ftruncate(fd, size_bytes * 2);
ptr = mmap(NULL, size_bytes, prot, MAP_PRIVATE, fd, 0);
close(fd);
ptr = mremap(ptr, size_bytes, size_bytes * 2, MREMAP_MAYMOVE);
touch_mem(ptr, size_bytes * 2);

I understand your suggestion but it's just another hoop we have to
jump through to make this work and feels unnatural from userspace POV. Also
virtual address space exhaustion might be an issue for 32bit userspace
with this approach.
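
For reference, the test above can be written out as a small self-contained C sketch. The helper name map_and_grow_memfd() and its parameters are hypothetical and not part of the original test; the sketch assumes Linux with a glibc that provides memfd_create() (2.27 or later). Calling it with file_bytes equal to map_bytes should correspond to the SIGBUS case from point 2 once the grown half is touched, while file_bytes >= grown_bytes corresponds to the variant that works.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd sized to file_bytes, map map_bytes of
 * it privately, close the fd, then grow the mapping to grown_bytes with
 * mremap(). Returns the (possibly moved) mapping, or MAP_FAILED on error.
 * Touching the result beyond file_bytes is expected to raise SIGBUS.
 */
static void *map_and_grow_memfd(const char *name, size_t file_bytes,
                                size_t map_bytes, size_t grown_bytes)
{
    assert(map_bytes <= grown_bytes);

    int fd = memfd_create(name, MFD_ALLOW_SEALING);
    if (fd < 0)
        return MAP_FAILED;

    /* The file size, not the mapping size, bounds what may be touched. */
    if (ftruncate(fd, (off_t)file_bytes) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *ptr = mmap(NULL, map_bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping keeps the underlying file alive */
    if (ptr == MAP_FAILED)
        return MAP_FAILED;

    void *grown = mremap(ptr, map_bytes, grown_bytes, MREMAP_MAYMOVE);
    if (grown == MAP_FAILED)
        munmap(ptr, map_bytes);
    return grown;
}
```

With page = sysconf(_SC_PAGESIZE), a call such as map_and_grow_memfd("test", 2 * page, page, 2 * page) followed by a memset() over 2 * page bytes exercises the working case.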
[...]
4. There is a usecase in the Android userspace where vma naming
happens after memory was allocated. Bionic linker does in-memory
relocations and then names some relocated sections.
Would renaming a memfd be an option or is that "too late" ?
My understanding is that the linker allocates space to load and relocate
the code, performs the relocations in that space and then names some
of the regions after that. Whether it can be redesigned to allocate
multiple named regions and perform the relocation between them I did
not really try since it would be a project by itself.
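
To illustrate the allocate-first, name-later pattern with the prctl() approach, here is a rough sketch. It assumes the interface used by Android kernels, prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, len, name) with the constant values shown below; those values and the helper name alloc_then_name() are assumptions for illustration and are not stated in this thread. On a kernel without the feature the call is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values, matching Android kernels; newer headers may define them. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper: allocate anonymous memory, use it, and only then
 * attach a name to the existing region. *name_rc receives 0 on success or
 * the errno value from prctl() (EINVAL if the kernel lacks support).
 */
static void *alloc_then_name(size_t len, const char *name, int *name_rc)
{
    assert(name != NULL && name_rc != NULL);

    void *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return MAP_FAILED;

    memset(ptr, 0, len); /* stand-in for the linker's relocation work */

    /* Naming happens after the memory is already allocated and in use. */
    int rc = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                   (unsigned long)ptr, (unsigned long)len,
                   (unsigned long)name);
    *name_rc = (rc == 0) ? 0 : errno;
    return ptr;
}
```

No fd, file sizing, or remapping is involved; when the call succeeds, the region should show up as "[anon:<name>]" in /proc/<pid>/maps.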

TBH, at some point I just look at the amount of required changes (both
kernel and userspace) and new limitations that userspace has to adhere
to for fitting memfds to my usecase, and I feel that it's just not
worth it. In the end we end up using the same refcounted strings with
vma->vm_file->f_count as the refcount and name stored in
vma->vm_file->f_path.dentry but with more overhead.
Yes, but it's glued to files which naturally have names :)
Yeah, I understand your motivations and that's why I'm exploring these
possibilities but it proves to be just too costly for a feature as
simple as naming a vma :)
Again, I appreciate that you looked into alternatives! I can see the
late renaming could be the biggest blocker if user space cannot be
adjusted easily to be compatible with that using memfds.
Yeah, it would definitely be hard for Android to adopt this.

If there are no objections to the current approach I would like to
respin another version with the CONFIG option added sometime early
next week. If anyone has objections, please let me know.
Thanks,
Suren.
--
Thanks,

David / dhildenb
  