Thread (181 messages) 181 messages, 12 authors, 2023-11-22

Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free

From: Liam R. Howlett <hidden>
Date: 2023-01-20 17:10:07
Also in: linux-arm-kernel, linux-mm, lkml

* Matthew Wilcox [off-list ref] [230120 11:50]:
On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote:
quoted
On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan [off-list ref] wrote:
quoted
On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko [off-list ref] wrote:
quoted
On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote:
quoted
On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko [off-list ref] wrote:
quoted
On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote:
quoted
call_rcu() can take a long time when callback offloading is enabled.
Its use in the vm_area_free can cause regressions in the exit path when
multiple VMAs are being freed. To minimize that impact, place VMAs into
a list and free them in groups using one call_rcu() call per group.
After some more clarification I can understand how call_rcu might not be
super happy about thousands of callbacks to be invoked and I do agree
that this is not really optimal.

On the other hand I do not like this solution much either.
VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that
much with processes with a huge number of vmas either. It would still be
in housands of callbacks to be scheduled without a good reason.

Instead, are there any other cases than remove_vma that need this
batching? We could easily just link all the vmas into linked list and
use a single call_rcu instead, no? This would both simplify the
implementation, remove the scaling issue as well and we do not have to
argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1.
Yes, I agree the solution is not stellar. I wanted something simple
but this is probably too simple. OTOH keeping all dead vm_area_structs
on the list without hooking up a shrinker (additional complexity) does
not sound too appealing either.
I suspect you have missed my idea. I do not really want to keep the list
around or any shrinker. It is dead simple. Collect all vmas in
remove_vma and then call_rcu the whole list at once after the whole list
(be it from exit_mmap or remove_mt). See?
Yes, I understood your idea but keeping dead objects until the process
exits even when the system is low on memory (no shrinkers attached)
seems too wasteful. If we do this I would advocate for attaching a
shrinker.
Maybe even simpler, since we are hit with this VMA freeing flood
during exit_mmap (when all VMAs are destroyed), we pass a hint to
vm_area_free to batch the destruction and all other cases call
call_rcu()? I don't think there will be other cases of VMA destruction
floods.
... or have two different call_rcu functions; one for munmap() and
one for exit.  It'd be nice to use kmem_cache_free_bulk().
Do we even need a call_rcu on exit?  At the point of freeing the VMAs we
have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock.
Once we have obtained the write lock again, I think it's safe to say we
can just go ahead and free the VMAs directly.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help