On Fri, 2018-07-20 at 10:30 +0200, Peter Zijlstra wrote:
On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
I added some more arch maintainers. The idea here is that, on x86 at
least, task->active_mm and all its refcounting is pure overhead. When
a process exits, __mmput() gets called, but the core kernel has a
longstanding "optimization" in which other tasks (kernel threads and
idle tasks) may have ->active_mm pointing at this mm. This is nasty,
complicated, and hurts performance on large systems, since it requires
extra atomic operations whenever a CPU switches between real user
threads and idle/kernel threads.

It's also almost completely worthless on x86 at least, since __mmput()
frees pagetables, and that operation *already* forces a remote TLB
flush, so we might as well zap all the active_mm references at the
same time.

So I disagree that active_mm is complicated (the code is less than
ideal but that is actually fixable). And aside from the process exit
case, it does avoid CR3 writes when switching between user and kernel
threads (which can be far more often than exit if you have longer
running tasks).

Now agreed, recent x86 work has made that less important.

And I of course also agree that not doing those refcount atomics is
better.

It might be cleaner to keep the ->active_mm pointer
in place for now (at least in the first patch), even
on architectures where we end up dropping the refcounting.
That way the code is more similar everywhere, and
we just get rid of the expensive instructions.
Let me try coding this up...
--
All Rights Reversed.