Thread: 17 messages, 5 authors, 2016-12-02

Re: mmap_sem bottleneck

From: Laurent Dufour <hidden>
Date: 2016-10-18 14:50:19

On 17/10/2016 14:51, Peter Zijlstra wrote:
On Mon, Oct 17, 2016 at 02:33:53PM +0200, Laurent Dufour wrote:
Hi all,

I'm sorry to resurrect this topic, but with the increasing number of
CPUs, it is becoming more frequent for the mmap_sem to be a bottleneck,
especially between page fault handling and other threads' memory
management calls.

In the case I'm seeing, there are a lot of page faults occurring while
other threads are trying to manipulate the process memory layout through
mmap/munmap.

There is no *real* conflict between these operations: the page faults are
done on different pages and areas than the ones addressed by the
mmap/munmap operations. Thus the threads are dealing with different parts
of the process's memory space. However, since page fault handlers and
mmap/munmap operations both grab the mmap_sem, page fault handling is
serialized with the mmap operations, which impacts performance on large
systems.

For the record, the page faults occur while reading data from a file
system, and I/O is really impacted by this serialization when dealing
with a large number of parallel threads, in my case 192 threads (1 per
online CPU). But the source of the page faults doesn't really matter, I guess.
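
The pattern described above can be approximated in userspace with a small C sketch. This is only an illustration under stated assumptions, not the workload from the report: it uses anonymous memory instead of file-backed pages, and `REGION_PAGES`, `struct worker`, `fault_worker`, `map_worker` and `run_contention` are hypothetical names. Faulting threads touch pages of their own region (the fault path takes mmap_sem for read), while mapping threads map and unmap unrelated pages (mmap/munmap take it for write), so the two groups never touch the same addresses but still share the lock.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 64    /* pages touched per pass (illustrative) */

struct worker {
    pthread_t tid;
    int passes;            /* loop iterations to run */
    long done;             /* pages touched or mappings cycled */
};

/* Touch every page of a private region, then drop the pages so the
 * next pass faults again. Each fault takes mmap_sem for read. */
static void *fault_worker(void *arg)
{
    struct worker *w = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = REGION_PAGES * page;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(buf != MAP_FAILED);
    for (int p = 0; p < w->passes; p++) {
        for (size_t off = 0; off < len; off += page) {
            buf[off] = 1;                  /* first touch faults the page in */
            w->done++;
        }
        madvise(buf, len, MADV_DONTNEED);  /* discard the pages */
    }
    munmap(buf, len);
    return NULL;
}

/* Map and unmap an unrelated page; both calls take mmap_sem for write. */
static void *map_worker(void *arg)
{
    struct worker *w = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (int p = 0; p < w->passes; p++) {
        void *m = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(m != MAP_FAILED);
        munmap(m, page);
        w->done++;
    }
    return NULL;
}

/* Run nfault faulting threads against nmap mapping threads and return
 * the total number of pages touched by the faulting threads. */
static long run_contention(int nfault, int nmap, int passes)
{
    struct worker w[64];
    int n = nfault + nmap;
    long touched = 0;
    assert(n > 0 && n <= 64);
    for (int i = 0; i < n; i++) {
        w[i].passes = passes;
        w[i].done = 0;
        int rc = pthread_create(&w[i].tid, NULL,
                                i < nfault ? fault_worker : map_worker, &w[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(w[i].tid, NULL);
        if (i < nfault)
            touched += w[i].done;
    }
    return touched;
}
```

Timing `run_contention()` with and without mapping threads, at increasing thread counts, would be one way to observe the serialization; no measurements are claimed here.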

I took time trying to figure out how to get rid of this bottleneck, but
this is definitely too complex for me.
I read this mailing list's history and some LWN articles about it, and my
feeling is that there is no clear way to limit the impact of this
semaphore. The last discussion on this topic seems to have happened last
March during the LSFMM summit (https://lwn.net/Articles/636334/). But this
doesn't seem to have led to major changes, or maybe I missed them.

I'm now seeing that this is a big thing and that it would be hard and
potentially massively intrusive to get rid of this bottleneck, and I'm
wondering what could be the best approach here: RCU, range locks, etc.

Does anyone have an idea?

If it's really just the pagefaults you care about you can have a look at
my speculative page fault stuff that I don't ever seem to get around to
updating :/

Latest version is here:

  https://lkml.kernel.org/r/20141020215633.717315139@infradead.org

Plenty of bits left to sort with that, but the general idea is to use
the split page-table locks (PTLs) as a range lock for the mmap_sem.

Thanks Peter for the pointer,

It sounds like some parts of this series are already upstream, like the
use of the fault_env structure, but the rest of the code needs some
refresh to apply to the latest kernel. I'll try to update your series
and will give it a try asap.

This being said, I'm wondering if the concerns Kirill raised about the
VMA sequence count handling are still valid...
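
For readers unfamiliar with sequence counts, the general validate-and-retry pattern behind a speculative lookup can be sketched in userspace C11. This is not code from Peter's series and says nothing about the specific concerns raised; `struct range`, `range_update` and `range_contains_speculative` are hypothetical names, and the sketch assumes a single writer at a time (in the kernel that role would be played by whoever holds mmap_sem for write).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA: an address range guarded by a
 * sequence count that is odd while an update is in progress. */
struct range {
    atomic_uint seq;
    atomic_ulong start;
    atomic_ulong end;
};

/* Writer side: assumes writers are already serialized externally. */
static void range_update(struct range *r, unsigned long s, unsigned long e)
{
    atomic_fetch_add(&r->seq, 1);   /* odd: update in progress */
    atomic_store(&r->start, s);
    atomic_store(&r->end, e);
    atomic_fetch_add(&r->seq, 1);   /* even: update complete */
}

/* Reader side: take a snapshot without any lock and validate it.
 * Returns true and sets *inside if the snapshot was consistent;
 * returns false if the caller must retry or fall back to the lock. */
static bool range_contains_speculative(struct range *r, unsigned long addr,
                                       bool *inside)
{
    unsigned begin = atomic_load(&r->seq);
    if (begin & 1)
        return false;               /* writer active */
    unsigned long s = atomic_load(&r->start);
    unsigned long e = atomic_load(&r->end);
    if (atomic_load(&r->seq) != begin)
        return false;               /* range changed under us */
    *inside = (addr >= s && addr < e);
    return true;
}
```

A fault path built on this idea would try the speculative check first and only take the lock when it returns false.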

By the way, I'm adding Kirill to the loop since I miserably forgot to
include him when sending my initial request. My apologies, Kirill.

Cheers,
Laurent.


