Re: [RFC] NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

From: Nadav Amit <hidden>
Date: 2021-03-31 16:37:10
Also in: lkml

On Mar 31, 2021, at 6:16 AM, Mel Gorman [off-list ref] wrote:

On Wed, Mar 31, 2021 at 07:20:09PM +0800, Huang, Ying wrote:


Mel Gorman [off-list ref] writes:


On Mon, Mar 29, 2021 at 02:26:51PM +0800, Huang Ying wrote:


For NUMA balancing, in the hint page fault handler, the faulting page
will be migrated to the accessing node if necessary.  During the
migration, the TLB will be shot down on all CPUs that the process has
run on recently, because the hint page fault handler makes the PTE
accessible before the migration is tried.  The overhead of a TLB
shootdown is high, so it is better avoided if possible.  In fact, if
we delay mapping the page in the PTE until after migration, the
shootdown can be avoided.  This is what this patch does.
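
The ordering argument above can be illustrated with a toy model. This is
only a sketch, not kernel code: `struct toy_pte` and the `toy_*` functions
are hypothetical names invented for illustration, and the model assumes
that an inaccessible (hint) PTE is never cached in any TLB, so a shootdown
is needed only when migration has to unmap a PTE that is already
accessible.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types and functions, not kernel APIs. */
struct toy_pte {
    bool accessible;  /* false models an inaccessible NUMA hint PTE */
    int node;         /* node that currently holds the page */
};

/*
 * Migration has to unmap the page first. Under the stated assumption,
 * a shootdown is needed only if the PTE was accessible, because other
 * CPUs may have cached the translation. Returns the shootdown count.
 */
static int toy_migrate(struct toy_pte *pte, int target_node)
{
    int shootdowns = 0;

    if (pte->accessible) {
        pte->accessible = false;  /* unmap */
        shootdowns = 1;           /* flush possibly cached entries */
    }
    pte->node = target_node;      /* move the page */
    pte->accessible = true;       /* map it on the new node */
    return shootdowns;
}

/* Ordering described as the existing behaviour: map, then migrate. */
int toy_fault_map_then_migrate(struct toy_pte *pte, int target_node)
{
    pte->accessible = true;
    return toy_migrate(pte, target_node);
}

/* Ordering described as the proposal: migrate, then map. */
int toy_fault_migrate_then_map(struct toy_pte *pte, int target_node)
{
    return toy_migrate(pte, target_node);
}
```

In this model the first ordering always pays for one shootdown and the
second pays for none. It says nothing about how often other CPUs really
hold a cached entry in practice, which is the question raised in the
reply that follows.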

Why would the overhead be high? It was previously inaccessible so it's
only parallel accesses making forward progress that trigger the need
for a flush.

Sorry, I don't understand this.  Although the page is inaccessible, the
threads may access other pages, so TLB flushing is still necessary.

You assert the overhead of TLB shootdown is high and yes, it can be
very high but you also said "the benchmark score has no visible changes"
indicating the TLB shootdown cost is not a major problem for the workload.
It does not mean we should ignore it though.

If you are looking for a benchmark that is negatively affected by NUMA
balancing, then IIRC Parsec’s dedup is such a workload. [1]

[1] https://parsec.cs.princeton.edu/

