Re: per-cpu thoughts | linux-arm-kernel


On Tue, 12 Mar 2019, Paul Walmsley wrote:

Similar cases apply within SLUB, and I'd hoped to improve that with my
this-cpu-reg branch, but I didn't see a measurable improvement on
workloads I tried.
That certainly suggests that all of this could be much ado about
nothing, or at least very little.  One observation is that some of the
performance concerns that Christoph is expressing here may be about
ensuring predictable and minimal latency bounds, rather than raw
throughput.
The performance concerns arise mainly when scaling RISC-V to many cores,
which will create contention in counter handling. The scalable counter
system (ZVCs) was developed to address these issues, and the this_cpu
operations were added later to optimize that performance further.

At this point on RISC-V with just a couple of cores you may not see too
much of an effect. In fact a UP system would run faster if it did
not use these schemes, since there is no contention. Scalability of counter
operations becomes a challenge as core counts increase.
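
To make the contention argument concrete, here is a minimal userspace sketch of the idea behind such counters: each core accumulates a small private delta and only touches the shared total when the delta crosses a threshold. It is loosely modeled on the ZVC scheme and is not the kernel's implementation. NR_CPUS, THRESHOLD, struct cpu_delta and the counter_* functions are hypothetical names chosen for the sketch, the cpu index is passed in explicitly instead of coming from the scheduler, and each call is assumed to add a small value (well under THRESHOLD).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS   4   /* hypothetical core count for this sketch */
#define THRESHOLD 32  /* hypothetical fold threshold */

/* Shared total: the only cache line that cores contend on. */
static atomic_long global_count;

/* Per-core delta, padded so neighbouring cores do not share a cache line. */
struct cpu_delta {
    int8_t delta;
    char pad[63];
};
static struct cpu_delta cpu_deltas[NR_CPUS];

/*
 * Add n to the counter on behalf of core `cpu`.
 * The common case is a plain load/store on core-private memory; the
 * shared atomic is touched only when the local delta leaves
 * [-THRESHOLD, THRESHOLD].
 */
static void counter_add(int cpu, int n)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    int v = cpu_deltas[cpu].delta + n;
    if (v > THRESHOLD || v < -THRESHOLD) {
        /* Fold the whole delta into the shared total and reset it. */
        atomic_fetch_add_explicit(&global_count, v, memory_order_relaxed);
        v = 0;
    }
    cpu_deltas[cpu].delta = (int8_t)v;
}

/* Cheap read: may lag the true value by up to NR_CPUS * THRESHOLD. */
static long counter_read_approx(void)
{
    return atomic_load_explicit(&global_count, memory_order_relaxed);
}

/* Exact read: also sums the outstanding deltas (valid only when quiescent). */
static long counter_read_exact(void)
{
    long sum = atomic_load_explicit(&global_count, memory_order_relaxed);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += cpu_deltas[cpu].delta;
    return sum;
}
```

With these assumed values, 100 single increments on one core touch the shared total only three times. On a UP system the local delta buys nothing, which matches the point above that such schemes only pay off once there is contention to avoid.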

OK.  I have been assuming that the risk of a scheduler call in
preempt_enable() is what Christoph is concerned about here:

https://lore.kernel.org/linux-riscv/b0653f7a6f1bc0c9329d37de690d3bed@mailhost.ics.forth.gr/T/#m6e609e26a9e5405c4a7e2dbd5ca8c969cada5c36

If it is possible to eliminate the latency risk from a 'simple' counter
increment/decrement by creating a restricted API, that may be worthwhile.

Christoph has also been concerned that the AMO operations will carry an
unacceptable performance overhead.  But the RISC-V AMO operations can be
written such that they don't have the ordering restrictions that the Intel
LOCK-prefixed operations do, and thus those concerns may not apply -- at
least not to the same extent.  Perhaps this is also true for the ARM LSE
atomics.
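
The ordering point can be illustrated in portable C11, as a sketch and not as kernel code. The counter nr_events and the helper names are hypothetical. On RISC-V with the A extension, compilers typically lower the relaxed variant to a plain amoadd with no .aq/.rl bits and the sequentially consistent variant to amoadd with .aqrl, whereas on x86 both typically become the same LOCK-prefixed instruction. The exact code generated depends on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_long nr_events;  /* hypothetical shared counter */

/* Atomic increment that imposes no ordering on surrounding accesses. */
static inline void count_event_relaxed(void)
{
    atomic_fetch_add_explicit(&nr_events, 1, memory_order_relaxed);
}

/* Atomic increment that is also sequentially consistent. */
static inline void count_event_ordered(void)
{
    atomic_fetch_add_explicit(&nr_events, 1, memory_order_seq_cst);
}

/* Record n events using the relaxed increment. */
static void count_events(int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        count_event_relaxed();
}

/* Read the counter; relaxed is enough for a statistics value. */
static long events_seen(void)
{
    return atomic_load_explicit(&nr_events, memory_order_relaxed);
}
```

Both variants are atomic with respect to the counter itself. They differ only in how much reordering of neighbouring loads and stores they forbid, which is the cost being discussed here.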
My main concern at this point is to ensure that RISC-V has the proper
setup for the future and that decisions are made so that scaling up RISC-V
to hundreds of cores (if not more) does not become a bottleneck. One of
the use cases here for us is likely to involve extremely parallel
operations for HPC-style compute.

The main issue for the core VM is to limit the overhead of the statistics and
counter operations. Introducing atomic operations in key fault paths has
caused performance regressions in the past.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
