Re: per-cpu thoughts
From: Christopher Lameter <hidden>
Date: 2019-03-12 17:34:22
Also in:
linux-riscv
On Tue, 12 Mar 2019, Paul Walmsley wrote:
> Similar cases apply within SLUB, and I'd hoped to improve that with my this-cpu-reg branch, but I didn't see a measurable improvement on workloads I tried.
>
> That certainly suggests that all of this could be much ado about nothing, or at least very little. One observation is that some of the performance concerns that Christoph is expressing here may be about ensuring predictable and minimal latency bounds, rather than raw throughput.
The performance concerns arise mainly when scaling RISC-V to many cores, which creates contention in counter handling. The scalable counter system (ZVCs) was developed to address these issues, and the this_cpu operations were added later to optimize its performance. At this point, on RISC-V with just a couple of cores, you may not see much of an effect. In fact, a UP system would run faster without these schemes, since there is no contention. Scalability of counter operations becomes a challenge as core counts increase.
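
For readers unfamiliar with the idea, here is a minimal userspace sketch of a ZVC-style counter. It is not the kernel's implementation: NR_CPUS, THRESHOLD, struct zvc and the zvc_* helpers are hypothetical names chosen for illustration, and the "CPU" is just an array index that the caller is assumed to own exclusively (the kernel gets that guarantee from this_cpu operations or from disabling preemption). Each CPU accumulates a small private delta and folds it into the shared total only when the delta crosses a threshold, so the shared cache line is touched rarely.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS   4    /* hypothetical CPU count for this sketch */
#define THRESHOLD 32   /* hypothetical fold threshold */

struct zvc {
    atomic_long global;         /* shared total, updated rarely */
    signed char diff[NR_CPUS];  /* per-CPU pending delta, updated often */
};

static void zvc_init(struct zvc *c)
{
    atomic_init(&c->global, 0);
    for (int i = 0; i < NR_CPUS; i++)
        c->diff[i] = 0;
}

/*
 * Add a small delta on behalf of `cpu`. Assumes only the owning CPU
 * touches diff[cpu], so the common path needs no atomic operation.
 */
static void zvc_add(struct zvc *c, int cpu, int delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    int x = c->diff[cpu] + delta;

    if (x > THRESHOLD || x < -THRESHOLD) {
        /* Rare path: fold the accumulated delta into the shared total. */
        atomic_fetch_add_explicit(&c->global, x, memory_order_relaxed);
        x = 0;
    }
    c->diff[cpu] = (signed char)x;
}

/* Cheap read: may lag the true value by up to NR_CPUS * THRESHOLD. */
static long zvc_read_fast(struct zvc *c)
{
    return atomic_load_explicit(&c->global, memory_order_relaxed);
}

/* Exact read: also sums the per-CPU deltas (only exact when quiescent). */
static long zvc_read_exact(struct zvc *c)
{
    long sum = atomic_load_explicit(&c->global, memory_order_relaxed);

    for (int i = 0; i < NR_CPUS; i++)
        sum += c->diff[i];
    return sum;
}
```

With a single core the fold step is pure overhead, which matches the remark above that a UP system runs faster without such a scheme; the benefit only appears once many cores would otherwise contend for the same counter.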
> OK. I have been assuming that the risk of a scheduler call in preempt_enable() is what Christoph is concerned about here:
>
> https://lore.kernel.org/linux-riscv/b0653f7a6f1bc0c9329d37de690d3bed@mailhost.ics.forth.gr/T/#m6e609e26a9e5405c4a7e2dbd5ca8c969cada5c36
>
> If it is possible to eliminate the latency risk from a 'simple' counter increment/decrement by creating a restricted API, that may be worthwhile.
>
> Christoph has also been concerned that the AMO operations will carry an unacceptable performance overhead. But the RISC-V AMO operations can be written such that they don't have the ordering restrictions that the Intel LOCK-prefixed operations do, and thus those concerns may not apply -- at least not to the same extent. Perhaps this is also true for the ARM LSE atomics.
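
The ordering point in the quoted text can be illustrated with C11 atomics in userspace. This is only a sketch and not kernel code: the counter `events` and the count_* helpers are hypothetical names. With memory_order_relaxed the increment is still atomic but imposes no ordering on surrounding accesses; compilers targeting RISC-V typically lower it to an AMO without the .aq/.rl bits, while on x86 both variants below end up as LOCK-prefixed instructions that act as full barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared event counter for this sketch. */
static atomic_long events;

/* Atomic increment with no ordering against neighbouring accesses. */
static void count_event_relaxed(void)
{
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

/* Same increment, but sequentially consistent, for comparison. */
static void count_event_ordered(void)
{
    atomic_fetch_add_explicit(&events, 1, memory_order_seq_cst);
}

/* Record n events at once using the relaxed form. */
static void count_events_relaxed(long n)
{
    assert(n >= 0);
    atomic_fetch_add_explicit(&events, n, memory_order_relaxed);
}

static long events_read(void)
{
    return atomic_load_explicit(&events, memory_order_relaxed);
}
```

Even a relaxed AMO still operates on a cache line shared by every core, so it addresses the ordering cost but not the contention that per-CPU counters are meant to avoid.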
My main concern at this point is to ensure that RISC-V has the proper setup for the future, and that decisions are made such that scaling RISC-V up to hundreds of cores (if not more) does not run into a bottleneck. One of the use cases here for us is likely to be extremely parallel operation for HPC-style compute. The main issue for the core VM is to limit the overhead of the statistics and counter operations. Introducing atomic operations in key fault paths has caused performance regressions in the past.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel