Re: [PATCH v8 4/5] locking/qspinlock: Introduce starvation avoidance into CNA
From: Alex Kogan <hidden>
Date: 2020-02-04 17:54:48
Also in:
linux-arch, lkml
On Feb 4, 2020, at 12:39 PM, Waiman Long [off-list ref] wrote:
On 2/4/20 12:27 PM, Peter Zijlstra wrote:
On Tue, Feb 04, 2020 at 11:54:02AM -0500, Alex Kogan wrote:
On Feb 3, 2020, at 10:47 AM, Waiman Long [off-list ref] wrote:
On 2/3/20 10:28 AM, Peter Zijlstra wrote:
On Mon, Feb 03, 2020 at 09:59:12AM -0500, Waiman Long wrote:
On 2/3/20 8:45 AM, Peter Zijlstra wrote:
Presumably you have a workload where CNA is actually a win? That is, what inspired you to go down this road? Which actual kernel lock is so contended on NUMA machines that we need to do this?
There are quite a few actually. files_struct.file_lock, file_lock_context.flc_lock and lockref.lock are some concrete examples that get very hot in will-it-scale benchmarks.
Right, that's all a variant of banging on the same resources across nodes. I'm not sure there's anything fundamental we can fix there.
Not much, except gain that 2x from a better lock.
And then there are spinlocks in __futex_data.queues, which get hot when applications have contended (pthread) locks; LevelDB is an example.
A numa aware rework of futexes has been on the todo list for years :/
Now, we are going to get that for free with this patchset :-)
Exactly!!
Our initial motivation was based on an observation that kernel qspinlock is not NUMA-aware. So what, you may ask. Much like people realized in the past that global spinning is bad for performance, and they switched from ticket lock to locks with local spinning (e.g., MCS), I think everyone would agree these days that bouncing a lock (and cache lines in general) across numa nodes is similarly bad. And as CNA demonstrates, we are easily leaving 2-3x speedups on the table by doing just that with the current qspinlock.
Actual benchmarks with performance numbers are required. It helps motivate the patches as well as gives reviewers clues on how to reproduce / inspect the claims made.
I think the cover-letter does have some benchmark results listed.
To clarify on that, I _used to_ include benchmark results in the cover letter for previous revisions. I stopped doing that as the changes between revisions were rather minor; maybe that is the missing part? If so, my apologies, I can certainly publish them again.
The point is that we have numbers for actual benchmarks, plus the kernel build robot has sent quite a few reports on positive improvements in the performance of AIM7 and other benchmarks due to CNA. ARM folks also noticed improvement in their benchmarks, although I think those were mostly microbenchmarks. Still, it is evident that the improvements are cross-platform.
Regards,
-- Alex
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel