Re: [PATCH v2 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
From: Waiman Long <longman@redhat.com>
Date: 2019-06-12 15:05:28
Also in:
linux-arch, lkml
On 6/12/19 12:38 AM, Alex Kogan wrote:
Hi, Wei.quoted
On Jun 11, 2019, at 12:22 AM, liwei (GF) [off-list ref] wrote: Hi Alex, On 2019/3/29 23:20, Alex Kogan wrote:quoted
In CNA, spinning threads are organized in two queues, a main queue for threads running on the same node as the current lock holder, and a secondary queue for threads running on other nodes. At the unlock time, the lock holder scans the main queue looking for a thread running on the same node. If found (call it thread T), all threads in the main queue between the current lock holder and T are moved to the end of the secondary queue, and the lock is passed to T. If such T is not found, the lock is passed to the first node in the secondary queue. Finally, if the secondary queue is empty, the lock is passed to the next thread in the main queue. For more details, see https://urldefense.proofpoint.com/v2/url?u=https-3A__arxiv.org_abs_1810.05600&d=DwICbg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Hvhk3F4omdCk-GE1PTOm3Kn0A7ApWOZ2aZLTuVxFK4k&m=U7mfTbYj1r2Te2BBUUNbVrRPuTa_ujlpR4GZfUsrGTM&s=Dw4O1EniF-nde4fp6RA9ISlSMOjWuqeR9OS1G0iauj0&e=. Note that this variant of CNA may introduce starvation by continuously passing the lock to threads running on the same node. This issue will be addressed later in the series. Enabling CNA is controlled via a new configuration option (NUMA_AWARE_SPINLOCKS), which is enabled by default if NUMA is enabled. Signed-off-by: Alex Kogan <redacted> Reviewed-by: Steve Sistare <redacted> --- arch/x86/Kconfig | 14 +++ include/asm-generic/qspinlock_types.h | 13 +++ kernel/locking/mcs_spinlock.h | 10 ++ kernel/locking/qspinlock.c | 29 +++++- kernel/locking/qspinlock_cna.h | 173 ++++++++++++++++++++++++++++++++++ 5 files changed, 236 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/qspinlock_cna.h(SNIP)quoted
+ +static __always_inline int get_node_index(struct mcs_spinlock *node) +{ + return decode_count(node->node_and_count++);When nesting level is > 4, it won't return a index >= 4 here and the numa node number is changed by mistake. It will go into a wrong way instead of the following branch. /* * 4 nodes are allocated based on the assumption that there will * not be nested NMIs taking spinlocks. That may not be true in * some architectures even though the chance of needing more than * 4 nodes will still be extremely unlikely. When that happens, * we fall back to spinning on the lock directly without using * any MCS node. This is not the most elegant solution, but is * simple enough. */ if (unlikely(idx >= MAX_NODES)) { while (!queued_spin_trylock(lock)) cpu_relax(); goto release; }Good point. This patch does not handle count overflows gracefully. It can be easily fixed by allocating more bits for the count — we don’t really need 30 bits for #NUMA nodes.
Actually, the default setting uses 2 bits for 4-level nesting and 14 bits for cpu numbers. That means it can support up to 16k-1 cpus. It is a limit that is likely to be exceeded in the foreseeable future. qspinlock also supports an additional mode with 21 bits used for cpu numbers. That can support up to 2M-1 cpus. However, this mode will be a little bit slower. That is why we don't want to use more than 2 bits for nesting as I have never see more than 2 level of nesting used in my testing. So it is highly unlikely we will ever hit more than 4 levels. I am not saying that it is impossible, though. Cheers, Longman _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel