Thread (26 messages) 26 messages, 10 authors, 2021-07-29

Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL

From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-07-28 11:03:22

On Wed, Jul 28, 2021 at 06:40:54PM +0800, Huacai Chen wrote:
Hi, Peter,

On Tue, Jul 27, 2021 at 7:06 PM Peter Zijlstra [off-list ref] wrote:
quoted
On Tue, Jul 27, 2021 at 10:29:59AM +0800, Boqun Feng wrote:
quoted
quoted
"How to implement xchg_tail" shouldn't force with _Q_PENDING_BITS, but
the arch could choose.
I actually agree with this part, but this patchset failed to provide
enough evidences on why we should choose xchg_tail() implementation
based on whether hardware has xchg16, more precisely, for an archtecture
which doesn't have a hardware xchg16, why cmpxchg emulated xchg16() is
worse than a "load+cmpxchg) implemeneted xchg_tail()? If it's a
performance reason, please show some numbers.
Right. Their problem is their broken xchg16() implementation.
Please correct me if I'm wrong. Now my understanding is like this:
1, _Q_PENDING_BITS=1 qspinlock can be used by all archs, though it may
be not optimized.
Only if your arch has fwd progress guarantees for cmpxchg(). LL/SC based
cmpxchg() is tricky, and typically needs software based backoff on
failure.

The qspinlock code is written in generic code, but it very much relies
on an architecture to audit and vet the resulting code is sane for them.
Clearly MIPS didn't do a good job of that.
2, _Q_PENDING_BITS=8 qspinlock can be used if hardware supports
sub-word xchg/cmpxchg, or the software emulation is correctly
implemented. But the current MIPS emulation is not correct.
Everything always relies on things being correctly implemented. 1)
relies on cmpxchg() being correctly implemented. 2) relies on xchg16()
being correctly implemented.

Of these 2 is actually easier to implement correctly on LL/SC.
If so, I want to rename ARCH_HAS_HW_XCHG_SMALL to
ARCH_HAS_FAST_XCHG_SMALL, and let these archs select it:
1, X86, ARM, ARM64, IA64, M68K, because they have hardware support.
2, Other archs who select qspinlock currently (including MIPS),
because their current behavior is use _Q_PENDING_BITS=8 qspinlock and
we don't want to change anything in this patch. If their emulation is
broken or not as "fast" as expected, we can make new patches to
unselect the ARCH_HAS_FAST_XCHG_SMALL option.
I utterly fail to see the point of any of this. If you use qspinlock,
you had better have audited the whole thing for your architecture. And
FAST_XCHG_SMALL is completely irrelevant for anything here.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help