Thread (26 messages) 26 messages, 10 authors, 2021-07-29

Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL

From: Waiman Long <hidden>
Date: 2021-07-26 21:21:01

On 7/26/21 1:03 PM, Boqun Feng wrote:
On Tue, Jul 27, 2021 at 12:41:34AM +0800, Guo Ren wrote:
quoted
On Mon, Jul 26, 2021 at 6:39 PM Boqun Feng [off-list ref] wrote:
quoted
On Mon, Jul 26, 2021 at 04:56:49PM +0800, Huacai Chen wrote:
quoted
Hi, Geert,

On Mon, Jul 26, 2021 at 4:36 PM Geert Uytterhoeven [off-list ref] wrote:
quoted
Hi Huacai,

On Sat, Jul 24, 2021 at 2:36 PM Huacai Chen [off-list ref] wrote:
quoted
Introduce a new Kconfig option ARCH_HAS_HW_XCHG_SMALL, which means arch
has hardware sub-word xchg/cmpxchg support. This option will be used as
an indicator to select the bit-field definition in the qspinlock data
structure.

Signed-off-by: Huacai Chen <redacted>
Thanks for your patch!
quoted
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -228,6 +228,10 @@ config ARCH_HAS_FORTIFY_SOURCE
           An architecture should select this when it can successfully
           build and run with CONFIG_FORTIFY_SOURCE.

+# Select if arch has hardware sub-word xchg/cmpxchg support
+config ARCH_HAS_HW_XCHG_SMALL
What do you mean by "hardware"?
Does a software fallback count?
This new option is supposed as an indicator to select bit-field
definition of qspinlock, software fallback is not helpful in this
case.
I don't think this is true. IIUC, the rationale of the config is that
for some architectures, since the architectural cmpxchg doesn't provide
forward-progress guarantee then using cmpxchg of machine-word to
implement xchg{8,16}() may cause livelock, therefore these architectures
don't want to provide xchg{8,16}(), as a result they cannot work with
qspinlock when _Q_PENDING_BITS is 8.

So as long as an architecture can provide and has already provided an
implementation of xchg{8,16}() which guarantee forward-progress (even
though the implementation is using a machine-word size cmpxchg), the
architecture doesn't need to select ARCH_HAS_HW_XCHG_SMALL.
Seems only atomic could provide forward progress, isn't it? And lr/sc
couldn't implement xchg/cmpxchg primitive properly.
I'm missing you point here, a) ll/sc can provide forward progress and b)
ll/sc instructions are used to implement xchg/cmpxchg (see ARM64 and
PPC).
quoted
How to make CPU guarantee  "load + cmpxchg" forward-progress? Fusion
these instructions and lock the snoop channel?
Maybe hardware guys would think that it's easier to implement cas +
dcas + amo(short & byte).
Please note that if _Q_PENDING_BITS == 1, then the xchg_tail() is
implemented as a "load + cmpxchg", so if "load + cmpxchg" implementation
of xchg16() doesn't provide forward-progress in an architecture, neither
does xchg_tail().
Agreed. The xchg_tail() for the "_Q_PENDING_BITS == 1" case is a 
software emulation of xchg16(). Pure software emulation like that does 
not provide forward progress guarantee. This is usually not a big 
problem for non-RT kernel for which occasional long latency is 
acceptable, but it is not good for RT kernel.

Cheers,
Longman
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help