Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL
From: Boqun Feng <hidden>
Date: 2021-07-26 17:25:08
On Tue, Jul 27, 2021 at 12:41:34AM +0800, Guo Ren wrote:
On Mon, Jul 26, 2021 at 6:39 PM Boqun Feng [off-list ref] wrote:quoted
On Mon, Jul 26, 2021 at 04:56:49PM +0800, Huacai Chen wrote:quoted
Hi, Geert, On Mon, Jul 26, 2021 at 4:36 PM Geert Uytterhoeven [off-list ref] wrote:quoted
Hi Huacai, On Sat, Jul 24, 2021 at 2:36 PM Huacai Chen [off-list ref] wrote:quoted
Introduce a new Kconfig option ARCH_HAS_HW_XCHG_SMALL, which means arch has hardware sub-word xchg/cmpxchg support. This option will be used as an indicator to select the bit-field definition in the qspinlock data structure. Signed-off-by: Huacai Chen <redacted>Thanks for your patch!quoted
--- a/arch/Kconfig +++ b/arch/Kconfig@@ -228,6 +228,10 @@ config ARCH_HAS_FORTIFY_SOURCE An architecture should select this when it can successfully build and run with CONFIG_FORTIFY_SOURCE. +# Select if arch has hardware sub-word xchg/cmpxchg support +config ARCH_HAS_HW_XCHG_SMALLWhat do you mean by "hardware"? Does a software fallback count?This new option is supposed as an indicator to select bit-field definition of qspinlock, software fallback is not helpful in this case.I don't think this is true. IIUC, the rationale of the config is that for some architectures, since the architectural cmpxchg doesn't provide forward-progress guarantee then using cmpxchg of machine-word to implement xchg{8,16}() may cause livelock, therefore these architectures don't want to provide xchg{8,16}(), as a result they cannot work with qspinlock when _Q_PENDING_BITS is 8. So as long as an architecture can provide and has already provided an implementation of xchg{8,16}() which guarantee forward-progress (even though the implementation is using a machine-word size cmpxchg), the architecture doesn't need to select ARCH_HAS_HW_XCHG_SMALL.Seems only atomic could provide forward progress, isn't it? And lr/sc couldn't implement xchg/cmpxchg primitive properly.
I'm missing you point here, a) ll/sc can provide forward progress and b) ll/sc instructions are used to implement xchg/cmpxchg (see ARM64 and PPC).
How to make CPU guarantee "load + cmpxchg" forward-progress? Fusion these instructions and lock the snoop channel? Maybe hardware guys would think that it's easier to implement cas + dcas + amo(short & byte).
Please note that if _Q_PENDING_BITS == 1, then the xchg_tail() is implemented as a "load + cmpxchg", so if "load + cmpxchg" implementation of xchg16() doesn't provide forward-progress in an architecture, neither does xchg_tail(). Regards, Boqun
quoted
Regards, Boqunquoted
quoted
quoted
--- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig@@ -5,6 +5,7 @@ config M68K select ARCH_32BIT_OFF_T select ARCH_HAS_BINFMT_FLAT select ARCH_HAS_DMA_PREP_COHERENT if HAS_DMA && MMU && !COLDFIRE + select ARCH_HAS_HW_XCHG_SMALLM68k CPUs which support the CAS (Compare And Set) instruction do support this on 8-bit, 16-bit, and 32-bit quantities. M68k CPUs which lack CAS use a software implementation, which supports the same quantities. As CAS is used only if CONFIG_RMW_INSNS=y, perhaps this needs a dependency?OK, I think this dependency is needed. Huacaiquoted
select ARCH_HAS_HW_XCHG_SMALL if RMW_INSNS Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds-- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/