Re: [PATCH RFC 1/2] arch: Introduce ARCH_HAS_HW_XCHG_SMALL
From: Waiman Long <hidden>
Date: 2021-07-26 21:21:01
On 7/26/21 1:03 PM, Boqun Feng wrote:
On Tue, Jul 27, 2021 at 12:41:34AM +0800, Guo Ren wrote:quoted
On Mon, Jul 26, 2021 at 6:39 PM Boqun Feng [off-list ref] wrote:quoted
On Mon, Jul 26, 2021 at 04:56:49PM +0800, Huacai Chen wrote:quoted
Hi, Geert, On Mon, Jul 26, 2021 at 4:36 PM Geert Uytterhoeven [off-list ref] wrote:quoted
Hi Huacai, On Sat, Jul 24, 2021 at 2:36 PM Huacai Chen [off-list ref] wrote:quoted
Introduce a new Kconfig option ARCH_HAS_HW_XCHG_SMALL, which means arch has hardware sub-word xchg/cmpxchg support. This option will be used as an indicator to select the bit-field definition in the qspinlock data structure. Signed-off-by: Huacai Chen <redacted>Thanks for your patch!quoted
--- a/arch/Kconfig +++ b/arch/Kconfig@@ -228,6 +228,10 @@ config ARCH_HAS_FORTIFY_SOURCE An architecture should select this when it can successfully build and run with CONFIG_FORTIFY_SOURCE. +# Select if arch has hardware sub-word xchg/cmpxchg support +config ARCH_HAS_HW_XCHG_SMALLWhat do you mean by "hardware"? Does a software fallback count?This new option is supposed as an indicator to select bit-field definition of qspinlock, software fallback is not helpful in this case.I don't think this is true. IIUC, the rationale of the config is that for some architectures, since the architectural cmpxchg doesn't provide forward-progress guarantee then using cmpxchg of machine-word to implement xchg{8,16}() may cause livelock, therefore these architectures don't want to provide xchg{8,16}(), as a result they cannot work with qspinlock when _Q_PENDING_BITS is 8. So as long as an architecture can provide and has already provided an implementation of xchg{8,16}() which guarantee forward-progress (even though the implementation is using a machine-word size cmpxchg), the architecture doesn't need to select ARCH_HAS_HW_XCHG_SMALL.Seems only atomic could provide forward progress, isn't it? And lr/sc couldn't implement xchg/cmpxchg primitive properly.I'm missing you point here, a) ll/sc can provide forward progress and b) ll/sc instructions are used to implement xchg/cmpxchg (see ARM64 and PPC).quoted
How to make CPU guarantee "load + cmpxchg" forward-progress? Fusion these instructions and lock the snoop channel? Maybe hardware guys would think that it's easier to implement cas + dcas + amo(short & byte).Please note that if _Q_PENDING_BITS == 1, then the xchg_tail() is implemented as a "load + cmpxchg", so if "load + cmpxchg" implementation of xchg16() doesn't provide forward-progress in an architecture, neither does xchg_tail().
Agreed. The xchg_tail() for the "_Q_PENDING_BITS == 1" case is a software emulation of xchg16(). Pure software emulation like that does not provide forward progress guarantee. This is usually not a big problem for non-RT kernel for which occasional long latency is acceptable, but it is not good for RT kernel. Cheers, Longman