Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16

[PATCH V2] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-19
Re: [PATCH V2] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-19
Re: [PATCH V2] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-20
[PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-20
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Peter Zijlstra <peterz@infradead.org> · 2016-04-20
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-21
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-21
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-22
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-22
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Peter Zijlstra <peterz@infradead.org> · 2016-04-21
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-25
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Peter Zijlstra <peterz@infradead.org> · 2016-04-25
Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-26
[PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-27
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-27
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-27
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-27
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Boqun Feng <hidden> · 2016-04-27
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-28
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Peter Zijlstra <peterz@infradead.org> · 2016-04-28
Re: [PATCH V4] powerpc: Implement {cmp}xchg for u8 and u16 · Pan Xinhui <hidden> · 2016-04-28
Re: [V4] powerpc: Implement {cmp}xchg for u8 and u16 · Michael Ellerman <hidden> · 2016-11-25

From: Boqun Feng <hidden>
Date: 2016-04-21 15:49:48
Also in: lkml

On Thu, Apr 21, 2016 at 11:35:07PM +0800, Pan Xinhui wrote:

On 2016-04-20 22:24, Peter Zijlstra wrote:

On Wed, Apr 20, 2016 at 09:24:00PM +0800, Pan Xinhui wrote:

+#define __XCHG_GEN(cmp, type, sfx, skip, v)				\
+static __always_inline unsigned long					\
+__cmpxchg_u32##sfx(v unsigned int *p, unsigned long old,		\
+			 unsigned long new);				\
+static __always_inline u32						\
+__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)		\
+{									\
+	int size = sizeof (type);					\
+	int off = (unsigned long)ptr % sizeof(u32);			\
+	volatile u32 *p = ptr - off;					\
+	int bitoff = BITOFF_CAL(size, off);				\
+	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;	\
+	u32 oldv, newv, tmp;						\
+	u32 ret;							\
+	oldv = READ_ONCE(*p);						\
+	do {								\
+		ret = (oldv & bitmask) >> bitoff;			\
+		if (skip && ret != old)					\
+			break;						\
+		newv = (oldv & ~bitmask) | (new << bitoff);		\
+		tmp = oldv;						\
+		oldv = __cmpxchg_u32##sfx((v u32*)p, oldv, newv);	\
+	} while (tmp != oldv);						\
+	return ret;							\
+}
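For anyone who wants to try the mask-and-shift technique of the quoted __XCHG_GEN() macro outside the kernel, here is a minimal userspace sketch. It assumes GCC/Clang `__atomic` builtins in place of the kernel's __cmpxchg_u32(). The names `subword_xchg()` and `bitoff_cal()` are hypothetical; `bitoff_cal()` is only a guess at what BITOFF_CAL() computes, because its definition is not quoted in this message. One deliberate difference from the patch: `new` is masked before it is merged into the word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's BITOFF_CAL() (not quoted here):
 * bit offset of a 'size'-byte value starting 'off' bytes into its aligned
 * 32-bit word, for either byte order.
 */
static unsigned int bitoff_cal(unsigned int size, unsigned int off)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (unsigned int)(sizeof(uint32_t) - size - off) * 8;
#else
	(void)size;	/* little-endian: only the byte offset matters */
	return off * 8;
#endif
}

/*
 * Emulate a 1- or 2-byte xchg (skip == false) or cmpxchg (skip == true)
 * with a 32-bit CAS on the containing word, mirroring the loop in
 * __XCHG_GEN(). Returns the previous sub-word value.
 */
static uint32_t subword_xchg(void *ptr, unsigned int size, uint32_t old,
			     uint32_t new, bool skip)
{
	uintptr_t off = (uintptr_t)ptr % sizeof(uint32_t);

	assert((size == 1 || size == 2) && off + size <= sizeof(uint32_t));

	uint32_t *p = (uint32_t *)((uintptr_t)ptr - off);	/* containing word */
	unsigned int bitoff = bitoff_cal(size, (unsigned int)off);
	uint32_t bitmask = ((UINT32_C(1) << (size * 8)) - 1) << bitoff;
	uint32_t oldv = __atomic_load_n(p, __ATOMIC_RELAXED);	/* like READ_ONCE() */
	uint32_t newv, ret;

	do {
		ret = (oldv & bitmask) >> bitoff;
		if (skip && ret != old)
			break;		/* cmpxchg mismatch: return without storing */
		/* Masking 'new' keeps stray high bits out of the neighbours. */
		newv = (oldv & ~bitmask) | ((new << bitoff) & bitmask);
		/* On failure the builtin reloads the current word into oldv. */
	} while (!__atomic_compare_exchange_n(p, &oldv, newv, false,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED));
	return ret;
}
```

With `uint32_t word` and `uint8_t *b = (uint8_t *)&word`, a call such as `subword_xchg(&b[1], 1, 0, 0xAA, false)` returns the old `b[1]` and leaves the other three bytes untouched. On an LL/SC machine each CAS attempt is itself a lwarx/cmpw/stwcx. sequence, so this layering adds a separate initial load and a compare that a hand-written lwarx/stwcx. exchange loop would not need; that is plausibly the overhead being discussed here.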

So for an LL/SC-based arch, using cmpxchg() like that is sub-optimal.

Why did you choose to write it entirely in C?

Yes, you are right: the C code does more loads and stores.
However, xchg_u8/u16 is only used by qspinlock for now, and I did not see any performance regression.
So I just wrote it in C, for simplicity. :)

Of course I have done xchg tests.
We ran code like xchg((u8*)&v, j++); in several threads,
and the results are:
[  768.374264] use time[1550072]ns in xchg_u8_asm

How was xchg_u8_asm() implemented: using lbarx, or using a 32-bit LL/SC
loop with shifting and masking in it?

Regards,
Boqun

[  768.377102] use time[2826802]ns in xchg_u8_c

I think this is because the C version does one more load.
If possible, we could move such code into asm-generic/.

thanks
xinhui
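Pan's test above runs xchg((u8*)&v, j++) in several threads and reports only timings. A complementary correctness check is sketched below, assuming POSIX threads and GCC/Clang `__atomic` builtins. Each thread owns its own byte lane of one shared 32-bit word, so a byte exchange emulated with a 32-bit CAS must never disturb the neighbouring lanes. `xchg_u8_emul()`, `hammer()` and `run_lane_test()` are hypothetical names. The helper is a userspace stand-in for xchg() on a u8, not the powerpc implementation, and the harness checks results rather than measuring time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NTHREADS 4		/* one thread per byte lane of the word */
#define ITERS    100000u	/* exchanges per thread */

static uint32_t shared_word;	/* four u8 lanes share this 32-bit word */

/* Hypothetical helper: byte xchg emulated with a 32-bit CAS. */
static uint8_t xchg_u8_emul(uint8_t *ptr, uint8_t new)
{
	uintptr_t off = (uintptr_t)ptr % sizeof(uint32_t);
	uint32_t *p = (uint32_t *)((uintptr_t)ptr - off);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	unsigned int bitoff = (unsigned int)(3 - off) * 8;
#else
	unsigned int bitoff = (unsigned int)off * 8;
#endif
	uint32_t mask = UINT32_C(0xff) << bitoff;
	uint32_t oldv = __atomic_load_n(p, __ATOMIC_RELAXED);
	uint32_t newv;

	do {
		newv = (oldv & ~mask) | ((uint32_t)new << bitoff);
	} while (!__atomic_compare_exchange_n(p, &oldv, newv, false,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED));
	/* On success oldv still holds the word as it was before the swap. */
	return (uint8_t)((oldv & mask) >> bitoff);
}

/* Each thread owns one lane and keeps exchanging j into it. */
static void *hammer(void *arg)
{
	uint8_t *lane = (uint8_t *)&shared_word + (uintptr_t)arg;

	for (unsigned int j = 0; j < ITERS; j++)
		xchg_u8_emul(lane, (uint8_t)j);
	return NULL;
}

/*
 * True if every lane ends up holding its own thread's last value; a
 * read-modify-write that wrote back a stale neighbour byte could break this.
 */
static bool run_lane_test(void)
{
	pthread_t tid[NTHREADS];
	const uint8_t *b = (const uint8_t *)&shared_word;

	shared_word = 0;
	for (uintptr_t i = 0; i < NTHREADS; i++) {
		int rc = pthread_create(&tid[i], NULL, hammer, (void *)i);
		assert(rc == 0);	/* sketch: no recovery on failure */
		(void)rc;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	for (int i = 0; i < NTHREADS; i++)
		if (b[i] != (uint8_t)(ITERS - 1))
			return false;
	return true;
}
```

Replacing the CAS loop with a plain load/modify/store of the whole word would make lost neighbour updates possible, although a single run is not guaranteed to expose them.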
