Thread (55 messages) 55 messages, 6 authors, 2021-10-12

RE: [PATCH v6 03/24] rtw89: add core and trx files

From: Pkshih <pkshih@realtek.com>
Date: 2021-10-05 09:32:59

-----Original Message-----
From: Arnd Bergmann <arnd@arndb.de>
Sent: Tuesday, October 5, 2021 4:42 PM
To: Kalle Valo <redacted>
Cc: Pkshih <pkshih@realtek.com>; Arnd Bergmann <arnd@arndb.de>; linux-wireless@vger.kernel.org
Subject: Re: [PATCH v6 03/24] rtw89: add core and trx files

On Tue, Oct 5, 2021 at 9:46 AM Kalle Valo [off-list ref] wrote:
quoted
Pkshih [off-list ref] writes:
quoted
quoted
From: kvalo=codeaurora.org@mg.codeaurora.org
quoted
+static __always_inline void RTW89_SET_TXWD(u8 *txdesc, u32 val,
u8 offset, u32 mask)
+{
+  u32 *txd32 = (u32 *)txdesc;
+
+  le32p_replace_bits((__le32 *)(txd32 + offset), val, mask);
+}
I'm not convinced about this either, please just use inline.
This is because 'mask' argument of le32p_replace_bits() must be constant
only. If I use inline and build this driver with ccflags-y += -Os,
compiler reports errors:

In function 'field_multiplier',
    inlined from 'le32_encode_bits' at ./include/linux/bitfield.h:154:1,
    inlined from 'le32p_replace_bits' at ./include/linux/bitfield.h:154:1,
    inlined from 'RTW89_SET_FWCMD_UA32.constprop' at
/work/git-root/rtwlan/rtw89/fw.h:1397:2:
quoted
quoted
./include/linux/bitfield.h:119:3: error: call to '__bad_mask' declared with attribute error:
bad bitfield mask
quoted
quoted
  119 |   __bad_mask();
      |   ^~~~~~~~~~~~

I check the implement of le32p_replace_bits(), it looks like

static __always_inline void type##p_replace_bits(__##type *p,           \
                                        base val, base field)           \
{                                                                       \
        *p = (*p & ~to(field)) | type##_encode_bits(val, field);        \
}

So, I imitate the function to use __always_inline, and then it works.

Do you think I don't need to consider the case of Os?
But, -Os seems a standard option of Linux kernel.

ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
KBUILD_CFLAGS += -O2
else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
KBUILD_CFLAGS += -O3
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
endif
Yeah, we need to support -Os.

Arnd, what do you suggest? Is __always_inline good solution for this? I
think we should at least add a comment explaining why it's needed.
__always_inline can make sense to force the compiler to behave
sanely if it doesn't work it out by itself, and I think that is how this
function was meant to be used: the __compiletime_error in bitfield.h
is intended to find any callers that have a non-constant argument,
because that would result in horrible code.

I would suggest looking at the object code that you get with -Os after
the added __always_inline, just to make sure that this isn't also
horrible.
I check the function rtw89_core_fill_txdesc() which uses these macros.
With inline, the object code size is 0x1AF. With __always_inline and
-Os, the size is 0x1A4. (x86-64 platform)

Compare the object codes side-by-side, they are almost the same except
to some instructions. I think this is because the inline function
I apply __always_inline contains only a simple statement.
Looking at the driver code, as in

+#define RTW89_SET_TXWD_BODY_WP_OFFSET(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, GENMASK(31, 24))
+#define RTW89_SET_TXWD_BODY_MORE_DATA(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(23))
+#define RTW89_SET_TXWD_BODY_WD_INFO_EN(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(22))
+#define RTW89_SET_TXWD_BODY_FW_DL(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(20))

I would personally write this without the wrappers, instead defining the
bitmask macros as the masks and then open-coding the
le32p_replace_bits() calls instead, which I would find more
intuitive while it avoids the problem with the bitmasks.
Use these macros can address offset and bit fields quickly.
How about I use macro instead of inline function? Like,

#define RTW89_SET_TXWD (txdesc, val, offset, mask) \
do { \
	u32 *txd32 = (u32 *)txdesc; \
	le32p_replace_bits((__le32 *)(txd32 + offset), val, mask); \
} while (0)

Going back one more step, I see that that rtw89_core_fill_txdesc()
manipulates the descriptor fields in-memory, which also seems
like a bad idea: The descriptor is mapped as cache-coherent,
so on machines with no coherent DMA (i.e. most ARM or MIPS
machines), that is uncached memory, and writing the descriptor
using a series of read-modify-write cycles on uncached memory
will be awfully slow. Maybe the answer is to just completely
replace the descriptor access.
I'll think if we can use chached memory with single_map/unmap for
descriptor. That would improve the performance.

--
Ping-Ke
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help