Thread (55 messages) 55 messages, 6 authors, 2021-10-12

Re: [PATCH v6 03/24] rtw89: add core and trx files

From: Arnd Bergmann <arnd@arndb.de>
Date: 2021-10-05 08:42:38

On Tue, Oct 5, 2021 at 9:46 AM Kalle Valo [off-list ref] wrote:
Pkshih [off-list ref] writes:
quoted
quoted
From: kvalo=codeaurora.org@mg.codeaurora.org
quoted
+static __always_inline void RTW89_SET_TXWD(u8 *txdesc, u32 val,
u8 offset, u32 mask)
+{
+  u32 *txd32 = (u32 *)txdesc;
+
+  le32p_replace_bits((__le32 *)(txd32 + offset), val, mask);
+}
I'm not convinced about this either, please just use inline.
This is because 'mask' argument of le32p_replace_bits() must be constant
only. If I use inline and build this driver with ccflags-y += -Os,
compiler reports errors:

In function 'field_multiplier',
    inlined from 'le32_encode_bits' at ./include/linux/bitfield.h:154:1,
    inlined from 'le32p_replace_bits' at ./include/linux/bitfield.h:154:1,
    inlined from 'RTW89_SET_FWCMD_UA32.constprop' at /work/git-root/rtwlan/rtw89/fw.h:1397:2:
./include/linux/bitfield.h:119:3: error: call to '__bad_mask' declared with attribute error: bad bitfield mask
  119 |   __bad_mask();
      |   ^~~~~~~~~~~~

I check the implement of le32p_replace_bits(), it looks like

static __always_inline void type##p_replace_bits(__##type *p,           \
                                        base val, base field)           \
{                                                                       \
        *p = (*p & ~to(field)) | type##_encode_bits(val, field);        \
}

So, I imitate the function to use __always_inline, and then it works.

Do you think I don't need to consider the case of Os?
But, -Os seems a standard option of Linux kernel.

ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
KBUILD_CFLAGS += -O2
else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
KBUILD_CFLAGS += -O3
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
endif
Yeah, we need to support -Os.

Arnd, what do you suggest? Is __always_inline good solution for this? I
think we should at least add a comment explaining why it's needed.
__always_inline can make sense to force the compiler to behave
sanely if it doesn't work it out by itself, and I think that is how this
function was meant to be used: the __compiletime_error in bitfield.h
is intended to find any callers that have a non-constant argument,
because that would result in horrible code.

I would suggest looking at the object code that you get with -Os after
the added __always_inline, just to make sure that this isn't also
horrible.

Looking at the driver code, as in

+#define RTW89_SET_TXWD_BODY_WP_OFFSET(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, GENMASK(31, 24))
+#define RTW89_SET_TXWD_BODY_MORE_DATA(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(23))
+#define RTW89_SET_TXWD_BODY_WD_INFO_EN(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(22))
+#define RTW89_SET_TXWD_BODY_FW_DL(txdesc, val) \
+ RTW89_SET_TXWD(txdesc, val, 0x00, BIT(20))

I would personally write this without the wrappers, instead defining the
bitmask macros as the masks and then open-coding the
le32p_replace_bits() calls instead, which I would find more
intuitive while it avoids the problem with the bitmasks.

Going back one more step, I see that that rtw89_core_fill_txdesc()
manipulates the descriptor fields in-memory, which also seems
like a bad idea: The descriptor is mapped as cache-coherent,
so on machines with no coherent DMA (i.e. most ARM or MIPS
machines), that is uncached memory, and writing the descriptor
using a series of read-modify-write cycles on uncached memory
will be awfully slow. Maybe the answer is to just completely
replace the descriptor access.

       Arnd
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help