Thread (24 messages) 24 messages, 4 authors, 2025-07-31

Re: [PATCH net-next v10 1/8] hinic3: Async Event Queue interfaces

From: Simon Horman <horms@kernel.org>
Date: 2025-07-31 14:04:11
Also in: linux-doc, lkml

On Thu, Jul 31, 2025 at 03:58:39PM +0300, Gur Stavi wrote:
quoted
On Thu, Jul 24, 2025 at 09:45:51PM +0800, Fan Gong wrote:
quoted
quoted
quoted
+
+/* Data provided to/by cmdq is arranged in structs with little endian fields but
+ * every dword (32bits) should be swapped since HW swaps it again when it
+ * copies it from/to host memory.
+ */
This scheme may work on little endian hosts.
But if so it seems unlikely to work on big endian hosts.

I expect you want be32_to_cpu_array() for data coming from hw,
with a source buffer as an array of __be32 while
the destination buffer is an array of u32.

And cpu_to_be32_array() for data going to the hw,
with the types of the source and destination buffers reversed.

If those types don't match your data, then we have
a framework to have that discussion.


That said, it is more usual for drivers to keep structures in the byte
order they are received. Stored in structures with members with types, in
this case it seems that would be __be32, and accessed using a combination
of BIT/GENMASK, FIELD_PREP/FIELD_GET, and cpu_to_be*/be*_to_cpu (in this
case cpu_to_be32/be32_to_cpu).

An advantage of this approach is that the byte order of
data is only changed when needed. Another is that it is clear
what the byte order of data is.
There is a simplified example:

Here is a 64 bit little endian that may appear in cmdq:
__le64 x
After the swap it will become:
__be32 x_lo
__be32 x_hi
This is NOT __be64.
__be64 looks like this:
__be32 x_hi
__be32 x_lo
Sure, byte swapping 64 bit entities is different to byte swapping two
consecutive 32 bit entities. I completely agree.
quoted
So the swapped data by HW is neither BE or LE. In this case, we should use
swab32 to obtain the correct LE data because our driver currently supports LE.
This is for compensating for bad HW decisions.
Let us assume that the host is reading data provided by HW.

If the swab32 approach works on a little endian host
to allow the host to access 32-bit values in host byte order.
Then this is because it outputs a 32-bit little endian values
Values can be any size. 32 bit is arbitrary.
.
quoted
But, given the same input, it will not work on a big endian host.
This is because the same little endian output will be produced,
while the host byte order is big endian.

I think you need something based on be32_to_cpu()/cpu_to_be32().
This will effectively be swab32 on little endian hosts (no change!).
And a no-op on big endian hosts (addressing my point above).

More specifically, I think you should use be32_to_cpu_array() and
cpu_to_be32_array() instead of swab32_array().
Lets define a "coherent struct" as a structure made of fields that makes sense
to human beings. Every field endianity is defined and fields are arranged in
order that "makes sense". Fields can be of any integer size 8,16,32,64 and not
necessarily naturally aligned.

swab32_array transforms a coherent struct into "byte jumble". Small fields are
reordered and larger (misaligned) fields may be split into 2 (or even 3) parts.
swab32_array is reversible so a 2nd call with byte jumble as input will produce
the original coherent struct.

hinic3 dma has "swab32_array" built in.
On send-to-device it expects a byte jubmle so the DMA engine will transform it
into a coherent struct.
On receive-from-device it provides a byte jumble so the driver needs
to call swab32_array to transform it into a coherent struct.

The hinic3_cmdq_buf_swab32 function will work correctly, producing byte jumble,
on little endian and big endian hosts.

The code that runs prior to hinic3_cmdq_buf_swab32 that initializes a coherent
struct is endianity sensitive. It needs to initialize fields based on their
coherent endianity with or without byte swap. Practically use cpu_to_le or
cpu_to_be based on the coherent definition.

Specifically, cmdq "coherent structs" in hinic3 use little endian and since
Kconfig currently declares that big endian hosts are not supported then
coherent structs are initialized without explicit cpu_to_le macros.

And this is what the comment says:

/* Data provided to/by cmdq is arranged in structs with little endian fields but
 * every dword (32bits) should be swapped since HW swaps it again when it
 * copies it from/to host memory.
 */
Thanks, I think I am closer to understanding things now.

Let me try and express things in my own words:

1. On the hardware side, things are stored in a way that may be represented
   as structures with little-endian values. The members of the structures may
   have different sizes: 8-bit, 16-bit, 32-bit, ...

2. The hardware runs the equivalent of swab32_array() over this data
   when writing it to (or reading it from) the host. So we get a
   "byte jumble".

3. In this patch, the hinic3_cmdq_buf_swab32 reverses this jumbling
   by running he equivalent of swab32_array() over this data again.

   As 3 exactly reverses 2, what is left are structures exactly as in 1.

If so, I agree this makes sense and I am sorry for missing this before.

And if so, is the intention for the cmdq "coherent structs" in the driver
to look something like this.

   struct {
	u8 a;
	u8 b;
	__le16 c;
	__le32 d;
   };

If so, this seems sensible to me.

But I think it would be best so include some code in this patchset
that makes use of such structures - sorry if it is there, I couldn't find
it just now.

And, although there is no intention for the driver to run on big endian
systems, the __le* fields should be accessed using cpu_to_le*/le*_to_cpu
helpers.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help