Thread (19 messages) 19 messages, 4 authors, 2018-06-22

[PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding doc

From: aisheng.dong@nxp.com (A.s. Dong)
Date: 2018-06-22 08:16:19
Also in: linux-devicetree

-----Original Message-----
From: Oleksij Rempel [mailto:o.rempel at pengutronix.de]
Sent: Friday, June 22, 2018 2:49 PM
To: A.s. Dong <aisheng.dong@nxp.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>; Rob Herring
[off-list ref]; Mark Rutland [off-list ref];
devicetree at vger.kernel.org; dongas86 at gmail.com; dl-linux-imx <linux-
imx at nxp.com>; kernel at pengutronix.de; Fabio Estevam
[off-list ref]; shawnguo at kernel.org; linux-arm-
kernel at lists.infradead.org
Subject: Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding doc

On Fri, Jun 22, 2018 at 05:59:30AM +0000, A.s. Dong wrote:
quoted
Hi Oleksij,
quoted
-----Original Message-----
From: Oleksij Rempel [mailto:o.rempel at pengutronix.de]
Sent: Friday, June 22, 2018 12:59 PM
To: A.s. Dong <aisheng.dong@nxp.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>; Rob Herring
[off-list ref]; Mark Rutland [off-list ref];
devicetree at vger.kernel.org; dongas86 at gmail.com; dl-linux-imx <linux-
imx at nxp.com>; kernel at pengutronix.de; Fabio Estevam
[off-list ref]; shawnguo at kernel.org; linux-arm-
kernel at lists.infradead.org
Subject: Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding
doc

On Fri, Jun 22, 2018 at 03:31:11AM +0000, A.s. Dong wrote:
quoted
quoted
-----Original Message-----
From: Oleksij Rempel [mailto:o.rempel at pengutronix.de]
Sent: Friday, June 22, 2018 2:09 AM
To: A.s. Dong <aisheng.dong@nxp.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>; Rob Herring
[off-list ref]; Mark Rutland [off-list ref];
devicetree at vger.kernel.org; dongas86 at gmail.com; dl-linux-imx
<linux- imx@nxp.com>; kernel at pengutronix.de; Fabio Estevam
[off-list ref]; shawnguo at kernel.org; linux-arm-
kernel at lists.infradead.org
Subject: Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu
binding doc

Hi,

On Thu, Jun 21, 2018 at 05:11:33PM +0000, A.s. Dong wrote:
quoted
Hi Sascha,
quoted
On Wed, Jun 20, 2018 at 01:43:10PM -0600, Rob Herring wrote:
quoted
On Sun, Jun 17, 2018 at 08:49:47PM +0800, Dong Aisheng wrote:
quoted
The Messaging Unit module enables two processors within
the SoC to communicate and coordinate by passing messages
(e.g.
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
data, status and control) through the MU interface.

---
v1->v2:
 * typo fixes
 * remove status property
 * remove imx6&7 compatible string which may be added later
for
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
   the generic mailbox binding

Note: Because MU used by SCU is not implemented as a
mailbox driver, Instead, they're provided in library
calls to gain higher
performance.
quoted
quoted
quoted
Using a binding doesn't mean you have to use an OS's
subsystem.
quoted
quoted
quoted
quoted
quoted
quoted
quoted
What needs higher performance? What's the performance
difference?
quoted
quoted
quoted
quoted
Why
quoted
can't the mailbox framework be improved?
From what I see the performance is improved by polling the
interrupt registers rather than using interrupts.
I see no reason though why this can't be implemented with
the mailbox framework as is.
I thought you've agreed to not write generic MU(mailbox)
driver for
SCU.
quoted
quoted
quoted
https://www.spinics.net/lists/arm-kernel/msg650217.html
But seems it's still not quite certain...

I'd like to explain some more.

1) the interrupt mechanism is not quite suitable for SCU
firmware protocol as the transfer size would be more than 4
words and the response data size is also undetermined (it's
set by SCU firmware side
during a response).
quoted
So polling mode may be the best way to handle this as MU
message handling usually is quite fast in a few microseconds.

2) It's true that Mailbox framework is well designed and powerful.
But it's not quite suitable for SCU MU as we don't need to use
the most bits of it. Mailbox seems like to be more suitable
for a generic MU mailbox driver used by various
clients/servers.  But SCU MU are quite specific to SCU
protocol and can't be used by other clients (MU
0~4 is fixed for SCU communication in MX8 HW design).
Even we write a MU Mailbox driver for SCU MU, it's still not a
general one and can't be used by others (e.g. communication with
M4).
quoted
quoted
quoted
quoted
quoted
So I'd believe the current library way is still the best
approach for SCU MU Using. But I'm also okay for another
generic MU drivers for other common communications between A
core and M4 side.
quoted
quoted
quoted
quoted
quoted
3) We actually have tried the MU(Mailbox) internally, it
increased a lot complexity comparing to the current library
way and got a booting time regression issue due to extra
delays introduced in handling SCU protocol in mailbox way.

And finally a nature question to us is:
What the benefit we can get for SCU MU using a mailbox way?

If we can't find benefits but introduce more complexities then
why we would do that way?
Looks like my response to same topic within my patch set is
lost, so I repost it
here:

ok.. let's take some of IMX8 SCU driver code to see if there any
difference:
quoted
quoted
..this part of the code is blocking write procedure for one
channeler (register or what ever name you prefer) per write.. correct?

+void mu_send_msg(struct mu_priv *priv, uint32_t index, uint32_t
+msg)
{
quoted
quoted
+	uint32_t mask = MU_SR_TE0_MASK >> index;
+
+	/* Wait TX register to be empty. */
+	while (!(readl_relaxed(priv->base + MU_ASR) & mask))
+		;
+	writel_relaxed(msg, priv->base + MU_ATR0  + (index * 4)); }
+EXPORT_SYMBOL_GPL(mu_send_msg);

According to documentation it is recommended to use only one
status bit for the last register to use MU as one big 4words sized pipe.
But, there is no way you can write to all 4 registers without
checking status for each of this register, because your protocol
has dynamic message size. So you are forced to use your one
channel as 4
separate channels.
quoted
quoted
Write it part of the message separately and allow your firmware
read
1 word to understand how to behave on the rest of the message.

+static void sc_ipc_write(struct sc_ipc *sc_ipc, void *data) {
+	sc_rpc_msg_t *msg = (sc_rpc_msg_t *) data;
+	uint8_t count = 0;
+
+	/* Check size */
+	if (msg->size > SC_RPC_MAX_MSG)
+		return;
+
+	/* Write first word */
+	mu_send_msg(sc_ipc->mu_base, 0, *((uint32_t *) msg));
+	count++;

.. in this loop you are writing to one channel/register per
loop. If the communicate will stall for some reason, the linux
system will just freeze here without any timeout or error
message.. no idea how
about the opposite site.
quoted
quoted
+	/* Write remaining words */
+	while (count < msg->size) {
+		mu_send_msg(sc_ipc->mu_base, count %
MU_TR_COUNT,
quoted
quoted
quoted
quoted
+			    msg->DATA.u32[count - 1]);
+		count++;
+	}
+}


... and here is a proof that sc_ipc_write will do in some cases
5 rounds
(5 * 4 bytes = 20 bytes single message) with probable busy
waiting for each channel

+sc_err_t sc_misc_seco_image_load(sc_ipc_t ipc, uint32_t addr_src,
+				 uint32_t addr_dst, uint32_t len, bool
fw) {
quoted
quoted
quoted
quoted
+	sc_rpc_msg_t msg;
+	uint8_t result;
+
+	RPC_VER(&msg) = SC_RPC_VERSION;
+	RPC_SVC(&msg) = (uint8_t)SC_RPC_SVC_MISC;
+	RPC_FUNC(&msg) =
(uint8_t)MISC_FUNC_SECO_IMAGE_LOAD;
quoted
quoted
quoted
quoted
+	RPC_U32(&msg, 0) = addr_src;
+	RPC_U32(&msg, 4) = addr_dst;
+	RPC_U32(&msg, 8) = len;
+	RPC_U8(&msg, 12) = (uint8_t)fw;
+	RPC_SIZE(&msg) = 5;
+
+	sc_call_rpc(ipc, &msg, false);
+
+	result = RPC_R8(&msg);
+	return (sc_err_t)result;
+}
+

So, the same code with mailbox framework will be some thing like this:
1. request all 4 channels in the probe. ignore completion
callback and set proper timeout.
That looks a bit strange. As I said in another email, MU physical
does not have multi Channels (here you mean are virtual channels).
And one whole MU instance is Exclusively used by SCU, why we need
abstract them into 4 channels to use separately with extra
unnecessary resource
hold and code path executed.

OK, so this MU should configured to use all 4 registers as one channel.
I have nothing against it to do in generic driver.
quoted
quoted
2. keep your old code by replacing this part.
	/* Write remaining words */
	while (count < msg->size) {
		mbox_send_message(sc_ipc->mbox_chan[count %
MU_TR_COUNT],
quoted
quoted
msg->DATA.u32[count - 1]);
		count++;
	}

The advantage of this variant. If SCU firmware will stall, the
linux will be able to notify about it without blocking complete system.
This part of code has been used for a long time and we've never
met the stall Issue which means SCU firmware guarantee it well.
But I agree a timeout mechanism Is better. However,  if only for
this reason, we can simply add a timeout mechanism in MU library
function as well, but still is far from a strong enough reason to
switch to a more
complexed mailbox one.

At this point a have absolutely zero tolerance. At the end I was one
on poor devs hunting similar kind of stalls.
Believe me, what can theoretically with 1% probability happen in the
lab, will end with days/months of bug hunting work and thousands of
field returns.
quoted
quoted
Yeah, so I think a timeout mechanism is worthy to add.
quoted
quoted
quoted
Can you please provide (if possible) your old mailbox based
implementation.
quoted
quoted
I'm curious to see why it is slow.
In our implementation:
1) One channel per MU
2) Tx using the same way to send msg as sc_ipc_write() in polling
mode
3) Rx enables the first word interrupt and polling for the rest of
them in a hrtimer.

The possible extra cost comparing to simple polling way:
1) Extra unnecessary code execution path of mailbox which is not
used by SCU MU
2) Interrupt latency
3) Extra memory copy handling RX message separately.
4) Extra schedule of hrtimer polling

Some of them probably could be optimized. However, besides the
slow problem, the extra unnecessary complexity and can't be
generic (specific to SCU) are also important ones.

And the MU general purpose interrupt feature and general purpose
flags feature may also not supported by mailbox well.
Ok, I give up.
Can we at least try to provide one device tree binding documentation
for that?
I will update the binding doc to use mailbox way as suggested by Rob.
But I still need to change mbox-cells to allow 0 which actually
already exist In kernel. So we may have too options for MU binding for
users.
quoted
For SCU MU, mbox-cells = <0>
For General MU, mbox-cells = <1>
ok. cool.
quoted
quoted
At the end, it is one and the same HW IP block spread in one SoC (or
different SoCs) an be used in different way. How exactly it will be
used is the choice of the developer/customer/user...
even if it will be described withing mailbox section dos not mean
you are forced to use mailbox framework.

Lets imagine worst case scenario and all of MU on i.MX8 are used by
different drivers. In my mailbox binding I suggest to use
<vendor>,<soc>- mu-<mu side>. On i.MX7 there is only one MU, so it
looks like "fsl,imx7s-mu- a" and "fsl,imx7s-mu-b". i.MX8 has multiple MU
so it should be enumerated.
quoted
quoted
Probably it will look like: "fsl,imx8x-mu0-a".
Do it make sense for you? Can we find and agreement here? :)
IMHO one possible concern may be that results in creating two much
compatibles strings. For example, each SoC have two (fsl, <soc>-mu-<a or
b>).
quoted
Now with MU supported SoCs, we have mx6sx, mx7ulp, Mx7d, mx8qm,
mx8x,
quoted
mx8mq and etc...

My question is that if it's really worthy to create a separate side b
compatible string instread of distinguish it by a property?

And for general purpose MUs to communicate with M4. Both side can be
used by either A core and M core. If we're doing this way, once users
wants A core to access B side, users need to change the compatible
string, that seems more like an unusual way compared to changing
properties.
quoted
How do you think?
My current use case is Linux with DT running on all CPUs of one SoC. On
i.MX7 there is only one MU and I use same mailbox driver on both sides. In
this case it is enough to have same compatible with extra parameter to see
the difference between A and B side.
Just imagine, you'll use DT in your firmware on i.MX8. Will you use same
driver on both sides of MU? If yes, I'm OK to exclude A and B from
compatible.
NXP release only provides a FreeRTOS based image running on M4 side.
Don't support Linux on M4. So we normally won't run the same driver
for both sides of MU. But for Linux community, it's good idea if Linux
can also run on M4 and using the same MU driver.

Regards
Dong Aisheng
--
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help