Thread (44 messages) 44 messages, 5 authors, 2018-12-10

Re: [PATCH v2 01/10] mailbox: Support blocking transfers in atomic context

From: Thierry Reding <hidden>
Date: 2018-11-22 17:34:32
Also in: linux-arm-kernel, linux-serial, linux-tegra

On Thu, Nov 22, 2018 at 10:07:09AM -0600, Jassi Brar wrote:
Hi Thierry,

On Thu, Nov 22, 2018 at 2:47 AM Thierry Reding [off-list ref] wrote:
quoted
On Wed, Nov 21, 2018 at 08:18:22PM -0600, Jassi Brar wrote:
quoted
On Wed, Nov 21, 2018 at 8:27 AM Thierry Reding [off-list ref] wrote:
quoted
On Tue, Nov 20, 2018 at 04:29:07PM +0100, Thierry Reding wrote:
quoted
On Sat, Nov 17, 2018 at 11:27:17AM -0600, Jassi Brar wrote:
quoted
On Mon, Nov 12, 2018 at 9:18 AM Thierry Reding [off-list ref] wrote:
quoted
From: Thierry Reding <redacted>

The mailbox framework supports blocking transfers via completions for
clients that can sleep. In order to support blocking transfers in cases
where the transmission is not permitted to sleep, add a new ->flush()
callback that controller drivers can implement to busy loop until the
transmission has been completed. This will automatically be called when
available and interrupts are disabled for clients that request blocking
transfers.

Signed-off-by: Thierry Reding <redacted>
---
 drivers/mailbox/mailbox.c          | 8 ++++++++
 include/linux/mailbox_controller.h | 4 ++++
 2 files changed, 12 insertions(+)
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 674b35f402f5..0eaf21259874 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -267,6 +267,14 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
                unsigned long wait;
                int ret;

+               if (irqs_disabled() && chan->mbox->ops->flush) {
+                       ret = chan->mbox->ops->flush(chan, chan->cl->tx_tout);
+                       if (ret < 0)
+                               tx_tick(chan, ret);
+
+                       return ret;
+               }
+
This is hacky. I think we can do without busy waiting in atomic
context. You could queue locally while in atomic context and then
transfer in blocking mode. I don't think we should worry about the
'throughput' as there already is no h/w rate control even with
busy-waiting.
I actually tried to do that before I added this flushing mechanism. The
problem is, like you said, one of rate control. As mentioned in the
cover letter, the shared mailboxes implemented in tegra-hsp are used as
RX and TX channels for the TCU, which is like a virtual UART. The TTY
driver included as part of this series will use one of the mailboxes to
transmit data that is written to the console. The problem is that if
these transmissions are not rate-limited on the TTY driver side, the
console will just keep writing data and eventually overflow the buffer
that we have in the mailbox subsystem.

The problem is that data comes in at a much higher rate than what we can
output. This is especially true at boot when the TCU console takes over
and the whole log buffer is dumped on it.

So the only way to rate-limit is to either make mbox_send_message()
block, but that can only be done in non-atomic context. The console,
however, will always run in atomic context, so the only way to do rate-
limiting is by busy looping.
What I also tried before was to implement busy looping within the
->send_data() callback of the driver so that we didn't have to put this
into the core. Unfortunately, however, the ->send_data() callback is
called under chan->lock, which means that from mbox_send_message() we
don't have a way to mark the transfer as done. In order to do that we'd
have to call mbox_chan_txdone(), but that ends up calling tx_tick() and
that in turn also attempts to take the chan->lock, which would cause a
deadlock.

The explicit flushing is the best alternative that I could come up with.
I think it's not all that hacky, because it's very explicit about what's
going on and it has the nice side-effect that it will allow the mailbox
to work in interrupt driven mode if possible and only resorting to the
busy loop in atomic context.

At this point I think I have explored all other options and I frankly
can't find a more proper way to achieve what we need here. Perhaps you
can think of additional ways to accomplish this?
Well, I would have a local ring buffer (array) of enough size to hold
the characters and then have a task consuming data from that ring
buffer by transmitting over mailbox.
There's already such a ringbuffer in the printk code. To implement what
you suggest would effectively be creating a copy of that buffer because
we'd be allocating the buffer and the console code would just dump each
and every character in the logbuf into that ring buffer without rate-
limitation.
Well, the console assumes there exists an atomic path to put character
on the bus, But because there isn't in case of tcu, we have to emulate
that. Frankly I prefer the one-off driver jump some hoops, rather than
implement exceptions in the api.
I wouldn't have any objections to that if the hoops were reasonable
ones. What you're asking me to do is basically implement a second copy
of the logbuf. I don't call that a reasonable hoop to jump through. It
is also not guaranteed to work properly because we can always end up
with a situation where we produce more data than we can consume. Also,
by providing this additional buffer we make things worse because the
standard mechanisms of the logbuf are side-stepped. Typically the logbuf
code will warn if it overflows. In our case we provide a buffer that the
console can dump into, so instead of the logbuf overflowing and warning
about it, we'd now overflow the mailbox buffer and we'd have to add
extra code to warn about overflows.

The console really only works because it assumes that the output driver
will stall and thereby rate-limit.
BTW, there is already no rate-limitation because its all virtual -
data is consumed as fast as possible.
No, that's not true. On the receiving end of the TX mailbox is a micro-
processor that reads out the mailbox data. Once it has read the data it
needs to clear the FULL bit so that the TCU driver can write more data.
Even if the microprocessor on the receiving end did buffering (there is
no indication that it does) it would eventually run out of buffer space
and have to stall until it's clocked all the characters out of the
physical UART. At that point no amount of buffering is going to save us
and the only option is to stall, which, again, can currently only be
done from the mailbox, because it is the only one that knows when it is
busy. But it's also the one place where we can't because the framework
doesn't allow it.
quoted
To make matters worse, the ringbuffer would be empty most of the time
after the initial dump of the logbuf, so we'd be wasting all that buffer
space.
The idea is console and uart-ops both feed into this buffer and the
only consumer thread runs the mailbox.
quoted
It just seems to me like we should be keeping the TCU driver as close as
possible to other UART drivers which also busy loop in order to rate-
limit what the console can write. Given the current mailbox framework it
is not possible to do that (in interrupt context), so an extension seems
like the most sensible option.

Perhaps you'd be less concerned about such a change if it was perhaps
more explicit? Just throwing ideas around, I think something that could
also work is if we explicitly add a mbox_flush() function that would
basically be calling ->flush(). That way users of the mailbox can make
their requirement very explicit. I haven't actually tested that, but I
think it would work. Does that sound more acceptable to you?
I am happy to see features and bugfixes added to the api. What I am
not eager about is supporting less than 100% legit and very platform
specific usecases, especially when there is a work around.
Look, I'd be willing to push this all into the driver, but the framework
doesn't allow me to do that. If it was possible to run the state machine
outside of mbox_send_message() then I'd be able to just move the busy
loop or flush into the driver. But the locking is such that I can't do
that because it will cause a deadlock.

The ringbuffer workaround would be very brittle and if at all only work
by accident, not to mention that it would require duplicating much of
the logbuf logic.

Also I don't consider usage in atomic context a very platform specific
use-case, and I don't understand why I should be required to resort to
some unreliable workaround rather than find a proper fix that guarantees
proper operation.

Thierry
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help