Thread (74 messages) 74 messages, 10 authors, 2015-05-12

[PATCH 03/16] tty: serial: 8250_core: read only RX if there is something in the FIFO

From: tony@atomide.com (Tony Lindgren)
Date: 2015-02-11 21:08:49
Also in: linux-omap, linux-serial, lkml

* Peter Hurley [off-list ref] [150211 12:05]:
On 02/10/2015 12:46 PM, Peter Hurley wrote:
quoted
On 02/10/2015 07:04 AM, Nicolas Schichan wrote:
quoted
On 02/10/2015 12:34 AM, Peter Hurley wrote:
quoted
Hi Nicolas,

Thanks for the report.
[...]
quoted
quoted
When a caracter is received on the UART while the kernel is printing
the boot messages, as soon as the kernel configures the UART for
receiving (after root filesystem mount), it gets stuck printing the
following message repeatedly:

serial8250: too much work for irq29

Once stuck, the reception of another character allows the boot process
to finish.

From what I can gather, when we hit that, the UART_IIR_NO_INT is 0 (so the
interrupt is raised), but the UART_LSR_DR bit is 0 as well so the UART_RX
register is never read to clear the interrupt.
The "too much work" message means serial8250_handle_irq() is returning 0,
ie., not handled. Which in turn means IIR indicates no interrupt is pending
(UART_IIR_NO_INT == 1).

Can you log the register values for LSR and IIR at both patch locations
in serial8250_do_startup()?

(I can get you a debug patch, if necessary. Let me know)
Hi Peter,

Thanks for your reply.

Here is what I have when the issue is triggered:

[   12.154877] lsr 0x60 / iir 0x01
[   12.158071] lsr 0x60 / iir 0x01
[   12.161438] serial8250: too much work for irq29
[   12.165982] lsr 0x60 / iir 0x0c
[   12.169354] serial8250: too much work for irq29
[   12.173900] lsr 0x60 / iir 0x0c
(previous two messages are repeated and printk_ratelimited())
Thanks for this information; I see I was wrong about the cause of message.

I think what happens during startup is that on this silicon clearing
the rx fifo (by serial8250_clear_fifos()) clears data ready but not
the rx timeout condition which causes a spurious rx interrupt when
interrupts are enabled.

So caught between two broken UARTs: one that underflows its rx fifo because
of unsolicited rx reads and the other that generates spurious interrupt
without unsolicited rx reads.

quoted
When the issue is not triggered:

[   10.784871] lsr 0x60 / iir 0x01
[   10.788066] lsr 0x60 / iir 0x01
[   10.794734] VFS: Mounted root (nfs filesystem) readonly on device 0:13.
[   10.801654] devtmpfs: mounted
[   10.805169] Freeing unused kernel memory: 184K (807be000 - 807ec000)
(userland takes over after that)

I have also displayed the IIR and LSR registers when the "too much fork for
IRQ" condition is triggered.

In the serial8250_do_startup(), before the interrupt are unmasked at the end,
the IIR looks sane and UART_IIR_NO_INT bit is set. When stuck
serial8250_interrupt(), UART_IIR_NO_INT is cleared and the interrupt ID is set
to 0xc which is not handled by the kernel at this time (the Kirkwood datasheet
 indicates that it is some kind of timeout condition from what I can gather).
Yes, IIR == UART_IIR_RX_TIMEOUT is to used indicate that data is in the rx fifo
but has not reached the rx trigger level yet.

ATM, I'm not exactly sure if there is a safe way to clear the spurious interrupt
from the interrupt handler.

I'm fairly certain the only way to clear the rx timeout interrupt is to read
the rx fifo, but I think this would race with actual data arrival. IOW, there
might not be a way to determine if the data read is spurious or not.
Yep, I see no safe way to clear the spurious interrupt [1] and no idea how to
keep it from happening (other than via the unsolicited RX reads in
serial8250_do_startup).

Unfortunately, I think this means we'll have to revert Sebastian's commit:

commit 0aa525d11859c1a4d5b78fdc704148e2ae03ae13
Author: Sebastian Andrzej Siewior [off-list ref]
Date:   Wed Sep 10 21:29:58 2014 +0200

    tty: serial: 8250_core: read only RX if there is something in the FIFO
    
which just means OMAP3630 will be limited to using the omap_serial driver.
Reverting makes sense to me if it has caused a regression. Maybe Sebastian
can update his patch to do this based on some quirk flag instead?

Regards,

Tony

 
[1]  To clear the RX timeout interrupt requires reading the rx fifo even though
LSR[data ready] indicates no data. However, this could result in dropped data
if the data became available just before clearing the RX timeout. For example,

 CPU                                 | Device
                                     |
 irq handler (simplified)            |
                                     |
   read IIR                          |
   is interrupt? yes                 |
   read LSR                          |
   is data ready? no                 |
   is IIR == Rx timeout? yes         | new data arrives
                                     | rx_fifo[0] = new data
                                     | lsr[data ready] = 1
   read RX and discard               |
                                     |
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help