Thread: 15 messages, 5 authors, 2016-03-25

Re: Nondeterministic hang during bootconsole/console handover on ath79

From: Matthias Schiffer <hidden>
Date: 2016-03-22 13:08:07
Also in: linux-mips, lkml

quoted
My theory is the following:

As soon as ttyS0 is detected and installed as the console, there are two
console drivers active on the serial port at the same time: early0 and
ttyS0. I suspect that the hang occurs when the primitive early0
implementation prom_putchar_ar71xx waits indefinitely on THRE, but the real
driver has just reset the serial controller in a way that makes THRE never
come.
Doubtful.

Console writes are performed with interrupts disabled, as is the 8250 driver's
autoconfig probing. Since this is a UP platform, as long as you're not
using the DEBUG_AUTOCONF switch in the 8250 driver, I don't think there's
a way for the boot console to be outputting while the 8250 driver is
configuring.
I see.
quoted
When the boot is successful, I also sometimes see just garbage
instead of the message "serial8250.0: ttyS0 at MMIO 0x18020000...", which
supports my idea that the kernel is trying to use the serial console while
it is not correctly set up.
quoted
I wonder if autoconfig probing (that's what discovers the uart port type)
is broken.

You could test this hypothesis by setting the port type directly and
set UPF_FIXED_TYPE; i.e., in arch/mips/ath79/dev-common.c
diff --git a/arch/mips/ath79/dev-common.c b/arch/mips/ath79/dev-common.c
index 516225d..3814a42 100644
--- a/arch/mips/ath79/dev-common.c
+++ b/arch/mips/ath79/dev-common.c
@@ -36,7 +36,8 @@ static struct plat_serial8250_port ath79_uart_data[] = {
 	{
 		.mapbase	= AR71XX_UART_BASE,
 		.irq		= ATH79_MISC_IRQ(3),
-		.flags		= AR71XX_UART_FLAGS,
+		.flags		= AR71XX_UART_FLAGS | UPF_FIXED_TYPE,
+		.type		= PORT_16550A,
 		.iotype		= UPIO_MEM32,
 		.regshift	= 2,
 	}, {

Regards,
Peter Hurley
I've tried your patch and can no longer reproduce the issue with it. I
have no idea whether the patch actually has anything to do with the issue,
or whether the change of code path just hid the bug again.
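
For illustration, the failure mode from the theory quoted at the top (an
early console putchar that waits forever for THRE) can be modelled with a
bounded poll. This is only a sketch under stated assumptions: struct
fake_uart, fake_read_lsr() and bounded_putchar() are hypothetical names,
the register is simulated in memory, and none of this is the actual
prom_putchar_ar71xx code. Only the THRE bit position (bit 5 of the LSR) is
taken from the standard 16550 register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16550 line status register, bit 5: transmit holding register empty. */
#define LSR_THRE 0x20u

/* Hypothetical in-memory stand-in for the UART (not real MMIO). */
struct fake_uart {
	int polls_until_thre;	/* > 0: polls left before THRE; 0: ready; < 0: never */
	uint32_t thr;		/* last byte "transmitted" */
};

/* Simulated LSR read: reports THRE only once the countdown reaches zero. */
static uint32_t fake_read_lsr(struct fake_uart *u)
{
	if (u->polls_until_thre < 0)
		return 0;		/* controller state in which THRE never comes */
	if (u->polls_until_thre > 0) {
		u->polls_until_thre--;
		return 0;
	}
	return LSR_THRE;
}

/*
 * Poll for THRE at most max_polls times instead of spinning forever.
 * Returns true if the character was written, false on timeout.
 */
static bool bounded_putchar(struct fake_uart *u, unsigned char ch,
			    unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_read_lsr(u) & LSR_THRE) {
			u->thr = ch;
			return true;
		}
	}
	return false;	/* an unbounded loop would hang at this point */
}
```

With an unbounded loop in place of the max_polls limit, the "never" case
would spin indefinitely, which is what the hang would look like from the
outside.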

Regarding your other mail: with "small change", I was not talking about
adding an additional printk; as mentioned, even changing the numbers in
UTS_VERSION can hide the issue. I diffed a working and a broken kernel
image, and the UTS_VERSION is really the only difference. I have no idea
how to explain this. (OpenWrt uses an LZMA-compressed kernel image, so
after compression, the differences are much greater; but how these
differences would affect the kernel after decompression eludes me.)
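
The tool used for the comparison is not stated. As a rough sketch of the
idea, a byte-wise comparison of two equally sized (decompressed) images
that records the differing offsets could look like the following. The
function name count_diffs() and its parameters are hypothetical, chosen
only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two buffers of equal length byte by byte.
 * Stores up to max_out differing offsets in out[] and returns the total
 * number of differing bytes (which may exceed max_out).
 */
static size_t count_diffs(const uint8_t *a, const uint8_t *b, size_t len,
			  size_t *out, size_t max_out)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (a[i] != b[i]) {
			if (n < max_out)
				out[n] = i;
			n++;
		}
	}
	return n;
}
```

If the only offsets reported fall inside the version string, the two
images differ in nothing but UTS_VERSION, as described above.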

I'll continue searching for a board with accessible JTAG which exhibits
this issue. Given the heisenbuggy nature of the issue, getting to the root
cause is probably impossible without JTAG unless someone has an obvious
explanation...

Thanks,
Matthias
