Thread: 60 messages, 14 authors, 2008-02-02

Re: Problem with ata layer in 2.6.24

From: Gene Heskett <hidden>
Date: 2008-01-28 17:00:47
Also in: lkml

On Monday 28 January 2008, Gene Heskett wrote:
While reading this msg as it came back, I locked up again and rebooted to 
2.6.24, and got lucky (maybe) as the attached dmesg will show quite a few 
instances of this LOOOONNNGG before the nvidia driver is loaded to taint the 
kernel.  Have fun guys!
 
On Monday 28 January 2008, Mikael Pettersson wrote:
quoted
Gene Heskett writes:
quoted
On Monday 28 January 2008, Peter Zijlstra wrote:
quoted
On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
quoted
1. Wrong mailing list; use linux-ide (@vger) instead.
What, and keep all us other interested people in the dark?
As a test, I tried rebooting to the latest fedora kernel and found it
kills X, so I'm back to the second to last fedora version ATM, and the
third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first
two completed with no errors.

I've added the linux-ide list to refresh those people of the problem,
the logs are being spammed by this message stanza:

Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
Jan 28 04:46:25 coyote kernel: [26550.290029]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
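
For anyone sifting through megabytes of these stanzas, a small parser can help count and group them. The following is only a rough sketch: the regular expressions are written against the lines quoted above, and the taskfile field order (command/feature:nsect:lbal:lbam:lbah/hob bytes/device) is an assumption about how libata prints the "cmd" line, not something stated in this thread. The names `parse_exceptions` and `decode_taskfile` are made up for illustration.

```python
import re

# Lines copied from the stanza above, syslog prefix dropped for brevity.
SAMPLE = """\
[26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
[26550.290029]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
"""

EXC_RE = re.compile(
    r"(?P<dev>ata\d+(?:\.\d+)?): exception Emask (?P<emask>0x[0-9a-f]+) "
    r"SAct (?P<sact>0x[0-9a-f]+) SErr (?P<serr>0x[0-9a-f]+) "
    r"action (?P<action>0x[0-9a-f]+)(?P<frozen> frozen)?"
)
CMD_RE = re.compile(
    r"cmd (?P<tf>[0-9a-f/:]+) tag (?P<tag>\d+)"
    r"(?: dma (?P<bytes>\d+) (?P<dir>in|out))?"
)
RES_RE = re.compile(
    r"res [0-9a-f/:]+ Emask (?P<emask>0x[0-9a-f]+) \((?P<reason>[^)]*)\)"
)


def decode_taskfile(tf):
    """Split a 'cmd' taskfile string; the field order is an assumption."""
    command, low, hob, device = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in hob.split(":")
    )
    return {
        "command": int(command, 16),
        "sectors": (hob_nsect << 8) | nsect,
        "lba": (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal,
        "device": int(device, 16),
    }


def parse_exceptions(text):
    """Return one dict per 'exception' line, enriched by following lines."""
    events = []
    for line in text.splitlines():
        m = EXC_RE.search(line)
        if m:
            events.append({
                "dev": m.group("dev"),
                "emask": int(m.group("emask"), 16),
                "action": int(m.group("action"), 16),
                "frozen": m.group("frozen") is not None,
            })
            continue
        if not events:
            continue
        m = CMD_RE.search(line)
        if m:
            events[-1]["cmd"] = decode_taskfile(m.group("tf"))
            if m.group("bytes"):
                events[-1]["dma_bytes"] = int(m.group("bytes"))
            continue
        m = RES_RE.search(line)
        if m:
            events[-1]["reason"] = m.group("reason")
    return events


if __name__ == "__main__":
    for event in parse_exceptions(SAMPLE):
        print(event)
```

Under the assumed field order, the sector count decodes to 0x158 (344) sectors, and 344 x 512 bytes equals the 176128 bytes shown after "dma" on the same line, which at least suggests the decoding is plausible.
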
It's not obvious from this incomplete dmesg log what HW or driver
is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
it should be pata_amd driving a WDC disk:
quoted
[   30.702887] pata_amd 0000:00:09.0: version 0.3.10
[   30.703052] PCI: Setting latency timer of device 0000:00:09.0 to 64
[   30.703188] scsi0 : pata_amd
[   30.709313] scsi1 : pata_amd
[   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
[   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
[   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
[   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
[   30.871629] ata1.00: configured for UDMA/100
Unfortunately we also see:
quoted
[   48.285456] nvidia: module license 'NVIDIA' taints kernel.
[   48.549725] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
[   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
We have no way of debugging that module, so please try 2.6.24 without it.
Sorry, I can't do this and have a working machine.  The nv driver has
suffered bit rot or something since the FC2 days when it COULD run a 19"
crt at 1600x1200, and will not drive this 20" wide screen lcd 1680x1050
monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking
like a jpg compressed to 10%.  The system is not usable on a day-to-day basis
without the nvidia driver.

Fix the nv driver so it will run this screen at its native resolution and
I'll be glad to run it even if it won't run google earth, which I do use
from time to time.  Now, if all the hits you can get from google on
this (currently 14,800 just for 'exception Emask', apparently caused by a
timeout) come from complainers who are all running nvidia drivers too, then I
see a legit complaint.  Again, fix the nv driver so it will run my screen &
I'll be glad to switch.  I can see the reason, sure, but the machine must
be capable of doing its common day to day stuff, while using that driver,
like running kde for kmail, and browsers that work.
quoted
If the problems persist, please try to capture a complete log from the
failing kernel -- the interesting bits are everything from initial boot
up to and including the first few errors. You may need to increase the
kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).
If by log you mean /var/log/messages, I have several megabytes of those.
If you mean a live dmesg capture taken right now, it's attached. It contains
several of these at the bottom.  I long ago made the kernel log buffer
bigger, cuz it couldn't even show the start immediately after the boot, and
even the dump to syslog was truncated.
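
For reference, CONFIG_LOG_BUF_SHIFT is a power-of-two exponent: the kernel log buffer holds 2^shift bytes. A minimal sketch of the arithmetic follows; the helper names `log_buf_bytes` and `min_shift_for` are hypothetical, and the 200,000-byte figure is just an example size, not a measurement from this machine.

```python
def log_buf_bytes(shift):
    """Size in bytes of a log buffer configured with the given shift."""
    return 1 << shift


def min_shift_for(nbytes):
    """Smallest shift whose buffer holds at least nbytes of log text."""
    return max(0, (nbytes - 1).bit_length())


if __name__ == "__main__":
    for shift in (14, 17, 18, 21):
        print(f"CONFIG_LOG_BUF_SHIFT={shift}: {log_buf_bytes(shift)} bytes")
    # Example only: a boot log of roughly 200,000 bytes.
    print("shift needed for 200000 bytes:", min_shift_for(200_000))
```

A saved copy of a full dmesg can be measured with `len()` on its contents and passed to `min_shift_for` to pick a shift that avoids truncation.
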
quoted
There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.
That is what I was afraid of.  I've done some limited grepping in that
branch of the kernel tree, and cannot seem to locate where this EH handler
is being invoked from.

There are 2 lines of interest in the dmesg:

[    0.000000] Nvidia board detected. Ignoring ACPI timer override.
[    0.000000] If you got timer trouble try acpi_use_timer_override

But I have NDI what it means, kernel argument/xconfig option?
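
As far as the kernel's boot-parameter documentation goes, acpi_use_timer_override is a kernel command-line argument (appended to the kernel line in the bootloader config) rather than an xconfig option. Whether it has any bearing on the timeouts here is untested. A small sketch for checking whether a running kernel was booted with it, assuming the usual /proc/cmdline file is present; `has_kernel_arg` and `read_cmdline` are illustrative names.

```python
def has_kernel_arg(cmdline, name):
    """True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        if token.split("=", 1)[0] == name:
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        active = has_kernel_arg(read_cmdline(), "acpi_use_timer_override")
        print("acpi_use_timer_override present:", active)
    except OSError as exc:
        print("could not read kernel command line:", exc)
```
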

I've also done some googling, and it appears this problem is fairly
widespread since the switchover to libata was encouraged.  A stock fedora
F8 kernel suffers the same freezes and eventually locks up, but without
logging any error messages; it just freezes, feeling identical to this in
the minutes before the total freeze.  I've tried 2 of those too,
but the newest one won't even run X.


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Deprive a mirror of its silver and even the Czar won't see his face.
