Thread (33 messages), 5 authors, 2012-03-29

Re: Fwd: libata-pmp patch for 3.2.x and later for eSATA Port Multiplier Sil3726

From: ANEZAKI, Akira <hidden>
Date: 2012-03-26 00:41:40
Also in: lkml

Hello Gwendal,

Thank you for your kind response.

(2012/03/26 00:28), Gwendal Grignou wrote:
I reread your logs.

Assuming you don't mind the long boot from a cold power-on, the remaining
problem is with the 4 disk enclosures [ata7-ata10] on the second
machine, where the first disk is not found and booting after a warm reboot
takes very long.
Not only the first disk. Two or more HDDs are missing on every PMP.
I try to understand why it works with the other 4 enclosures
[ata5-ata6] on the first and second machines.

Also, just to be sure I understand your configuration correctly, your
second machine has 30 disks total, not 40:
2 direct on ata1.00  and ata1.01
8 on 2 enclosures [ 2 * 4] on ata5 and ata6
20 on 4 enclosures [ 4 * 5] on ata7 - ata10
Oops! You are right. I'm very sorry!
Also, from the log, ata5 and ata6 are behind a Sil3132-based
controller, while ata7-ata10 are behind a single Sil3124, not the opposite
as you said in a previous mail.

If possible, could you switch 2 of the 4 failing enclosures [with their disks]
to the ports controlled by the Sil3132 controller, reboot
the machine with all its 30 drives, and see whether the failures follow the
controller or the enclosure?
Yes, I will do it as soon as possible. (Sorry, resyncing is running now.)
If you based your RAID configuration on signatures, that should be fine,
but if it is based on kernel device names [sdX], that will confuse md and
will mess with your data.
The problem is that some HDDs on every PMP from ata7 - ata10 are not detected.
The RAID problem seems to be caused by that. mdadm.conf uses UUIDs, so I
think that the kernel uses the UUIDs of the RAID arrays.
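
The point about identifying arrays by signature rather than by sdX name can be checked mechanically. The following is only a minimal sketch, not part of the original exchange: the function name and the sample file contents are hypothetical, and the parser is simplified (it ignores mdadm.conf continuation lines and keyword abbreviations). It lists ARRAY entries that carry no UUID= identifier and would therefore depend on something else, such as device names.

```python
def arrays_without_uuid(conf_text):
    """Return md device names of ARRAY lines that have no UUID= identifier.

    Simplified: one ARRAY entry per line, no continuation lines.
    """
    missing = []
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        words = line.split()
        if len(words) < 2 or words[0].upper() != "ARRAY":
            continue
        # Identity keywords follow the device name (words[1]).
        has_uuid = any(w.lower().startswith("uuid=") for w in words[2:])
        if not has_uuid:
            missing.append(words[1])
    return missing


# Hypothetical sample, NOT the poster's real configuration.
SAMPLE_CONF = """\
DEVICE partitions
ARRAY /dev/md0 UUID=01234567:89abcdef:01234567:89abcdef
ARRAY /dev/md1 devices=/dev/sdb1,/dev/sdc1
"""

if __name__ == "__main__":
    for name in arrays_without_uuid(SAMPLE_CONF):
        print(f"{name}: no UUID, may depend on kernel device names")
```

An empty result would suggest that swapping enclosures between controllers should not confuse md, although that still assumes every member disk is actually detected.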

Best Regards,
Akira
I am sorry I don't have any other suggestion right now,
The HDDs connected to ata7 - ata10 are very old and support only Serial
ATA 1.0a. I checked the data sheet, and the chip (JM20330) supports the SRST command.
While booting, the indicator LED blinks repeatedly, and more than half of
the HDDs are identified. So I thought that the HDD side is not the
problem. What is your opinion about it?
Regards,
Gwendal.

On Sat, Mar 24, 2012 at 6:19 PM, ANEZAKI, Akira
[off-list ref] wrote:
Hello Gwendal,

I want to confirm one thing.
Does the kernel 3.1.x driver still work?

It seems it will take a long time to solve the problem. Of course I understand
that staggered spin-up is the better solution, but I can't wait that long. And
it affects only the SiI3726.

Best Regards,
Akira

(2012/03/23 18:59), ANEZAKI, Akira wrote:
Hello Gwendal,

(2012/03/23 17:31), Gwendal Grignou wrote:
I notice however some messages I did not see before:
[    4.856382] ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[    4.858742] ata7.00: hard resetting link
[   14.843039] ata7.00: softreset failed (timeout)
[   17.836402] ata7.15: qc timeout (cmd 0xe4)
The latter indicates that the PMP is stuck and the host cannot read
its internal registers.
Is it possible that the PMPs in these 4 enclosures you are using have
different firmware from the other ones?
Firmware 1.0114 is available at:
http://www.siliconimage.com/support/searchresults.aspx?pid=26&cat=23

From the release notes:
"""- Fix SRST and initial two RegFIS Problem."""
I'm still fixing the broken RAID. Sorry for my slow response.
I checked the firmware versions. All of them use version 1.0114.

Best Regards,
Akira
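
For comparing boots across the two controllers, the kernel messages quoted earlier in this message (the softreset timeout on ata7.00 and the qc timeout on ata7.15) can be pulled out of a saved dmesg with a short script. This is only a rough sketch: the function name is hypothetical, and the regular expression assumes the timestamped format shown in the excerpt. In libata naming, link .15 is the port multiplier's control port, so a qc timeout there matches the description of a stuck PMP.

```python
import re

# Lines copied from the log excerpt quoted in this thread.
SAMPLE_LOG = """\
[    4.856382] ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[    4.858742] ata7.00: hard resetting link
[   14.843039] ata7.00: softreset failed (timeout)
[   17.836402] ata7.15: qc timeout (cmd 0xe4)
"""

# Assumed format: "[ <seconds>] ataN.LL: <message>"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<port>ata\d+)\.(?P<link>\d+):\s+(?P<msg>.*)$"
)


def find_pmp_problems(log_text):
    """Return (timestamp, port, link, kind) for softreset failures and qc timeouts."""
    problems = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        msg = m.group("msg")
        if "softreset failed" in msg:
            kind = "softreset_failed"
        elif "qc timeout" in msg:
            kind = "qc_timeout"
        else:
            continue
        problems.append(
            (float(m.group("ts")), m.group("port"), int(m.group("link")), kind)
        )
    return problems


if __name__ == "__main__":
    for ts, port, link, kind in find_pmp_problems(SAMPLE_LOG):
        print(f"{ts:10.3f}s {port}.{link:02d} {kind}")
```

Running it on logs captured before and after swapping enclosures would show whether the timeouts stay with the Sil3124 ports or move with the enclosures.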