Re: Fwd: libata-pmp patch for 3.2.x and later for eSATA Port Multiplier Sil3726
From: ANEZAKI, Akira <hidden>
Date: 2012-03-26 00:41:40
Also in:
lkml
Hello Gwendal,
Thank you for your kind response.
(2012/03/26 00:28), Gwendal Grignou wrote:
I reread your logs. Assuming you don't mind a long boot from cold power-on, the remaining problem is with the 4 disk enclosures [ata7-ata10] on the second machine, where the first disk is not found and boot after a warm reboot is very long.
Not only the first disk. Two or more HDDs are missing on every PMP.
I am trying to understand why it works with the other 4 enclosures [ata5-ata6] on the first and second machines. Also, just to be sure I understand your configuration correctly, your second machine has 30 disks total, not 40:
  2 direct on ata1.00 and ata1.01
  8 on 2 enclosures [2 * 4] on ata5 and ata6
  20 on 4 enclosures [4 * 5] on ata7 - ata10
Oops! You are right. I'm very sorry!
Also, from the log, ata5 and ata6 are behind a Sil3132 based controller, while ata7-ata10 are behind a single Sil3124, not the opposite as you said in a previous mail. If possible, could you move 2 of the 4 failing enclosures [with their disks] to the ports controlled by the Sil3132 controller, reboot the machine with all its 30 drives, and see whether the failures follow the controller or the enclosure?
Yes, I will do it as soon as possible. (Sorry, a resync is running now.)
If you based your RAID configuration on signatures, that should be fine, but if it is based on kernel device names [sdX], that will confuse md and mess with your data.
The problem is that some HDDs on every PMP from ata7 - ata10 are missing. The RAID problem seems to be caused by that. mdadm.conf uses UUIDs, so I think the kernel identifies the RAID arrays by UUID.
Best Regards, Akira
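As a rough way to double-check the point about signatures versus device names, the sketch below scans mdadm.conf text and reports ARRAY lines that carry no uuid= tag. The function name and the sample contents are hypothetical, and the parser is deliberately simple: it assumes one ARRAY definition per line and ignores continuation lines and abbreviated keywords.

```python
def arrays_without_uuid(conf_text):
    """Return the names of ARRAY entries that have no uuid= tag.

    Simplified sketch: assumes one ARRAY definition per line.
    """
    missing = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.upper().startswith("ARRAY"):
            continue
        fields = line.split()
        has_uuid = any(f.lower().startswith("uuid=") for f in fields[1:])
        if not has_uuid:
            missing.append(fields[1] if len(fields) > 1 else "<unnamed>")
    return missing


# Hypothetical mdadm.conf contents, for illustration only.
SAMPLE = """\
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
ARRAY /dev/md1 devices=/dev/sdb1,/dev/sdc1
"""

if __name__ == "__main__":
    print(arrays_without_uuid(SAMPLE))
```

An empty result would suggest that every array in the file is identified by UUID, so a change in sdX naming after moving enclosures should not by itself confuse md.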
I am sorry I don't have any other suggestion right now,
The HDDs connected to ata7 - ata10 are very old and support only Serial ATA 1.0a. I checked the data sheet, and the chip (JM20330) supports the SRST command. While booting, the indicator LED blinks repeatedly, and more than half of the HDDs are identified. So I thought that the HDD side is not the problem. What is your opinion about it?
Regards, Gwendal.
On Sat, Mar 24, 2012 at 6:19 PM, ANEZAKI, Akira [off-list ref] wrote:
Hello Gwendal,
I want to confirm one thing. Does the kernel 3.1.x driver still work? It seems it will take a long time to solve the problem. Of course I understand that staggered spin-up is the better solution, but I can't wait that long, and it affects only the SiI3726.
Best Regards, Akira
(2012/03/23 18:59), ANEZAKI, Akira wrote:
Hello Gwendal,
(2012/03/23 17:31), Gwendal Grignou wrote:
I notice, however, some messages I did not see before:
[ 4.856382] ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 4.858742] ata7.00: hard resetting link
[ 14.843039] ata7.00: softreset failed (timeout)
[ 17.836402] ata7.15: qc timeout (cmd 0xe4)
The latter indicates that the PMP is stuck and the host cannot read its internal register. Is it possible that the PMPs in these 4 enclosures you are using have a different firmware than the other ones? Firmware 1.0114 is available at: http://www.siliconimage.com/support/searchresults.aspx?pid=26&cat=23
From the release notes: """- Fix SRST and initial two RegFIS Problem."""
I'm still fixing the broken RAID. Sorry for my slow response.
I checked the firmware versions. All of them use version 1.0114.
Best Regards, Akira
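To compare which ports show the softreset and qc timeout messages across boots, a small script along these lines could group those failures by ATA port. This is only a sketch: the function name and the list of failure markers are assumptions chosen for illustration, and the sample text is just the four log lines quoted above.

```python
import re
from collections import defaultdict

# Matches kernel log lines such as:
#   [ 14.843039] ata7.00: softreset failed (timeout)
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)\.(\d+):\s+(.*)")

# Substrings treated as failures here; an assumption, extend as needed.
FAILURE_MARKERS = ("softreset failed", "qc timeout")


def failures_by_port(log_text):
    """Map each ATA port (e.g. 'ata7') to its (link, timestamp, message) failures."""
    result = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        timestamp, port, link, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            result[port].append((link, float(timestamp), message.strip()))
    return dict(result)


SAMPLE = """\
[ 4.856382] ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 4.858742] ata7.00: hard resetting link
[ 14.843039] ata7.00: softreset failed (timeout)
[ 17.836402] ata7.15: qc timeout (cmd 0xe4)
"""

if __name__ == "__main__":
    for port, events in failures_by_port(SAMPLE).items():
        for link, timestamp, message in events:
            print(f"{port}.{link} at {timestamp:.6f}s: {message}")
```

Running it on full dmesg output from both machines, before and after swapping enclosures between controllers, would show whether the failing ports follow the Sil3124 or the enclosures.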