Re: SATA150TX4 atat1:command timeout
From: Francois Payette <hidden>
Date: 2005-02-16 15:03:37
With plain vanilla 2.6.11-rc4 the same bug appears after about 250GB (avg of 2 trials). With the TBG clock setting line omitted it still happens, but after about 1.1 TB (avg of 2 trials, takes about 6hrs per trial). Interestingly enough, this change does not slow down the setup; it even seems a little faster.

I was mistaken earlier: the 4 drives are not exactly the same. There are two 6B200M0, one 6B200S0 and one 6Y200M0. This should be irrelevant, as I have swapped disks and wires and the problem happens anyway.

One interesting thing: in init 1 the timeout seems to appear sooner, after about 200GB in the case with the line omitted.

I would be inclined to think this is some sort of deadlock or race condition: the kernel does not dump or panic, it just hangs in pdc_eng_timeout. When we dumped the stack in that function, all we had was pdc_eng_timeout, as there seems to be a separate thread per disk that gets woken up for error handling.

Any ideas on how we can catch this one?

TIA,
Francois