Thread: 3 messages, 3 authors, 2008-05-02

Re: [PATCH] md: fix raid5 'repair' operations

From: Michael Tokarev <hidden>
Date: 2008-05-02 11:17:49

Neil Brown wrote:
> On Thursday May 1, dan.j.williams@intel.com wrote:
>> commit bd2ab67030e9116f1e4aae1289220255412b37fd "md: close a livelock
>> window in handle_parity_checks5" introduced a bug in handling 'repair'
>> operations.  After a repair operation completes we clear the state bits
>> tracking this operation.  However, they are cleared too early and this
>> results in the code deciding to re-run the parity check operation.  Since
>> we have done the repair in memory the second check does not find a mismatch
>> and thus does not do a writeback.
>
> yes....
> I must admit that I find that code fairly hard to make sense of, but I
> can see how it was failing before and how this fixes it, and testing
> confirms that, so I suspect it is right.
>
> I cannot help feeling that there must be some way to simplify all
> those .pending and .complete bits and make it somewhat clearer, but I
> haven't been able to figure out how :-(
>
> So: Acked-by: NeilBrown [off-list ref]
>
> I'm heading for a weekend, but feel free to send this to akpm.

Hmm.  Should this be sent to stable- as well?  I was just bitten by
this very bug here, and after applying the patch and rebooting the
problem went away...  2.6.25.0 here.

/mjt