Re: [PATCH] md: fix raid5 'repair' operations
From: Michael Tokarev <hidden>
Date: 2008-05-02 11:17:49
Neil Brown wrote:
> On Thursday May 1, dan.j.williams@intel.com wrote:
> > commit bd2ab67030e9116f1e4aae1289220255412b37fd "md: close a livelock window in handle_parity_checks5" introduced a bug in handling 'repair' operations. After a repair operation completes we clear the state bits tracking this operation. However, they are cleared too early, and this results in the code deciding to re-run the parity check operation. Since we have done the repair in memory, the second check does not find a mismatch and thus does not do a writeback.
>
> Yes.... I must admit that I find that code fairly hard to make sense of, but I can see how it was failing before and how this fixes it, and testing confirms that, so I suspect it is right.
>
> I cannot help feeling that there must be some way to simplify all those .pending and .complete bits and make it somewhat clearer, but I haven't been able to figure out how :-(
>
> So:
> Acked-by: NeilBrown [off-list ref]
>
> I'm heading for a weekend, but feel free to send this to akpm.
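The ordering problem in the quoted commit message can be illustrated with a small user-space model. This is a hypothetical, heavily simplified sketch, not the raid5.c code: the flag names (CHECK_REQUESTED, CHECK_DONE, REPAIRED), struct stripe_model, handle_pass() and run_repair() are all invented for illustration and do not correspond to handle_parity_checks5() or the real .pending/.complete bits. It only shows why clearing the "repair done" state before acting on it leads to a second, clean check and no writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stripe state bits; illustrative only, not the kernel's flags. */
enum {
    CHECK_REQUESTED = 1 << 0, /* a check/repair pass was requested for this stripe */
    CHECK_DONE      = 1 << 1, /* the parity check has run */
    REPAIRED        = 1 << 2, /* the check found a mismatch and fixed parity in memory */
};

struct stripe_model {
    unsigned state;       /* bitmask of the flags above */
    bool mem_parity_ok;   /* parity held in the stripe cache */
    bool disk_parity_ok;  /* parity on the member disk */
    int checks_run;
    int writebacks;
};

/* One pass of a simplified stripe handler. Returns true while work remains. */
static bool handle_pass(struct stripe_model *sh, bool clear_too_early)
{
    if (!(sh->state & CHECK_REQUESTED))
        return false;

    if (!(sh->state & CHECK_DONE)) {
        sh->checks_run++;
        sh->state |= CHECK_DONE;
        if (!sh->mem_parity_ok) {
            sh->mem_parity_ok = true;   /* the repair happens in memory */
            sh->state |= REPAIRED;
            if (clear_too_early)        /* buggy ordering: forget the outcome */
                sh->state &= ~(unsigned)(CHECK_DONE | REPAIRED);
        }
        return true;
    }

    /* The check has finished: act on its outcome, and only then clear the bits. */
    if (sh->state & REPAIRED) {
        sh->writebacks++;
        sh->disk_parity_ok = sh->mem_parity_ok;
    }
    sh->state = 0;
    return false;
}

/* Drive one stripe with stale on-disk parity through a 'repair' request. */
struct stripe_model run_repair(bool clear_too_early)
{
    struct stripe_model sh = { .state = CHECK_REQUESTED }; /* both parity flags start false */
    int passes = 0;

    while (handle_pass(&sh, clear_too_early)) {
        passes++;
        assert(passes < 16); /* the model must terminate */
    }
    return sh;
}
```

With the bits cleared too early, run_repair(true) runs the check twice and never writes the corrected parity back (checks_run == 2, writebacks == 0, disk parity still stale). Holding the bits until the result has been acted on, run_repair(false) performs one check and one writeback.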
Hmm. Should this be sent to -stable as well? I was just bitten by this very bug here, and after applying the patch and rebooting, the problem went away. I'm running 2.6.25.0.

/mjt