Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard
From: Holger Kiehl <hidden>
Date: 2015-05-22 18:17:41
Also in: lkml
On Thu, 21 May 2015, NeilBrown wrote:

> On Thu, 21 May 2015 06:44:27 +0000 (UTC) Holger Kiehl [off-list ref] wrote:
>
>> On Thu, 21 May 2015, NeilBrown wrote:
>>
>>> On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov [off-list ref] wrote:
>>>
>>>> On Wed, 20 May 2015 20:12:31 +0000 (UTC) Holger Kiehl [off-list ref] wrote:
>>>>
>>>>> The kernel I was running when I discovered the problem was 4.0.2
>>>>> from kernel.org. However, after reinstalling from DVD I updated to
>>>>> Fedora's latest kernel, which was 3.19.? (I do not remember the
>>>>> last numbers). So that kernel seems to be affected as well, but I
>>>>> assume it contains many 'fixes' from 4.0.x. As filesystem I use
>>>>> ext4, distribution is Fedora 21 and hardware is: Xeon E3-1275,
>>>>> 16GB ECC RAM.
>>>>>
>>>>> My system now seems to have been running stable for some days with
>>>>> kernel.org kernel 4.0.3 and with discard DISABLED. But I am still
>>>>> unsure what could be the real cause.
>>>>
>>>> It is a bug in the 4.0.2 kernel, fixed in 4.0.3.
>>>>
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
>>>> https://bbs.archlinux.org/viewtopic.php?id=197400
>>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711
>>>
>>> I suspect that is a different bug. I think this one is
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=98501
>>
>> Should there not be a big fat warning going around telling users to
>> disable discard on Raid 0 until this is fixed? This breaks the
>> filesystem completely and I believe there is absolutely no way one
>> can get back the data.
>
> Probably. Would you like to do that?
>
>> Is this fixed in 4.0.4? And which kernels are affected? There could
>> be many people running systems who have not noticed this and do not
>> know what a dangerous situation they are in when they delete data.
>
> The patch was only added to my tree today. I will send it to Linus
> tomorrow, so it should appear in the next -rc.
>
> Any -stable kernel released since mid-April probably has the bug. It
> was caused by
>
> commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd
>
> Once the fix gets into Linus' tree, it should get into subsequent
> -stable releases.
>
> The fix is here:
>
> http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f
>
> commit id should remain unchanged.
I would like to confirm that with this patch and discard enabled, I no
longer see any corruption.

Many thanks for the quick fix!

Regards,
Holger
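
For anyone who wants to see whether a machine is exposed while it still runs an unpatched kernel, here is a rough sketch of how one might look for md devices mounted with the discard option. It is not from this thread. The function name find_discard_on_md is made up for illustration. It assumes the text follows the /proc/mounts layout (device, mount point, filesystem type, options), that the filesystem lists "discard" among its options when online discard is enabled, and that md arrays show up under a /dev/md prefix. It does not check the RAID level, does not see arrays hidden behind LVM or device-mapper, and does not notice discards issued by fstrim.

```python
def find_discard_on_md(mounts_text):
    """Return (device, mountpoint) pairs for /dev/md* entries mounted with 'discard'.

    mounts_text is expected to follow the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed lines.
            continue
        device, mountpoint, _fstype, options = fields[:4]
        # Assumption: md arrays and their partitions appear as /dev/md...
        if device.startswith("/dev/md") and "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits


# Made-up sample data in /proc/mounts format, for illustration only.
sample = (
    "/dev/md126p2 / ext4 rw,relatime,discard,data=ordered 0 0\n"
    "/dev/sda1 /boot ext4 rw,discard 0 0\n"
    "/dev/md127 /data ext4 rw,relatime 0 0\n"
)

for device, mountpoint in find_discard_on_md(sample):
    print(f"{device} on {mountpoint} is mounted with discard")
```

On a real system one would pass the contents of /proc/mounts instead of the sample string. An empty result only means no such mount was found under the assumptions above; it does not prove the system is safe.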