Thread: 9 messages, 4 authors, 2012-08-03

Re: "Unknown code" error when enabling metadata_csum on ext4 raid1 device

From: Zheng Liu <hidden>
Date: 2012-08-02 09:49:39

On Wed, Aug 01, 2012 at 10:43:05PM -0500, Nick Semenkovich wrote:
[-- snip --]
> Sorry for the slow reply --
>
>
> I hadn't seen any "Corrupt dir inode" errors until now.
>
> Before running the one-line patch above, I resynced the MD array and
> ran a quick fsck (via "touch /forcefsck" & reboot).
>
>
> Then,
> $ sudo misc/tune2fs -O metadata_csum /dev/md1
>
> [says something about running e2fsck -D]
>
>
> Then I got a few dmesg errors like:
>
> [128700.816091] JBD2: Spotted dirty metadata buffer (dev = md1,
> blocknr = 5243385). There's a risk of filesystem corruption in case of
> system crash.
> [128700.816106] JBD2: Spotted dirty metadata buffer (dev = md1,
> blocknr = 1057). There's a risk of filesystem corruption in case of
> system crash.
>
> then a lot of
>
> [128711.000677] EXT4-fs warning (device md1): dx_probe:647: dx entry:
> limit != root limit
> [128711.000679] EXT4-fs warning (device md1): dx_probe:732: Corrupt
> dir inode 7733251, running e2fsck is recommended.
>
>
> On my next command (sudo -s), I got an immediate kernel panic:
>
> [128713.776475] EXT4-fs warning (device md1): dx_probe:732: Corrupt
> dir inode 7733251, running e2fsck is recommended.
> [128761.137143] BUG: unable to handle kernel NULL pointer dereference
> at           (null)
> [128761.137195] IP: [<ffffffff8121d448>] ext4_iget+0x498/0xa50
> [128761.137231] PGD 106651067 PUD 11cf41067 PMD 0
> [128761.137258] Oops: 0000 [#1] SMP
> [128761.137279] CPU 0
> [snip...]
>
> Full panic @ http://web.mit.edu/semenko/Public/panic.txt

Hi Nick,

Thanks for testing my patch.  From your description, it seems that
there are still some bugs when the metadata_csum feature is enabled.  I
tried to reproduce this bug, but I couldn't in my sandbox.  I looked at
the full panic file, and it seems that the kernel is an Ubuntu
distribution kernel rather than a generic mainline kernel.  Could you
try the latest upstream kernel?  Then, if the problem happens again, it
will be easier for me to find out what goes wrong.
Thanks for your patience.

Regards,
Zheng