RE: 3.12: raid-1 mismatch_cnt question
From: Justin Piszcz <hidden>
Date: 2013-11-14 17:22:26
-----Original Message-----
From: joystick [mailto:joystick@shiftmail.org]
Sent: Thursday, November 14, 2013 11:09 AM
To: Justin Piszcz
Cc: 'Bernd Schubert'; 'linux-raid'
Subject: Re: 3.12: raid-1 mismatch_cnt question

[ .. ]
> At the end of the procedure (like now, if you didn't resync or repair in the meanwhile) is mismatch_cnt still so high?
After a reboot, I ran the check and yes it was still high. [ .. ]
> no, not that one... it would be helpful to know the kernel version that *creates* mismatches, the one that you have running normally on the live system.
Version: 3.12.0 (and typically always use the latest) That's the "bugged" one, supposing this is really a bug (until we find where the mismatches are, it's difficult to say whether this is data loss or not)
> Maybe the mismatches are located in ext4 metadata areas, which are not files and so can't be seen with md5sums... That would still be just as worrisome, unless some ext4 expert can tell that it's OK (it can be OK if the region with mismatches is an old metadata area, currently unused; the mechanism that can create harmless mismatches in this case has been described by Neil)
If that is what is occurring, is it possible to exclude them from mismatch_cnt?
[ .. ]
- First confirm that mismatch_cnt is still high.
It was 0 after reboot.
[ .. ]
- Then, if this does not disrupt your system operation too much, I would
suggest to fill 95% of free space with a zeroes file like you did in
earlier tests. Otherwise for a mismatch happening in non-file area we
won't be sure of what kind of area is that. Maybe recompute mismatch_cnt
after this.
Create file up to 95% utilization on /root:
/dev/root 219G 205G 12G 95% /
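
A minimal sketch of the fill step, assuming GNU dd and a POSIX df; the file name zeroes.fill and the --run switch are made up for illustration. It computes how many KiB of zeroes are still needed for used/(used+avail) to reach the target percentage, which is roughly how df derives its Use% column on ext4 with reserved blocks.

```shell
#!/bin/sh
# KiB still to write so that used/(used+avail) reaches pct percent.
# All three arguments are integers; used and avail are in KiB.
fill_kib() {
    used=$1 avail=$2 pct=$3
    target=$(( (used + avail) * pct / 100 ))
    need=$(( target - used ))
    [ "$need" -gt 0 ] || need=0    # already at or above the target
    echo "$need"
}

# Only write anything when explicitly asked to (hypothetical --run switch).
if [ "${1:-}" = "--run" ]; then
    mnt=${2:-/}
    # df -Pk columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    set -- $(df -Pk "$mnt" | awk 'NR==2 {print $3, $4}')
    need=$(fill_kib "$1" "$2" 95)
    # zeroes.fill is a hypothetical file name; conv=fsync is a GNU dd option.
    dd if=/dev/zero of="$mnt/zeroes.fill" bs=1M count=$(( need / 1024 )) conv=fsync
fi
```

Remove the fill file once the mismatch test is finished.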
Re-check:
# echo check > /sys/devices/virtual/block/md1/md/sync_action
# cat /sys/devices/virtual/block/md1/md/mismatch_cnt
27520
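
The two commands above can be wrapped so that mismatch_cnt is only read once the check pass has finished. This is a rough sketch: the MD_DIR default, the POLL_SECONDS variable and the --run switch are assumptions, not part of the original procedure. As far as the kernel md documentation goes, mismatch_cnt is counted in 512-byte sectors and can overstate the real number of differing sectors, so 27520 would correspond to at most 13760 KiB.

```shell
#!/bin/sh
# Assumption: the array is md1; override MD_DIR for another array.
MD_DIR="${MD_DIR:-/sys/block/md1/md}"

# mismatch_cnt is reported in 512-byte sectors; convert to KiB.
sectors_to_kib() {
    echo $(( $1 / 2 ))
}

# Start a check, wait until sync_action reads "idle" again,
# then print mismatch_cnt.
run_check() {
    echo check > "$MD_DIR/sync_action" || return 1
    while [ "$(cat "$MD_DIR/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-10}"
    done
    cat "$MD_DIR/mismatch_cnt"
}

# Only touch the array when explicitly asked to (hypothetical --run switch).
if [ "${1:-}" = "--run" ]; then
    count=$(run_check) || exit 1
    echo "mismatch_cnt: $count sectors ($(sectors_to_kib "$count") KiB)"
fi
```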
then, copy-pasting the procedure with some modifications:
----
... to determine the location of mismatches (...)
Unfortunately I don't think MD tells you the location of mismatches
directly. Do you want to try the following:
/sys/block/md1/md/sync_min and /sys/block/md1/md/sync_max should allow
you to narrow the region of the next check.
Set them, then perform check, then cat mismatch_cnt.
Narrow progressively sync_min and sync_max so that you identify the most
dense areas of mismatches, or a few single blocks that mismatch.
When you have identified some regions or isolated blocks, invoke "sync"
from bash and then check the same region again a couple of times, to
be sure that it stays mismatched and it's not just a transient situation.
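
The narrowing procedure above could be scripted roughly as follows. This is a sketch under several assumptions: the array is md1, sync_min and sync_max take 512-byte sector offsets, sync_completed reads either "none" or "done / total", and mismatch_cnt can be read while the pass is held at sync_max. The kernel md documentation states that a check pauses on reaching sync_max rather than completing, so the loop polls sync_completed and then writes "idle" itself. The function names and the --window switch are made up for illustration.

```shell
#!/bin/sh
# Assumption: the array is md1; override MD_DIR for another array.
MD_DIR="${MD_DIR:-/sys/block/md1/md}"

# Midpoint of [lo, hi) in sectors, rounded down to a multiple of "align",
# for halving a window that still shows mismatches.
midpoint() {
    lo=$1 hi=$2 align=$3
    half=$(( (hi - lo) / 2 ))
    echo $(( lo + half / align * align ))
}

# Check only sectors [lo, hi) and print the resulting mismatch_cnt.
check_window() {
    lo=$1 hi=$2
    sync                                   # flush pending writes first
    echo idle > "$MD_DIR/sync_action"
    echo "$lo" > "$MD_DIR/sync_min"
    echo "$hi" > "$MD_DIR/sync_max"
    echo check > "$MD_DIR/sync_action"
    while :; do
        state=$(cat "$MD_DIR/sync_action")
        done_sectors=$(cut -d' ' -f1 "$MD_DIR/sync_completed")
        [ "$state" = "idle" ] && break
        [ "$done_sectors" != "none" ] && [ "$done_sectors" -ge "$hi" ] && break
        sleep 1
    done
    cat "$MD_DIR/mismatch_cnt"
    # Release the paused pass and restore the full range.
    echo idle > "$MD_DIR/sync_action"
    echo 0 > "$MD_DIR/sync_min"
    echo max > "$MD_DIR/sync_max"
}

# Hypothetical switch: script --window LO HI
if [ "${1:-}" = "--window" ]; then
    check_window "$2" "$3"
fi
```

Running check_window on each half of a window (split with midpoint) and recursing into the half that reports mismatches would narrow the region step by step.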
Then try with debugfs (in readonly mode can be used with fs mounted):
there should be an option to get the inode number from a block number of
the device... I hope that block numbers are not offset by MD... I think
it's icheck and after that you might need "find -inum <inode_number>"
launched on the same filesystem to find the corresponding filename from
the inode number. That should be the file that contains the mismatch.
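
A sketch of the debugfs lookup, assuming the filesystem sits directly on the md device with no offset (no partition table or LVM in between) and that the mismatch location is known as a 512-byte sector offset. debugfs opens the filesystem read-only unless -w is given; its icheck command maps block numbers to inodes and ncheck maps inodes to path names. The helper names, the 4096-byte default block size and the --run switch are assumptions; the real block size can be read from "tune2fs -l".

```shell
#!/bin/sh
# Convert a 512-byte sector offset into a filesystem block number.
sector_to_fsblock() {
    sector=$1 blocksize=$2
    echo $(( sector * 512 / blocksize ))
}

# Print which inode (if any) owns the block containing the given sector.
lookup_sector() {
    dev=$1 sector=$2 bs=${3:-4096}   # assumption: 4 KiB ext4 blocks
    blk=$(sector_to_fsblock "$sector" "$bs")
    debugfs -R "icheck $blk" "$dev"
}

# Hypothetical switch: script --run DEVICE SECTOR [BLOCKSIZE]
if [ "${1:-}" = "--run" ]; then
    lookup_sector "$2" "$3" "${4:-4096}"
    # Then, with an inode number from the icheck output:
    #   debugfs -R "ncheck <inode_number>" DEVICE
fi
```

If icheck reports no inode for the block, the mismatch is in free space or in a metadata area rather than in file data.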
[ .. ]
When I do this, the speed of check thereafter is very slow:
Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[>....................] check = 0.0% (4500/233381376) finish=80387.9min speed=48K/sec (55 days)
With sync_min set to 1000 and sync_max set to 9000, the speed keeps decreasing (this won't work).
A few minutes later:
Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[>....................] check = 0.0% (4500/233381376) finish=200485.5min speed=19K/sec
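
One possible reading of the output above, offered as a hypothesis only: the mdstat position is in 1 KiB blocks, and 4500 KiB equals 9000 sectors, which is the sync_max that was set. The kernel md documentation says a check pauses when it reaches sync_max rather than completing, so the shrinking speed figure may just be an average over a pass that is no longer moving. A sketch that tests for this condition (the MD_DIR default and the --run switch are assumptions):

```shell
#!/bin/sh
# Extract the current position (in 1 KiB blocks) from an mdstat progress line.
mdstat_pos_kib() {
    printf '%s\n' "$1" | sed -n 's|.*(\([0-9][0-9]*\)/[0-9][0-9]*).*|\1|p'
}

# Succeeds when the position has reached sync_max (given in 512-byte sectors).
reached_sync_max() {
    pos_kib=$1 sync_max=$2
    [ $(( pos_kib * 2 )) -ge "$sync_max" ]
}

# Hypothetical switch; inspects the first array that is currently in "check".
if [ "${1:-}" = "--run" ]; then
    md_dir=${MD_DIR:-/sys/block/md1/md}    # assumption: array is md1
    line=$(grep 'check =' /proc/mdstat | head -n 1)
    pos=$(mdstat_pos_kib "$line")
    max=$(cat "$md_dir/sync_max")
    if [ -n "$pos" ] && [ "$max" != "max" ] && reached_sync_max "$pos" "$max"; then
        echo "check appears paused at sync_max=$max"
        echo "to release it: echo idle > $md_dir/sync_action, then echo max > $md_dir/sync_max"
    fi
fi
```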
It would be interesting if someone else on this list has ext4 and sees similar results (mismatch_cnt) with their SSDs vs. another FS (XFS/etc).
Justin.