Thread (11 messages) 11 messages, 4 authors, 2013-02-28

Re: Persistent failures with simple md setup

From: Hans-Peter Jansen <hidden>
Date: 2013-02-28 10:49:53

Am Donnerstag, 28. Februar 2013, 14:40:13 schrieb NeilBrown:
On Wed, 27 Feb 2013 18:01:29 +0100 Hans-Peter Jansen [off-list ref] wrote:
quoted
[I intentionally switched off line wrapping, since it renders the given information
 inedible]

Hi Neil,

sorry for the delay. Meanwhile, I updated to mdadm 3.2.6, and recreated the initrd.
Since I do the recreation on a regular base, no data salad appeared, but the initial
issue persists. I trigger checking the related md on each manual intervention.

Hence approaching this again. 

That's a typical boot sequence:
BTW, you aren't the only one to hit this issue .... I think I'm starting to
get closer to the problem.
Nice to hear, will do my best to assist.
quoted
Feb 27 08:35:48 zaphkiel kernel: [    0.000000] Kernel command line: PROFILE=ibk root=/dev/disk/by-uuid/782b24f2-26f6-4ea9-ab8b-d377cd2ac97b console=ttyS0,19200 console=tty0 resume=/dev/md127 splash=silent  console=ttyS0,115200n8 vga=791
Feb 27 08:35:48 zaphkiel kernel: [    6.816164] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 27 08:35:48 zaphkiel kernel: [    6.862240] sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 27 08:35:48 zaphkiel kernel: [    6.983157] sd 0:0:0:0: [sda] Write Protect is off
Feb 27 08:35:48 zaphkiel kernel: [    7.011914] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Feb 27 08:35:48 zaphkiel kernel: [    7.011932] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 27 08:35:48 zaphkiel kernel: [    7.159045] sd 1:0:0:0: [sdb] Write Protect is off
Feb 27 08:35:48 zaphkiel kernel: [    7.187766] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Feb 27 08:35:48 zaphkiel kernel: [    7.187783] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 27 08:35:48 zaphkiel kernel: [    7.319637]  sda: sda1 sda2 sda3 sda4
Feb 27 08:35:48 zaphkiel kernel: [    7.386469] sd 0:0:0:0: [sda] Attached SCSI disk
Feb 27 08:35:48 zaphkiel kernel: [    7.414335]  sdb: sdb1 sdb2 sdb3 sdb4
Feb 27 08:35:48 zaphkiel kernel: [    7.470597] sd 1:0:0:0: [sdb] Attached SCSI disk
Feb 27 08:35:48 zaphkiel kernel: [    9.676078] PM: Checking hibernation image partition /dev/md127
Feb 27 08:35:48 zaphkiel kernel: [    9.956010] sd 6:0:0:0: [sdc] Attached SCSI removable disk
Feb 27 08:35:48 zaphkiel kernel: [   10.175875] sd 6:0:0:1: [sdd] Attached SCSI removable disk
Feb 27 08:35:48 zaphkiel kernel: [   10.262999] sd 6:0:0:2: [sde] Attached SCSI removable disk
Feb 27 08:35:48 zaphkiel kernel: [   10.299000] sd 6:0:0:3: [sdf] Attached SCSI removable disk
Feb 27 08:35:48 zaphkiel kernel: [   10.618022] md: raid1 personality registered for level 1
Feb 27 08:35:48 zaphkiel kernel: [   11.139833] md: bind<sda3>
Feb 27 08:35:48 zaphkiel kernel: [   11.158438] md: bind<sdb3>
Feb 27 08:35:48 zaphkiel kernel: [   11.200897] md/raid1:md124: active with 2 out of 2 mirrors
Presumably this is md124 being assembled in the initrd as it contains the
root filesystem.
Correct.
quoted
Feb 27 08:35:48 zaphkiel kernel: [   11.233921] created bitmap (200 pages) for device md124
Feb 27 08:35:48 zaphkiel kernel: [   11.276742] md124: bitmap initialized from disk: read 13/13 pages, set 0 of 409595 bits
Feb 27 08:35:48 zaphkiel kernel: [   11.356796] md124: detected capacity change from 0 to 107372728320
Feb 27 08:35:48 zaphkiel kernel: [   11.395610]  md124: unknown partition table
Feb 27 08:35:48 zaphkiel kernel: [   11.543579] md: raid0 personality registered for level 0
Feb 27 08:35:48 zaphkiel kernel: [   11.583994] md: raid10 personality registered for level 10
Feb 27 08:35:48 zaphkiel kernel: [   12.188576] md: raid6 personality registered for level 6
Feb 27 08:35:48 zaphkiel kernel: [   12.220473] md: raid5 personality registered for level 5
Feb 27 08:35:48 zaphkiel kernel: [   12.252320] md: raid4 personality registered for level 4
Feb 27 08:35:48 zaphkiel kernel: [   12.318865] XFS (md124): Mounting Filesystem
Feb 27 08:35:48 zaphkiel kernel: [   12.535666] XFS (md124): Ending clean mount
And now XFS has mounted that, so presumably about there the boot sequence
switches from initrd to real-root.  You have obviously removed some lines,
presumably because you thought brevity would help.  For future reference: it
doesn't.  I much prefer to see complete logs so I can correlate various parts
of the sequence and be sure I understand what I'm seeing.  But I probably
have enough here.
Sorry, I grepped for 'md|sd' and removed some lines triggered from pata_amd, 
which doesn't have any business in this scenario anyway (all sata). 

Will remove this dangling driver from initrd later.

We're leaving initrd here.
quoted
Feb 27 08:35:48 zaphkiel kernel: [   22.720866] sd 0:0:0:0: Attached scsi generic sg0 type 0
Feb 27 08:35:48 zaphkiel kernel: [   23.126126] sd 1:0:0:0: Attached scsi generic sg1 type 0
Feb 27 08:35:48 zaphkiel kernel: [   23.203876] sd 6:0:0:0: Attached scsi generic sg2 type 0
Feb 27 08:35:48 zaphkiel kernel: [   23.402184] sd 6:0:0:1: Attached scsi generic sg3 type 0
Feb 27 08:35:48 zaphkiel kernel: [   23.725330] sd 6:0:0:2: Attached scsi generic sg4 type 0
Feb 27 08:35:48 zaphkiel kernel: [   23.917899] sd 6:0:0:3: Attached scsi generic sg5 type 0
Feb 27 08:35:48 zaphkiel kernel: [   24.492587] md: bind<sda2>
Feb 27 08:35:48 zaphkiel kernel: [   24.711804] md: bind<sda1>
Feb 27 08:35:48 zaphkiel kernel: [   24.821071] md: bind<sda4>
These are parts of 3 different arrays that are starting to be assembled in
parallel.  That must be udev running "mdadm -I /dev/sdaX" which will partly
assemble each array.  when the 'sdb' bits come along "mdadm -I /dev/sdbX"
should complete the assembly.
Yes, exactly.
quoted
Feb 27 08:35:48 zaphkiel kernel: [   25.607192] md: bind<sdb2>
Feb 27 08:35:48 zaphkiel kernel: [   25.672335] md/raid1:md1: active with 2 out of 2 mirrors
Feb 27 08:35:48 zaphkiel kernel: [   25.723678] created bitmap (8 pages) for device md1
Feb 27 08:35:48 zaphkiel kernel: [   25.771764] md1: bitmap initialized from disk: read 1/1 pages, set 0 of 16379 bits
And there is the first one:  mdadm -I /dev/sdb2 must have been run by udev so
md1 got fully assembled.

quoted
Feb 27 08:35:48 zaphkiel kernel: [   25.854963] md/raid1:md0: active with 1 out of 2 mirrors
But now md0 got assembled with waiting for sdb1.  This must have been the
   mdadm -IRs
in /etc/init.d/boot.md.  The purpose of this is to find any partially
assembled arrays and complete the assembly.  This is needed if, for example,
sdb has failed complete and would never be found.  However it shouldn't
happen yet.  It should wait until all the devices that have been discovered
have been processed.
A little early in the boot.md file you will see:
        if [ "$MDADM_DEVICE_TIMEOUT" -gt 0 ] ; then
            /sbin/udevadm settle --timeout="$MDADM_DEVICE_TIMEOUT"
        fi

so unless you have MDADM_DEVICE_TIMEOUT=0 in /etc/sysconfig/mdadm (which I
very much doubt) it should have already waited.  But there is no sign that it
did.  So maybe "udevadm settle" is broken?
MDADM_DEVICE_TIMEOUT="60"

As far as I can see, udevadm settle doesn't pick up all devices under some 
circumstances. The man page says, it waits up to --timeout seconds for the
event queue to become empty. Obviously, the event queue must be incomplete
then, given, that one md is assembled already in initrd (12 seconds ago),
and all other parts physically exist since then.

Probably, the events got eaten already in initrd already, given that timing.
quoted
Feb 27 08:35:48 zaphkiel kernel: [   25.886903] md1: detected capacity change from 0 to 2146783232
Feb 27 08:35:48 zaphkiel kernel: [   25.929142] created bitmap (6 pages) for device md0
Feb 27 08:35:48 zaphkiel kernel: [   25.958684]  md1: unknown partition table
Feb 27 08:35:48 zaphkiel kernel: [   25.995737] md: bind<sdb4>
Feb 27 08:35:48 zaphkiel kernel: [   26.012500] md0: bitmap initialized from disk: read 1/1 pages, set 0 of 12047 bits
Feb 27 08:35:48 zaphkiel kernel: [   26.077270] md0: detected capacity change from 0 to 98689024
Feb 27 08:35:48 zaphkiel kernel: [   26.168097] md/raid1:md3: active with 2 out of 2 mirrors
And here comes sdb4, so "mdadm -I /dev/sdb4" must have been run by udev.  And
it got in before the  "mdadm -IRs" managed to find that md0 was incomplete,
and started it.

quoted
Feb 27 08:35:48 zaphkiel kernel: [   26.205258]  md0: unknown partition table
Feb 27 08:35:48 zaphkiel kernel: [   26.242295] created bitmap (415 pages) for device md3
Feb 27 08:35:48 zaphkiel kernel: [   26.304271] md3: bitmap initialized from disk: read 26/26 pages, set 1465 of 849320 bits
Feb 27 08:35:48 zaphkiel kernel: [   26.392578] md3: detected capacity change from 0 to 890575601664
Feb 27 08:35:48 zaphkiel kernel: [   26.575527]  md3: unknown partition table
Feb 27 08:35:48 zaphkiel kernel: [   26.763091] Adding 2096464k swap on /dev/md1.  Priority:0 extents:1 across:2096464k 
Feb 27 08:35:48 zaphkiel kernel: [   29.268318] XFS (md3): Mounting Filesystem
Feb 27 08:35:48 zaphkiel kernel: [   29.521844] EXT4-fs (md0): mounting ext2 file system using the ext4 subsystem
Feb 27 08:35:48 zaphkiel kernel: [   29.684176] EXT4-fs (md0): mounted filesystem without journal. Opts: acl,user_xattr
Feb 27 08:35:48 zaphkiel kernel: [   30.356255] XFS (md3): Ending clean mount

Today, /dev/sdb1 of /dev/md0 vanished. Re-adding and checking doesn't revealed 
any error, but it would be cool to get to the bottom of this, as re-adding some
lost partitions to the md every second day *is* a real PITA.

What could I do to help resolving this issue? 
It really feels like a udev bug, but it is hard to be sure.
Could you edit /etc/init.d/boot.md and add
   echo BEFORE mdadm -IRs > /dev/kmsg
just before the call the "$mdadm_BIN -IRs",
   echo AFTER mdadm -IRs > /dev/kmsg
just after that call, and
   echo AFTER mdadm -Asc > /dev/mksg
just after the call to "$mdadm_BIN -A -s -c $mdadm_CONFIG"

And maybe put similar messages just before and after the  
    /sbin/udevadm settle ....
call.

If you can then reproduce the problem, the extra logging might tell us
something useful.
I've enabled initrd (linuxrc=trace) and boot.md (#!/bin/sh -x) debugging, hence 
we should see the full story next time.

Thanks,
Pete
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help