Thread (2 messages) 2 messages, 2 authors, 2008-05-09

Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition

From: Mike Snitzer <hidden>
Date: 2008-05-09 15:01:41
Also in: lkml

On Fri, May 9, 2008 at 2:01 AM, Neil Brown [off-list ref] wrote:
On Friday May 9, snitzer@gmail.com wrote:
 > Unfortunately my testing with this patch results in a full resync.
 >
 > Here is the state of the array after shutdown:
 > # mdadm -X /dev/nbd0 /dev/sdq
 >         Filename : /dev/nbd0
 >            Magic : 6d746962
 >          Version : 4
 >             UUID : 7140cc3c:8681416c:12c5668a:984ca55d
 >           Events : 896
 >   Events Cleared : 897

 Events Cleared is *larger* than Events!!! Is that repeatable?  I can
 only see it happening if a very small race were lost.  You don't have
 any other patches in there do you?
Yes, it is repeatable with your previous patch.  But with your most
recent patch I had the following after shutdown:

# mdadm -X /dev/nbd0 /dev/sdq
        Filename : /dev/nbd0
          Events : 1732
  Events Cleared : 1732
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)

        Filename : /dev/sdq
          Events : 1736
  Events Cleared : 1736
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)

Unfortunately sdq's events_cleared appears to have been updated
_after_ the array became degraded.
As such a full resync occurred because 1732 < 1736.
 This patch should close the race, though I still find it hard to
 believe that you lost the race.
Comments inlined below.
quoted hunk ↗ jump to hunk
 Signed-off-by: Neil Brown [off-list ref]

 ### Diffstat output
  ./drivers/md/bitmap.c |   20 +++++++++++++++-----
  1 file changed, 15 insertions(+), 5 deletions(-)


 diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
 --- .prev/drivers/md/bitmap.c   2008-05-09 11:02:13.000000000 +1000
 +++ ./drivers/md/bitmap.c       2008-05-09 16:00:07.000000000 +1000
@@ -465,8 +465,6 @@ void bitmap_update_sb(struct bitmap *bit
        spin_unlock_irqrestore(&bitmap->lock, flags);
        sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
        sb->events = cpu_to_le64(bitmap->mddev->events);
 -       if (!bitmap->mddev->degraded)
 -               sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
Before, events_cleared was _not_ updated if the array was degraded.
Your patch doesn't appear to maintain that design.

I tried adding the following degraded check to your below conditional
but that resulted in nbd0's events < mddev->bitmap->events_cleared
again, so I'm back to square one:

                        if (!bitmap->mddev->degraded &&
                            bitmap->events_cleared < bitmap->mddev->events) {

In addition no bits were set in sdq's bitmap:
# mdadm -X /dev/nbd0 /dev/sdq
        Filename : /dev/nbd0
          Events : 2616
  Events Cleared : 2617
          Bitmap : 409600 bits (chunks), 0 dirty (0.0%)

        Filename : /dev/sdq
          Events : 2618
  Events Cleared : 2617
          Bitmap : 409600 bits (chunks), 0 dirty (0.0%)
 @@ -1094,9 +1092,21 @@ void bitmap_daemon_work(struct bitmap *b

                        } else
                                spin_unlock_irqrestore(&bitmap->lock, flags);
                        lastpage = page;
 -/*
 -                       printk("bitmap clean at page %lu\n", j);
 -*/
 +
 +                       /* We are possibly going to clear some bits, so make
 +                        * sure that events_cleared is up-to-date.
 +                        */
 +                       if (bitmap->events_cleared < bitmap->mddev->events) {
 +                               bitmap_super_t *sb;
 +                               bitmap->events_cleared = bitmap->mddev->events;
 +                               wait_event(mddev->sb_wait,
 +                                   !test_bit(MD_CHANGE_CLEAN, &mddev->flags));
I needed "bitmap->mddev->sb_wait" and "bitmap->mddev->flags" to get
the code to compile.

Mike
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help