Re: [GIT PATCH 0/2] external-metadata recovery checkpointing for 2.6.33
From: Dan Williams <hidden>
Date: 2009-12-15 00:37:58
On Sun, 2009-12-13 at 21:07 -0700, Neil Brown wrote:
> +static ssize_t recovery_start_store(mdk_rdev_t *rdev, const char *buf, size_t len)
> +{
> +	unsigned long long recovery_start;
> +
> +	if (cmd_match(buf, "none"))
> +		recovery_start = MaxSector;
> +	else if (strict_strtoull(buf, 10, &recovery_start))
> +		return -EINVAL;
> +
> +	if (rdev->mddev->pers &&
> +	    rdev->raid_disk >= 0)
> +		return -EBUSY;

Ok, I had a chance to test this out, and I have a question about how you envisioned mdmon handling this restriction, which is a bit tighter than what I had before. The prior version allowed updates as long as the array was read-only. This version forces recovery_start to be written at sysfs_add_disk() time (before 'slot' is written).

The conceptual problem I ran into was a race between ->activate_spare() determining the last valid checkpoint and the monitor thread starting up the array:

->activate_spare():
        read recovery checkpoint
                                        ( array becomes read/write )
                                        ( array becomes dirty, checkpoint invalidated )
sysfs_add_disk():
        write invalid recovery checkpoint
                                        ( recovery starts from the wrong location )

The scheme I came up with was to not touch recovery_start in the manager thread and let the monitor thread have the last word on the recovery checkpoint. It would only write to md/rdX/recovery_start at the initial readonly->active transition; otherwise recovery starts from the default of 0.

Is the patch below off base?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1cc5f2d..bd24e20 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2467,7 +2467,8 @@ static ssize_t recovery_start_store(mdk_rdev_t *rdev, const char *buf, size_t le
 	else if (strict_strtoull(buf, 10, &recovery_start))
 		return -EINVAL;
 
-	if (rdev->mddev->pers &&
+	if (mddev->ro != 1 &&
+	    rdev->mddev->pers &&
 	    rdev->raid_disk >= 0)
 		return -EBUSY;
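To make the difference between the two acceptance rules concrete, here is a rough userspace model of the recovery_start check. It is a sketch under stated assumptions, not md.c or mdmon code: struct model_mddev, struct model_rdev, store_strict(), store_relaxed() and monitor_activate() are all hypothetical names, and ro == 1 is taken to mean "read-only", as the diff assumes. monitor_activate() stands in for the monitor thread writing its checkpoint at the readonly->active transition after 'slot' has already been set.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the md array state; only the fields the
 * recovery_start check reads are modeled. */
struct model_mddev {
	int ro;   /* 1 = read-only, 0 = read/write (md also uses 2 = read-auto) */
	int pers; /* nonzero once a RAID personality is running */
};

/* Hypothetical stand-in for mdk_rdev_t. */
struct model_rdev {
	struct model_mddev *mddev;
	int raid_disk;                     /* >= 0 once 'slot' has been written */
	unsigned long long recovery_start; /* sector where recovery resumes */
};

/* Rule in the quoted patch: refuse once a personality runs and the disk has a slot. */
static int store_strict(struct model_rdev *rdev, unsigned long long sector)
{
	if (rdev->mddev->pers && rdev->raid_disk >= 0)
		return -EBUSY;
	rdev->recovery_start = sector;
	return 0;
}

/* Rule in the proposed diff: same, but still accept the write while ro == 1. */
static int store_relaxed(struct model_rdev *rdev, unsigned long long sector)
{
	if (rdev->mddev->ro != 1 && rdev->mddev->pers && rdev->raid_disk >= 0)
		return -EBUSY;
	rdev->recovery_start = sector;
	return 0;
}

/* Hypothetical monitor step: write the checkpoint it trusts, then mark the
 * array read/write. Only meaningful while the array is still read-only. */
static int monitor_activate(struct model_rdev *rdev, unsigned long long checkpoint,
			    int (*store)(struct model_rdev *, unsigned long long))
{
	int err;

	assert(rdev->mddev->ro == 1);
	err = store(rdev, checkpoint);
	if (err)
		return err; /* array stays read-only, checkpoint not applied */
	rdev->mddev->ro = 0; /* array becomes read/write */
	return 0;
}
```

With a slot already assigned, monitor_activate(&rdev, checkpoint, store_strict) returns -EBUSY, while store_relaxed accepts the write and then rejects any later write once ro has dropped to 0. In other words, the relaxed rule only reopens the window during which the array cannot yet have gone dirty, which is what lets the monitor thread have the last word.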