Re: [PATCH v2] md: do not _put wrong device in md_seq_next
From: Yu Kuai <hidden>
Date: 2023-09-12 13:25:37
Hi, 在 2023/09/12 21:01, Mariusz Tkaczyk 写道:
During working on changes proposed by Kuai [1], I determined that mddev->active is continusly decremented for array marked by MD_CLOSING. It brought me to md_seq_next() changed by [2]. I determined the regression here, if mddev_get() fails we updated mddev pointer and as a result we _put failed device.
This mddev is decremented while there is another mddev increased, that's why AceLan said that single array can't reporduce the problem. And because mddev->active is leaked, then del_gendisk() will never be called for the mddev while closing the array, that's why user will always see this array, cause infiniate loop open -> stop array -> close for systemd-shutdown.
I isolated the change in md_seq_next() and tested it- issue is no longer reproducible but I don't see the root cause in this scenario. The bug is obvious so I proceed with fixing. I will submit MD_CLOSING patches separatelly. Put the device which has been _get with previous md_seq_next() call. Add guard for inproper mddev_put usage(). It shouldn't be called if there are less than 1 for mddev->active. I didn't convert atomic_t to refcount_t because refcount warns when 0 is achieved which is likely to happen for mddev->active.
LGTM, Reviewed-by: Yu Kuai <redacted>
quoted hunk ↗ jump to hunk
[1] https://lore.kernel.org/linux-raid/028a21df-4397-80aa-c2a5-7c754560f595@gmail.com/T/#m6a534677d9654a4236623661c10646d45419ee1b (local) [2] https://bugzilla.kernel.org/show_bug.cgi?id=217798 Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed") Cc: Yu Kuai <redacted> Cc: AceLan Kao <redacted> Signed-off-by: Mariusz Tkaczyk <redacted> --- drivers/md/md.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)diff --git a/drivers/md/md.c b/drivers/md/md.c index 0fe7ab6e8ab9..bb654ff62765 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c@@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws); void mddev_put(struct mddev *mddev) { + /* Guard for ambiguous put. */ + if (unlikely(atomic_read(&mddev->active) < 1)) { + pr_warn("%s: active refcount is lower than 1\n", mdname(mddev)); + return; + } + if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) return; if (!mddev->raid_disks && list_empty(&mddev->disks) &&@@ -8227,19 +8233,16 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct list_head *tmp; struct mddev *next_mddev, *mddev = v; - struct mddev *to_put = NULL; ++*pos; - if (v == (void*)2) + if (v == (void *)2) return NULL; spin_lock(&all_mddevs_lock); - if (v == (void*)1) { + if (v == (void *)1) tmp = all_mddevs.next; - } else { - to_put = mddev; + else tmp = mddev->all_mddevs.next; - } for (;;) { if (tmp == &all_mddevs) {@@ -8250,12 +8253,11 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) next_mddev = list_entry(tmp, struct mddev, all_mddevs); if (mddev_get(next_mddev)) break; - mddev = next_mddev; - tmp = mddev->all_mddevs.next; + tmp = next_mddev->all_mddevs.next; } spin_unlock(&all_mddevs_lock); - if (to_put) + if (v != (void *)1) mddev_put(mddev); return next_mddev;