Re: Assemblin journaled array fails
From: Song Liu <song@kernel.org>
Date: 2020-06-23 23:13:00
On Tue, Jun 23, 2020 at 6:17 AM Michal Soltys [off-list ref] wrote:
> On 6/22/20 6:37 PM, Song Liu wrote:
>>>> Thanks for the trace. Looks like we may have some issues with
>>>> MD_SB_CHANGE_PENDING. Could you please try the attached patch?
>>>
>>> Should I run this along with pr_debugs from the previous patch enabled ?
>>
>> We don't need those pr_debug() here.
>>
>> Thanks,
>> Song
>
> So with this patch attached, there is no extra output whatsoever - once it
> finished getting past this point:
>
> [ +0.371752] r5c_recovery_rewrite_data_only_stripes rewritten 20001 stripes to the journal, current ctx->pos 408461384 ctx->seq 866603361
> [ +0.395000] r5c_recovery_rewrite_data_only_stripes rewritten 21001 stripes to the journal, current ctx->pos 408479568 ctx->seq 866604361
> [ +0.371255] r5c_recovery_rewrite_data_only_stripes rewritten 22001 stripes to the journal, current ctx->pos 408496600 ctx->seq 866605361
> [ +0.401013] r5c_recovery_rewrite_data_only_stripes rewritten 23001 stripes to the journal, current ctx->pos 408515472 ctx->seq 866606361
> [ +0.370543] r5c_recovery_rewrite_data_only_stripes rewritten 24001 stripes to the journal, current ctx->pos 408532112 ctx->seq 866607361
> [ +0.319253] r5c_recovery_rewrite_data_only_stripes done
> [ +0.061560] r5c_recovery_flush_data_only_stripes enter
> [ +0.075697] r5c_recovery_flush_data_only_stripes before wait_event
>
> That is, besides 'task <....> blocked for' traces or unless pr_debug()s
> were enabled.
>
> There were a few 'md_write_start set MD_SB_CHANGE_PENDING' *before* that
> (all of them likely related to another raid that is active at the moment,
> as these were happening during that lengthy r5c_recovery_flush_log()
> process).
Hmm.. this is weird, as I think I marked every instance of set_bit
MD_SB_CHANGE_PENDING. Would you mind confirming that those messages come
from the other array, with something like:
diff --git i/drivers/md/md.c w/drivers/md/md.c
index dbbc8a50e2ed2..e91acfdcec032 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -8480,7 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
 			mddev->in_sync = 0;
 			set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
 			set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
-			pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__);
+			pr_info("%s: md: %s set MD_SB_CHANGE_PENDING\n", __func__, mdname(mddev));
 			md_wakeup_thread(mddev->thread);
 			did_change = 1;
 		}

Thanks,
Song