Re: [PATCH -next v2 2/6] ext4: introduce last_check_time record previous check time
From: Jan Kara <jack@suse.cz>
Date: 2021-10-13 09:38:52
Also in:
lkml
On Tue 12-10-21 19:46:24, yebin wrote:
On 2021/10/12 16:47, Jan Kara wrote:
On Fri 08-10-21 10:38:31, yebin wrote:
On 2021/10/8 9:56, yebin wrote:
On 2021/10/7 20:31, Jan Kara wrote:
On Sat 11-09-21 17:00:55, Ye Bin wrote:
kmmpd:
...
    diff = jiffies - last_update_time;
    if (diff > mmp_check_interval * HZ) {
...
As "mmp_check_interval = 2 * mmp_update_interval", 'diff' is always less
than 'mmp_update_interval', so detection will never trigger. Introduce
last_check_time to record the previous check time.

Signed-off-by: Ye Bin <redacted>

I think the check is there only for the case where write_mmp_block() +
sleep took longer than mmp_check_interval. I agree that should rarely
happen but on a really busy system it is possible and in that case we would
miss updating mmp block for too long and so another node could have started
using the filesystem. I actually don't see a reason why kmmpd should be
checking the block each mmp_check_interval as you do - mmp_check_interval
is just for ext4_multi_mount_protect() to know how long it should wait
before considering mmp block stale... Am I missing something?

Honza

I'm sorry, I didn't understand the detection mechanism here before. Now I
understand it. As you said, it's just protection against an abnormal case.
There's really no problem.

Yeah, I tested with the following steps:

hostA                                   hostB
mount
  ext4_multi_mount_protect
    -> seq == EXT4_MMP_SEQ_CLEAN
  delay 5s after label "skip" so
  hostB will see seq is
  EXT4_MMP_SEQ_CLEAN
                                        mount
                                          ext4_multi_mount_protect
                                            -> seq == EXT4_MMP_SEQ_CLEAN
  run kmmpd
                                          run kmmpd

Actually, in this situation kmmpd will not detect the conflict.
In the ext4_multi_mount_protect function we write the mmp data first, wait
'wait_time * HZ' seconds, then read the mmp data back and check it. Most of
the time, if 'wait_time' is zero, the check passes.

But how can wait_time be zero? As far as I'm reading the code, wait_time
must be at least EXT4_MMP_MIN_CHECK_INTERVAL...
Honza

int ext4_multi_mount_protect(struct super_block *sb,
			     ext4_fsblk_t mmp_block)
{
	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
	struct buffer_head *bh = NULL;
	struct mmp_struct *mmp = NULL;
	u32 seq;
	unsigned int mmp_check_interval = le16_to_cpu(es->s_mmp_update_interval);
	unsigned int wait_time = 0;
--> wait_time is zero here
	int retval;

	if (mmp_block < le32_to_cpu(es->s_first_data_block) ||
	    mmp_block >= ext4_blocks_count(es)) {
		ext4_warning(sb, "Invalid MMP block in superblock");
		goto failed;
	}

	retval = read_mmp_block(sb, &bh, mmp_block);
	if (retval)
		goto failed;

	mmp = (struct mmp_struct *)(bh->b_data);

	if (mmp_check_interval < EXT4_MMP_MIN_CHECK_INTERVAL)
		mmp_check_interval = EXT4_MMP_MIN_CHECK_INTERVAL;

	/*
	 * If check_interval in MMP block is larger, use that instead of
	 * update_interval from the superblock.
	 */
	if (le16_to_cpu(mmp->mmp_check_interval) > mmp_check_interval)
		mmp_check_interval = le16_to_cpu(mmp->mmp_check_interval);

	seq = le32_to_cpu(mmp->mmp_seq);
	if (seq == EXT4_MMP_SEQ_CLEAN)
--> If hostA and hostB mount the same block device at the same time,
--> hostA and hostB may both read 'seq' as EXT4_MMP_SEQ_CLEAN.
		goto skip;

Oh, I see. Thanks for explanation.
...
skip:
	/*
	 * write a new random sequence number.
	 */
	seq = mmp_new_seq();
	mmp->mmp_seq = cpu_to_le32(seq);
	retval = write_mmp_block(sb, bh);
	if (retval)
		goto failed;
	/*
	 * wait for MMP interval and check mmp_seq.
	 */
	if (schedule_timeout_interruptible(HZ * wait_time) != 0) {
--> If seq is equal with EXT4_MMP_SEQ_CLEAN, wait_time is zero.
		ext4_warning(sb, "MMP startup interrupted, failing mount");
		goto failed;
	}
	retval = read_mmp_block(sb, &bh, mmp_block);
--> We may read back the same data we wrote, so we can't detect the
--> conflict here.

OK, I see. So the race in ext4_multi_mount_protect() goes like:

hostA                           hostB
read_mmp_block()                read_mmp_block()
- sees EXT4_MMP_SEQ_CLEAN       - sees EXT4_MMP_SEQ_CLEAN
write_mmp_block()
wait_time == 0 -> no wait
read_mmp_block()
  - all OK, mount
                                write_mmp_block()
                                wait_time == 0 -> no wait
                                read_mmp_block()
                                  - all OK, mount

Do I get it right? Actually, if we passed seq we wrote in
ext4_multi_mount_protect() to kmmpd (probably in sb), then kmmpd would
notice the conflict on its first invocation but still that would be a bit
late because there would be a time window where hostA and hostB would be
both using the fs. We could reduce the likelihood of this race by always
waiting in ext4_multi_mount_protect() between write & read but I guess that
is undesirable as it would slow down all clean mounts. Ted?

Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR
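
For illustration only, the interleaving described in this thread can be modelled as a small userspace simulation. This is a sketch under stated assumptions, not kernel code: every sim_* name, SIM_SEQ_CLEAN and the two sequence values are made up for the example, and the "disk" is just a struct shared by two simulated hosts that have both already read the clean marker. It compares the wait_time == 0 ordering with an ordering where both writes land before either read-back, and it models the idea of kmmpd comparing the on-disk seq against the one written at mount time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_MMP_SEQ_CLEAN; not the kernel's value. */
#define SIM_SEQ_CLEAN 0xFFFFFFFFu

struct sim_disk { uint32_t mmp_seq; };                 /* shared MMP block */
struct sim_host { uint32_t written; bool mounted; };   /* per-host state   */

/* Models write_mmp_block(): publish this host's new sequence number. */
static void sim_write(struct sim_host *h, struct sim_disk *d, uint32_t seq)
{
	assert(seq != SIM_SEQ_CLEAN);  /* a fresh seq must differ from the marker */
	h->written = seq;
	d->mmp_seq = seq;
}

/* Models the read-back check: mount only if our own seq is still on disk. */
static void sim_recheck(struct sim_host *h, const struct sim_disk *d)
{
	h->mounted = (d->mmp_seq == h->written);
}

/* Models kmmpd comparing the on-disk seq with the seq written at mount. */
static bool sim_kmmpd_sees_conflict(const struct sim_host *h,
				    const struct sim_disk *d)
{
	return h->mounted && d->mmp_seq != h->written;
}

/*
 * Both hosts are assumed to have read SIM_SEQ_CLEAN already.
 * Returns how many hosts end up mounted; *a_kmmpd_conflict reports whether
 * host A's first simulated kmmpd pass would notice the other writer.
 */
static int sim_mount_race(bool wait_before_recheck, bool *a_kmmpd_conflict)
{
	struct sim_disk disk = { .mmp_seq = SIM_SEQ_CLEAN };
	struct sim_host a = { 0 }, b = { 0 };

	if (wait_before_recheck) {
		/* Both writes land before either host reads back. */
		sim_write(&a, &disk, 0x1111);
		sim_write(&b, &disk, 0x2222);
		sim_recheck(&a, &disk);
		sim_recheck(&b, &disk);
	} else {
		/* wait_time == 0: each host reads back right after its write. */
		sim_write(&a, &disk, 0x1111);
		sim_recheck(&a, &disk);
		sim_write(&b, &disk, 0x2222);
		sim_recheck(&b, &disk);
	}

	*a_kmmpd_conflict = sim_kmmpd_sees_conflict(&a, &disk);
	return (int)a.mounted + (int)b.mounted;
}
```

Calling sim_mount_race(false, &c) models the no-wait case: both simulated hosts mount, and only the later kmmpd-style comparison on host A flags the conflict. Calling sim_mount_race(true, &c) models a wait between write and read: host A already fails its read-back, so only one host mounts. The model says nothing about real timing; it only shows why the ordering of write and read-back matters.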