Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit
From: Nikolay Borisov <hidden>
Date: 2021-01-12 09:19:09
On 11.01.21 г. 23:50 ч., David Sterba wrote:
On Mon, Jan 11, 2021 at 10:33:42AM +0200, Nikolay Borisov wrote:quoted
On 8.01.21 г. 18:01 ч., David Sterba wrote:quoted
On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:quoted
@@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) btrfs_trans_release_metadata(trans); trans->block_rsv = NULL; - /* make a pass through all the delayed refs we have so far - * any runnings procs may add more while we are here - */ - ret = btrfs_run_delayed_refs(trans, 0); - if (ret) { - btrfs_end_transaction(trans); - return ret; - } - - cur_trans = trans->transaction; - /* - * set the flushing flag so procs in this transaction have to - * start sending their work down. + * We only want one transaction commit doing the flushing so we do not + * waste a bunch of time on lock contention on the extent root node. */ - cur_trans->delayed_refs.flushing = 1; - smp_wmb();This barrier obviously separates the flushing = 1 and the rest of the code, now implemented as test_and_set_bit, which implies full barrier. However, hunk in btrfs_should_end_transaction removes the barrier and I'm not sure whether this is correct: - smp_mb(); if (cur_trans->state >= TRANS_STATE_COMMIT_START || - cur_trans->delayed_refs.flushing) + test_bit(BTRFS_DELAYED_REFS_FLUSHING, + &cur_trans->delayed_refs.flags)) return true; This is never called under locks so we don't have complete synchronization of neither the transaction state nor the flushing bit. btrfs_should_end_transaction is merely a hint and not called in critical places so we could probably afford to keep it without a barrier, or keep it with comment(s).I think the point is moot in this case, because the test_bit either sees the flag or it doesn't. It's not possible for the flag to be set AND should_end_transaction return false that would be gross violation of program correctness.So that's for the flushing part, but what about cur_trans->state?
Looking at the code, the barrier was there to order the publishing of the delayed_ref.flushing (now replaced by the bit flag) against surrounding code. So independently of this patch, let's reason about trans state. In should_end_transaction it's read without holding any locks. (U) It's modified in btrfs_cleanup_transaction without holding the fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set. set in cleanup_transaction under fs_info->trans_lock (L) set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock.(L) set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock.(L) set in btrfs_commit_trans to COMMIT_UNBLOCK under fs_info->trans_lock.(L) set in btrfs_commit_trans to COMMIT_COMPLETED without locks but at this point the transaction is finished and fs_info->running_trans is NULL (U but irrelevant). So by the looks of it we can have a concurrent READ race with a Write, due to reads not taking a lock. In this case what we want to ensure is we either see new or old state. I consulted with Will Deacon and he said that in such a case we'd want to annotate the accesses to ->state with (READ|WRITE)_ONCE so as to avoid a theoretical tear, in this case I don't think this could happen but I imagine at some point kcsan would flag such an access as racy (which it is).